UPDATE: it's now a Gist.
UPDATE: now in Pascal too.
UPDATE: now for MySQL too.
Sometimes, one needs an intermediate strength hash. Something less collision prone than CRC32 or Adler-32, but smaller and less costly than MDx or SHAxxx. My favorite for a while has been MurmurHash64 - it produces a 64-bit value, just the right size to fit into a scalar variable in a C program, or a bigint database field.
There are several flavors of MurmurHash; I'm partial to the one called MurmurHash64B. MSSQL is capable of storing 64-bit variables, but it doesn't support unsigned integer types, and treats arithmetic overflow as an exception. So the B flavor, which was originally geared towards 32-bit machines, is a better fit than A. One can use signed bigint to emulate the arithmetics of unsigned int.
The official home of MurmurHash is at Github. Over there, one can find an implementation in C.
This code produces a signed 64-bit value - half the hash values are negative. The C implementation produces an unsigned one, always positive or zero. Keep that in mind when building your interoperation logic.
I'll just leave it here. An implementation of MurmurHash64B in Microsoft SQL Server's Transact-SQL. Tested with MSSQL 2008 and 2014.
create function [dbo].[MurmurHash64B]
(
@s varbinary(max),
@seed bigint = 0
)
returns bigint
as
begin
declare @len int = datalength(@s), @i int = 1
declare @m bigint = 0x5bd1e995
declare @h1 bigint = @len ^ (@seed & 0xffffffff)
declare @h2 bigint = 0
declare @k1 bigint, @k2 bigint
while @len >= 8
begin
set @k1 = cast(convert(binary(4), reverse(substring(@s, @i, 4))) as int)
set @k1 = (@k1 * @m) & 0xffffffff
set @k1 ^= @k1 / 0x1000000
set @k1 = (@k1 * @m) & 0xffffffff
set @h1 = (@h1 * @m) & 0xffffffff
set @h1 ^= @k1;
set @k2 = cast(convert(binary(4), reverse(substring(@s, @i+4, 4))) as int)
set @k2 = (@k2 * @m) & 0xffffffff
set @k2 ^= @k2 / 0x1000000
set @k2 = (@k2 * @m) & 0xffffffff
set @h2 = (@h2 * @m) & 0xffffffff
set @h2 ^= @k2
set @len -= 8
set @i += 8
end
if @len >= 4
begin
set @k1 = cast(convert(binary(4), reverse(substring(@s, @i, 4))) as int)
set @k1 = (@k1 * @m) & 0xffffffff
set @k1 ^= @k1 / 0x1000000
set @k1 = (@k1 * @m) & 0xffffffff
set @h1 = (@h1 * @m) & 0xffffffff
set @h1 ^= @k1
set @len -= 4
set @i += 4
end
if @len >= 1
begin
if @len >= 2
begin
if @len >= 3
set @h2 ^= ascii(substring(@s, @i+2, 1)) * 0x10000
set @h2 ^= ascii(substring(@s, @i+1, 1)) * 0x100
end
set @h2 ^= ascii(substring(@s, @i, 1))
set @h2 = (@h2 * @m) & 0xffffffff
end
set @h1 = ((@h1 ^ (@h2 / 0x40000)) * @m) & 0xffffffff
set @h2 = ((@h2 ^ (@h1 / 0x400000)) * @m) & 0xffffffff
set @h1 = ((@h1 ^ (@h2 / 0x20000)) * @m) & 0xffffffff
set @h2 = ((@h2 ^ (@h1 / 0x80000)) * @m) & 0xffffffff
declare @h bigint
if (@h1 & 0x80000000) <> 0
begin
set @h1 &= 0x7fffffff
set @h = (@h1 * 0x100000000) | 0x8000000000000000
end
else
set @h = @h1 * 0x100000000
set @h |= @h2
return @h
end
Thank you for the above code. But can you provide the C# implementation which yield the same result in SQL as well as C#.
ReplyDeleteThank you in advance.
I believe there's a NuGet package for that.
ReplyDeleteAnyone found MurmurHash3 in tsql?
ReplyDeleteCan you try and adapt what I have? The MurmurHash3_x86_32 flavor should be amenable to the same techniques. Uint32 is represented by bigint, whatever may cause overflow of the lower 32 bits needs to be masked with 0xFFFFFFFF.
ReplyDeleteRotation will look clumsy, but oh well.