Sunday, March 2, 2025

I am globally unique, baby

UUIDs are supported everywhere these days. But what about binary compatibility?


Although UUIDs were not invented by Microsoft, they are quite prominent on Microsoft platforms - in COM and elsewhere. The Win32 API has a data type for a UUID, called GUID (note that unsigned long is 32 bits wide on Windows):

struct _GUID
{
    unsigned long  Data1;
    unsigned short Data2;
    unsigned short Data3;
    unsigned char  Data4[8];
};

And so does Java (methods omitted):

public final class UUID {
    private final long mostSigBits;
    private final long leastSigBits;
}

Java has logic for serializing/parsing UUIDs into/from byte arrays. But the catch is, if you take a UUID string, convert it to a UUID object and then to bytes, the resulting bit pattern won't match the contents of a Windows GUID structure for the same string. More specifically, the leading 8 bytes won't match - they will be shuffled.

Let's unpack (literally). A hexadecimal UUID string goes:

xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx

Where the first section of 8 hex digits corresponds to Data1, the next two 4-digit sections correspond to Data2 and Data3, and the final two sections (4 and 12 digits) together correspond to the 8 bytes of Data4 - the dash between them is extraneous. Since Windows is pervasively little endian, Data1/2/3 are stored in little endian order, and that's where the mismatch with Java is. Data4 is a plain byte array, so its bytes sit in memory in the same order as in the string.

In the Java UUID structure, the upper 32 bits of mostSigBits correspond to Data1, bits 16 to 31 are Data2, bits 0 to 15 are Data3. The value of leastSigBits corresponds to Data4, stored in the big endian form.
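To make that mapping concrete, here is a small sketch that pulls the Windows-style fields out of a java.util.UUID with shifts. The class name UuidFields and the helper names data1/data2/data3 are made up for illustration; only getMostSignificantBits() and getLeastSignificantBits() come from the JDK.

```java
import java.util.UUID;

// Hypothetical helper, for illustration only.
final class UuidFields {
    // Data1: upper 32 bits of mostSigBits
    static int data1(UUID u) {
        return (int) (u.getMostSignificantBits() >>> 32);
    }

    // Data2: bits 16 to 31 of mostSigBits
    static short data2(UUID u) {
        return (short) (u.getMostSignificantBits() >>> 16);
    }

    // Data3: bits 0 to 15 of mostSigBits
    static short data3(UUID u) {
        return (short) u.getMostSignificantBits();
    }

    public static void main(String[] args) {
        UUID u = UUID.fromString("00112233-4455-6677-8899-aabbccddeeff");
        // Mask the shorts so they print as unsigned 16-bit values.
        System.out.printf("Data1=%08x Data2=%04x Data3=%04x Data4=%016x%n",
                data1(u), data2(u) & 0xffff, data3(u) & 0xffff,
                u.getLeastSignificantBits());
        // prints: Data1=00112233 Data2=4455 Data3=6677 Data4=8899aabbccddeeff
    }
}
```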

When storing the two longs in bytes, Java stores them both as big endian. For leastSigBits, it matches the Windows expectations. For mostSigBits, it reverses the byte order of the 32-bit int and of the two 16-bit shorts, compared to what Windows expects.
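The shuffle is easy to see on a sample value. The sketch below assumes the usual way of dumping a UUID to bytes in Java, which is writing both longs through a ByteBuffer (big endian by default), and then reverses bytes 0-3, 4-5 and 6-7 to arrive at the Windows layout. The names UuidByteOrder, javaOrder, toWindowsLayout and hex are hypothetical.

```java
import java.nio.ByteBuffer;
import java.util.UUID;

// Hypothetical demo class, for illustration only.
final class UuidByteOrder {
    // Big endian dump of both longs; ByteBuffer is big endian by default.
    static byte[] javaOrder(UUID u) {
        return ByteBuffer.allocate(16)
                .putLong(u.getMostSignificantBits())
                .putLong(u.getLeastSignificantBits())
                .array();
    }

    // Reverses the bytes in [from, to) in place.
    static void reverse(byte[] a, int from, int to) {
        for (int i = from, j = to - 1; i < j; i++, j--) {
            byte t = a[i];
            a[i] = a[j];
            a[j] = t;
        }
    }

    // Returns a copy with Data1, Data2 and Data3 flipped to little endian.
    static byte[] toWindowsLayout(byte[] javaBytes) {
        byte[] w = javaBytes.clone();
        reverse(w, 0, 4); // Data1
        reverse(w, 4, 6); // Data2
        reverse(w, 6, 8); // Data3
        return w;         // bytes 8..15 (Data4) are left untouched
    }

    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) sb.append(String.format("%02x", b));
        return sb.toString();
    }

    public static void main(String[] args) {
        UUID u = UUID.fromString("00112233-4455-6677-8899-aabbccddeeff");
        byte[] j = javaOrder(u);
        System.out.println(hex(j));                  // 00112233445566778899aabbccddeeff
        System.out.println(hex(toWindowsLayout(j))); // 33221100554477668899aabbccddeeff
    }
}
```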

Just in case someone out there needs to interoperate, I've put together a Java gist that implements UUID serialization and deserialization in a way that is binary compatible with Windows. Serialization can be done in place into an existing byte array, or into a freshly created one. Deserialization can use the whole array, or work with an array slice at an offset. Array underrun is handled by Java's native bounds checking - you will get an ArrayIndexOutOfBoundsException if the array is too short.
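For readers who just want the idea without opening the gist, here is a minimal sketch of what such a codec could look like. This is not the gist itself: the class name GuidCodec and its method names are assumptions made up for this example, and it relies only on plain array indexing, so a short array fails with ArrayIndexOutOfBoundsException (on writes, possibly after some bytes have already been stored).

```java
import java.util.UUID;

// Hypothetical sketch of a Windows-compatible codec; not the author's gist.
final class GuidCodec {
    // Source index for each byte of mostSigBits, most significant first:
    // Data1 (4 bytes), Data2 (2 bytes), Data3 (2 bytes), each little endian.
    private static final int[] MSB_ORDER = {3, 2, 1, 0, 5, 4, 7, 6};

    // Writes 16 bytes into dst starting at off.
    static void write(UUID u, byte[] dst, int off) {
        long msb = u.getMostSignificantBits();
        long lsb = u.getLeastSignificantBits();
        for (int i = 0; i < 8; i++) {
            // Byte i of the big endian long lands at position MSB_ORDER[i].
            dst[off + MSB_ORDER[i]] = (byte) (msb >>> (56 - 8 * i));
            // Data4 keeps its big endian order.
            dst[off + 8 + i] = (byte) (lsb >>> (56 - 8 * i));
        }
    }

    // Serializes into a freshly created 16-byte array.
    static byte[] toBytes(UUID u) {
        byte[] out = new byte[16];
        write(u, out, 0);
        return out;
    }

    // Reads 16 bytes from src starting at off.
    static UUID read(byte[] src, int off) {
        long msb = 0;
        long lsb = 0;
        for (int i = 0; i < 8; i++) {
            msb = (msb << 8) | (src[off + MSB_ORDER[i]] & 0xffL);
            lsb = (lsb << 8) | (src[off + 8 + i] & 0xffL);
        }
        return new UUID(msb, lsb);
    }

    // Deserializes from the start of the array.
    static UUID fromBytes(byte[] src) {
        return read(src, 0);
    }

    public static void main(String[] args) {
        UUID u = UUID.fromString("00112233-4455-6677-8899-aabbccddeeff");
        byte[] b = toBytes(u);
        System.out.println(fromBytes(b).equals(u)); // true
    }
}
```

The offset variants make it possible to read or write a GUID embedded in a larger buffer, such as a struct received from native code, without copying.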
