Monday, April 28, 2014

Hash of a hash of a hash

Problem: there are two MySQL databases on two different websites that should be mostly identical, except for maybe a few tables. I would like to figure out quickly which tables don't match so that I can run a manual sync.

Solution: hash every record, then hash the records together. Then compare per-table hashes by hand.

In PHP, here's how it goes:

<table>
<?php
function WrapField($f)
{
    return "MD5(IFNULL(`$f`,''))";
}


//Retrieve the list of table names
$TableList = DB_GetScalarArray('show tables');


//10 MB; you might need more
DB_ExecuteSQL('SET SESSION group_concat_max_len = 10000000;');


foreach($TableList as $Table)
{
    //Skip empty tables
    if(DB_GetScalar("select count(*) from `$Table`") > 0)
    {
        //Retrieve the field names
        $OneRecord = DB_GetRecord("select * from `$Table` limit 1");
        $Fields = array_keys($OneRecord);


        //Compose the expression for all field hashes
        $SelectSet = implode(',', array_map("WrapField", $Fields));



        //Now the real work: hash of record hashes
        $SQL =

            "select MD5(GROUP_CONCAT(".
                "MD5(CONCAT($SelectSet)) ".
            "ORDER BY `".$Fields[0]."`)) FROM `$Table`";
        $Hash = DB_GetScalar($SQL);
        echo '<tr><th>'.$Table.'</th><td>'.$Hash.'</td></tr>';
    }
}
?>
</table>


The database helper functions DB_GetScalar(), DB_GetRecord(), DB_GetScalarArray() and DB_ExecuteSQL() are thin wrappers on top of the MySQL client library (Mysqli in my case). What they do should be clear enough. DB_GetRecord() returns an associative array with field names as keys.

The triple hashing was a necessity, I'm afraid. First, simply concatenating the field values will error out if any two of the fields in a table happen to have different collations. One can probably work around that by casting to binary, but hashing each individual field works as well.
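To make the three levels concrete, here's a rough client-side mirror of the same structure in Java. It is only an illustration, not part of the script: the class and method names are made up, and it hashes UTF-8 bytes, which is an assumption (MySQL hashes the bytes of whatever character set the column uses). Rows are expected to arrive already sorted.

```java
import java.math.BigInteger;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper; mirrors the shape of the SQL expression, nothing more.
final class TableHashSketch {
    // Lowercase hex MD5 of a string's UTF-8 bytes (the encoding is an assumption).
    static String md5(String s) {
        try {
            MessageDigest md = MessageDigest.getInstance("MD5");
            byte[] digest = md.digest(s.getBytes(StandardCharsets.UTF_8));
            return String.format("%032x", new BigInteger(1, digest));
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }

    // Levels 1 and 2: MD5 of each field (null treated as ''), then MD5 of the concatenation.
    static String recordHash(String[] fields) {
        StringBuilder sb = new StringBuilder();
        for (String f : fields) {
            sb.append(md5(f == null ? "" : f));
        }
        return md5(sb.toString());
    }

    // Level 3: MD5 of the record hashes joined with commas, like GROUP_CONCAT's default separator.
    static String tableHash(List<String[]> sortedRows) {
        List<String> hashes = new ArrayList<>();
        for (String[] row : sortedRows) {
            hashes.add(recordHash(row));
        }
        return md5(String.join(",", hashes));
    }

    public static void main(String[] args) {
        List<String[]> rows = new ArrayList<>();
        rows.add(new String[] {"1", "alice", null});
        rows.add(new String[] {"2", "bob", "x"});
        System.out.println(tableHash(rows));
    }
}
```

Swapping two rows changes the result, which is why the ORDER BY in the SQL matters.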

Hashing of the entire record was done to make the source set of the group_concat() smaller. If the table is wide, the concat of all field hashes would be pretty wide, too. So the total length of the group_concat argument would be MD5Size*NumRecords*NumFields. If the table is long, too, there'd be a real risk of running into the limitation of group_concat() source length. Producing a per-record hash makes the length MD5Size*NumRecords instead.
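For a feel of the numbers, here's a small back-of-the-envelope calculator in Java. The names are hypothetical. Unlike the formula above, it also counts the one-character comma that GROUP_CONCAT puts between values by default, and it assumes every value is plain hex text, so one character is one byte.

```java
// Hypothetical sizing helper for the GROUP_CONCAT argument discussed above.
final class GroupConcatSizing {
    // MD5() yields 32 hex characters.
    static final int MD5_HEX_LEN = 32;

    // Total length of `records` values of `valueLen` characters each,
    // joined by a one-character separator.
    static long groupConcatLength(long records, long valueLen) {
        if (records == 0) {
            return 0;
        }
        return records * valueLen + (records - 1);
    }

    // Largest record count whose concatenation still fits into `limit` characters.
    static long maxRecords(long limit, long valueLen) {
        return (limit + 1) / (valueLen + 1);
    }

    public static void main(String[] args) {
        long limit = 10_000_000L; // the group_concat_max_len set in the script
        int numFields = 10;       // an example table width, picked arbitrarily

        // One hash per record versus one hash per field per record.
        System.out.println(maxRecords(limit, MD5_HEX_LEN));
        System.out.println(maxRecords(limit, (long) MD5_HEX_LEN * numFields));
    }
}
```

With the 10,000,000 limit, that works out to roughly 300 thousand records when each record contributes one hash, and about a tenth of that for a ten-field table without the per-record hash. Past the limit, MySQL truncates the GROUP_CONCAT result rather than failing, so it's worth checking the record count against these numbers.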

One assumption that this code makes is that the first field in the table is the primary key, or at least a somewhat identifying field. If it's not, there might be false negatives (identical tables reported as different) caused by mismatched record ordering. A better script would analyse MySQL metadata to retrieve the proper primary key; but then there'd still be a need for a fallback for keyless tables anyway.
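A sketch of what that better script's ordering logic might look like, in Java for illustration. The class, the method names and the fallback policy (order by every field when there's no primary key) are all assumptions, not something the script above does. The metadata query relies on MySQL naming the primary key constraint PRIMARY in information_schema.

```java
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical helper: picks the columns for the ORDER BY inside GROUP_CONCAT.
final class OrderBySketch {
    // Lists the primary key columns of one table of the current database, in key order.
    // The table name goes in as a bound parameter.
    static final String PK_QUERY =
        "SELECT COLUMN_NAME FROM information_schema.KEY_COLUMN_USAGE "
        + "WHERE TABLE_SCHEMA = DATABASE() AND TABLE_NAME = ? "
        + "AND CONSTRAINT_NAME = 'PRIMARY' ORDER BY ORDINAL_POSITION";

    // Backtick-quotes an identifier, doubling any backtick inside it.
    static String quote(String identifier) {
        return "`" + identifier.replace("`", "``") + "`";
    }

    // Primary key columns if there are any; otherwise every field (assumed fallback).
    static String orderBy(List<String> pkColumns, List<String> allFields) {
        List<String> columns = pkColumns.isEmpty() ? allFields : pkColumns;
        return columns.stream()
            .map(OrderBySketch::quote)
            .collect(Collectors.joining(", "));
    }

    public static void main(String[] args) {
        System.out.println(orderBy(List.of("id"), List.of("id", "name")));
        System.out.println(orderBy(List.of(), List.of("name", "email")));
    }
}
```

Ordering by all fields should be deterministic enough for keyless tables, since rows that tie on every field also hash identically, but it may be slow on big tables and it still depends on both servers collating the same way.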

Monday, April 14, 2014

Reverse proxy needed

HTTP reverse proxies are a Useful Thing. Sometimes it's for exposing an endpoint from behind a firewall that you don't control, sometimes it's for moving a service from one public URL to another while not leaving legacy consumers in the dark. Both Apache and IIS support them almost out of the box, but how often do you have control over server-level settings with a cheap-o hosting plan? For discussion's sake, let's assume you don't; otherwise, there'd be no discussion.

Enter PHP, libcurl and the untameable spirit of Doing Things on the Cheap (a recurring theme here). A half-assed PHP reverse proxy seems easy to write. Collect target URL and POST data (if any), fire a CURL request to the intended target, pass the response along. And indeed, a web search quickly reveals a few quick and dirty implementations. Here's one. Here's another. The need is out there, a quick and dirty implementation is entirely possible, so here they go.

And yet it seems like I'm about to put together yet another one. Here are my requirements that those Q&D proxies don't meet:
  • POST data in arbitrary format (not just forms)
  • PHP session support (general purpose cookies not necessary)
  • Methods other than GET/POST
Still quite doable. Session support, however, might require some special care. The session cookie on the proxy machine might get in the way of the target's session cookie. Stay tuned.
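For illustration, here's roughly what the request-building step could look like under those three requirements. This is a Java sketch using java.net.http (Java 11+), not the PHP/libcurl proxy this post is about, and every name in it is hypothetical. It assumes the target uses PHP's default session cookie name, PHPSESSID, and that relaying only that cookie is an acceptable way to keep sessions from clashing. It builds the outgoing request but doesn't send it.

```java
import java.net.URI;
import java.net.http.HttpRequest;

// Hypothetical sketch: turns an incoming request's pieces into the request for the target.
final class ProxyRequestSketch {
    // Assumed cookie name; PHP's default unless session.name was changed.
    static final String SESSION_COOKIE = "PHPSESSID";

    // Picks "name=value" for one cookie out of a Cookie request header, or null if absent.
    static String pickCookie(String cookieHeader, String name) {
        if (cookieHeader == null) {
            return null;
        }
        for (String part : cookieHeader.split(";")) {
            String pair = part.trim();
            int eq = pair.indexOf('=');
            if (eq > 0 && pair.substring(0, eq).equals(name)) {
                return pair;
            }
        }
        return null;
    }

    // Any method, any body format; only the session cookie is relayed.
    static HttpRequest buildUpstream(String method, URI target, byte[] body,
                                     String contentType, String cookieHeader) {
        HttpRequest.BodyPublisher publisher = (body == null || body.length == 0)
            ? HttpRequest.BodyPublishers.noBody()
            : HttpRequest.BodyPublishers.ofByteArray(body); // raw bytes, not form fields
        HttpRequest.Builder builder = HttpRequest.newBuilder(target).method(method, publisher);
        if (contentType != null) {
            builder.header("Content-Type", contentType);
        }
        String session = pickCookie(cookieHeader, SESSION_COOKIE);
        if (session != null) {
            builder.header("Cookie", session);
        }
        return builder.build();
    }
}
```

The response side would need the mirror image: passing the target's Set-Cookie header for the session back to the client, while the proxy script itself never starts a session of its own.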

Update: I went ahead and wrote one.

Tuesday, April 1, 2014

Splitting GIF into frames on Android via giflib

PREAMBLE: the app where this technique originated is no longer using it. I've integrated GifFileDecoder by Google and never looked back. Had to patch it somewhat, though - my GIFs are small, so it makes more sense to read them into memory than to display them progressively.

UPDATE: it's now a Gist.

This is a followup to my answer at StackOverflow regarding animated GIFs on Android. Folks want code - I've got some. The general idea is - use giflib to get RGB(A) pixel data for each frame in a format that's compatible with Android's, feed the pixels to bitmaps, display the bitmaps.

Naturally, a starting point is a Java Android project with a native library in it. The first step is including several files from giflib 4.1.4. They're sitting in an archive, attached to a comment on a Gist. Also in that Gist there is a native wrapper called MyClass.cpp. Unzip the archive into your native library's folder, and list the following sources in the Android.mk:
  • dgif_lib.c
  • gif_err.c
  • gifalloc.c
  • MyClass.cpp
Also, insert the following line into Android.mk: 


LOCAL_CFLAGS := -D HAVE_SYS_STAT_H -D HAVE_SYS_TYPES_H -D HAVE_FCNTL_H -D HAVE_INTTYPES_H -D HAVE_UNISTD_H -D HAVE_STDLIB_H -D HAVE_STDINT_H -D UINT32=uint32_t 


Without that, giflib won't compile. Finally, rename the Java_..._LoadGIF function in MyClass.cpp to match your package and class. The function LoadGIF() logically belongs to class MyClass; we'll discuss its Java side later. LoadGIF takes two arguments - a local file name for the GIF file (not a URL!), and a boolean called HighColor that specifies the generated pixel format. With HighColor=false, it generates ARGB_4444 bitmaps; with true, it's ARGB_8888.

ARGB_4444 used to work for me for a while, but then it was phased out in recent versions of Android, so go with high color, unless you're targeting old, low memory devices.
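To put the memory trade-off in numbers, here's a tiny Java calculator. The names are hypothetical; the only facts it leans on are that ARGB_8888 takes 4 bytes per pixel and ARGB_4444 takes 2. It assumes every frame is kept in memory as its own bitmap.

```java
// Hypothetical helper: pixel buffer sizes for the two formats LoadGIF can produce.
final class FrameBufferSketch {
    // ARGB_8888 stores 4 bytes per pixel, ARGB_4444 stores 2.
    static int bytesPerPixel(boolean highColor) {
        return highColor ? 4 : 2;
    }

    // Bytes needed for one w*h frame; throws ArithmeticException on int overflow.
    static int frameBytes(int w, int h, boolean highColor) {
        return Math.multiplyExact(Math.multiplyExact(w, h), bytesPerPixel(highColor));
    }

    // Bytes needed to keep `frames` equally sized frames in memory at once.
    static long movieBytes(int frames, int w, int h, boolean highColor) {
        return (long) frames * frameBytes(w, h, highColor);
    }

    public static void main(String[] args) {
        // An arbitrary example: a 20-frame, 100x100 animation in both formats.
        System.out.println(movieBytes(20, 100, 100, true));
        System.out.println(movieBytes(20, 100, 100, false));
    }
}
```

High color costs exactly twice the memory. For small GIFs that's rarely a problem, but it adds up quickly with large frames.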

Now to the Java world. There are two classes there - MovieView and the second one, the one I've called MyClass for genericity's sake. Feel free to rename.

MovieView is in the same Gist. It's fairly simple; it just displays bitmaps driven by a timer. It's a view, so it can be placed into a layout file.
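The timing logic behind a view like that boils down to picking a frame for a given moment. Here's a minimal sketch in plain Java. It is not the MovieView code from the Gist, the names are made up, and it assumes delays are already in milliseconds (GIF files store them in hundredths of a second) and that the animation loops forever.

```java
// Hypothetical sketch of timer-driven frame selection; not the Gist's MovieView.
final class FrameTimelineSketch {
    // Index of the frame that should be on screen after elapsedMs of looping playback.
    // delaysMs[i] is how long frame i stays visible.
    static int frameAt(int[] delaysMs, long elapsedMs) {
        if (delaysMs.length == 0) {
            throw new IllegalArgumentException("no frames");
        }
        long total = 0;
        for (int d : delaysMs) {
            total += d;
        }
        if (total <= 0) {
            return 0; // all delays are zero: just show the first frame
        }
        long t = elapsedMs % total; // position within the current loop
        for (int i = 0; i < delaysMs.length; i++) {
            if (t < delaysMs[i]) {
                return i;
            }
            t -= delaysMs[i];
        }
        return delaysMs.length - 1; // not reached when delays are non-negative
    }

    public static void main(String[] args) {
        int[] delays = {100, 200, 50};
        for (long ms = 0; ms <= 400; ms += 50) {
            System.out.println(ms + " ms -> frame " + frameAt(delays, ms));
        }
    }
}
```

A real view would call something like this from its timer callback and redraw only when the index changes.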

MyClass is where the action takes place. In my project, it's a subclass of Dialog and it's quite involved. The key part is that it holds a reference to a MovieView instance and feeds a GIF to it. I'll paste just the relevant parts here:

import android.graphics.Bitmap;
import java.io.File;
import java.nio.ByteBuffer;

class MyClass
{
    private MovieView m_mv; //initialized on loading

    //Format; hard-coded here
    private static boolean s_bHighColor = true;

    //loadLibrary() is called elsewhere
    private native boolean LoadGIF(String FileName, boolean bHighColor);

    //Called from JNI - do not mess with it.
    private void AddFrame(int Delay, int l, int t, int w, int h, int Disp, byte [] Bits)
    {
        Bitmap bm = Bitmap.createBitmap(w, h,
            s_bHighColor ?
                Bitmap.Config.ARGB_8888 :
                Bitmap.Config.ARGB_4444);
        bm.copyPixelsFromBuffer(ByteBuffer.wrap(Bits));
        m_mv.AddFrame(bm, l, t, Delay, Disp);
    }

    //Called from JNI - do not mess with it.
    private void StartImage(int FrameCount)
    {
        m_mv.Reset(FrameCount);
    }

    //////////////////////////////// The animation starts here
    public void StartMovie(File f)
    {
        //LoadGIF() calls StartImage() once, then AddFrame() for each frame
        if(LoadGIF(f.getAbsolutePath(), s_bHighColor))
            m_mv.Start();
    }
}




The flow is: you call MyClass.StartMovie(). StartMovie() calls LoadGIF() which calls StartImage() once and AddFrame() in a loop. If everything goes fine, then MovieView.Start() is invoked.

To terminate the animation, call MovieView.Stop(). This code assumes an infinite loop, but feel free to follow the Disposition parameter from the GIF.

The official home of giflib 4.1.4 is here at SourceForge.