Thursday, December 22, 2016

Encoding? Compression? You tell me...

In HTTP, there are two ways to specify compression of the content:

  • Content-Encoding:gzip
  • Transfer-Encoding:gzip
Amazing discovery of the day: command-line cURL, as of v.7.21.6, recognizes both, while the cURL functions in PHP, as of PHP 5.3.3, only respect the former.

Tuesday, December 20, 2016

Strikes again

Just a few days after I wrote about the in-app purchase simulator, someone tried it again. Not that they've succeeded; the server-side order signature check caught it. But the device-side check didn't. I'm really wondering how'd they do that.

Here's the first step: I'm going to try reimplementing the signature check by hand, and see if the fake order passes that. I suspect the IAP simulator taps into Java's built-in Signature.verify() method, making it return a hard-coded true. It wouldn't know about my homegrown implementation, obviously.

The signature algorithm that Google Play uses is well known, it's SHA1 with RSA. Normally, I'd be the first one to recommend against building your own crypto primitives, but in this case, it's probably justified, at least as the first step to counteract the fraud. I can probably still use the built-in SHA1, just need to reimplement the RSA portion. The latter is a bunch of BigInteger arithmetic.




Friday, December 2, 2016

Google Play fraud

I have a freemium Android app on the Google Play store. It's free to download, but there's a paid subset of functionality. To unlock it, one may download a separate app ("The License App"), or pay with an in-app purchase (IAP).

Such a scheme is open to many avenues of abuse. I'll try to outline some of those, and discuss the ways to combat them.

Owning without buying

The other day, a customer told me: "Here's how it's normally done: you have a license app, once they install that, the premium content is unlocked". This sounds fine, until someone grabs an APK of the license app from a rooted Android device and publishes it for the whole world to see. I have some good evidence that my LicenseApp is available outside Google Play, although I never disclosed it elsewhere in any way.

Maybe in recent versions of Android grabbing a paid APK is not as straightforward as it once was. When I was last investigating this possibility, around Android 2.2, it was trivial - once the app was installed, the APK would just sit there in the filesystem. With some hackery, one could even install Google Play on the Android emulator.

Even if it's more tricky these days, all it takes is one determined person with a rooted phone.

Buying, then returning

Google Play supports a license verification service - an app may ask the Play Market app whether it was indeed purchased there. So a well written LicenseApp would not open up the paid content right away - it should ask Play first.

Enter consumer friendly refund policy. At Google Play, during the first 2 hours a customer may undo a purchase with no help from the vendor whatsoever. So, an obvious, low tech way to get a free license would be - buy the LicenseApp, let it unlock the goodies, then return with a refund. It doesn't require a rooted device.

The only way to combat something like that is checking the license periodically, e. g. once a day, but that's rather user unfriendly. What if the device has no Internet connection and the license check fails - should one lock down the functionality then? I've decided early on that I'm not going down this road; once they buy the license, the "it's bought and paid for" state stays indefinitely.

Spoofing in-app purchases


This kind of fraud is the most technically sophisticated one. There's a certain app out there that can feed valid looking in-app purchase records to apps (I'm not linking to it; do your own homework). It requires a rooted device (naturally), but the IAP record withstands all checks, including the digital signature check. I haven't investigated closely, but they probably leverage some kind of debugging API to short out the system-provided signature verification function.

I have very convincing evidence that people have been trying this against my app. There's a simple call-home functionality on successful in-app purchase; ever once in a while, I'd get a record with an order number that doesn't look like a typical Play order number, and a signature that doesn't pass a check. However, the very fact that I receive those notifications means that the IAP has passed the signature verification on the device! As an engineer, I can't help but admire. Were my app to unlock the paid goodies upon a successful IAP, it'd fall for it.

Again, if this kind of Google Play spoofing is available for the IAP logic, it's equally possible for license verification logic.

EDIT: homegrown signature check defeats that.



All of this brings me to the key insight: don't trust the device. The only way to know if a Google Play order is genuine is by checking against the Play back-end. And that's what I've been doing all along.

When I just started selling Android apps, the payment back-end behind the store (then known as Android Market) was called Google Checkout. It had a very convenient HTTP-based API - one could get a list of orders along with their status, each order record would include customer e-mail address. So once an order went through, I'd e-mail the customer a license code which they'd use to unlock. Life was good.

Then Checkout was phased out in favor of Google Wallet. And the API was gone. One could get by by HTML-scraping the administrative console, but the customer e-mail on the order details screen was gone, too. That was the biggest source of my frustration.

I had to reengineer the purchase UI to ask the user for their e-mail after a successful purchase. Most of the time they do, sometimes they don't. I'm now looking at three valid orders that I don't know what to do with, because I have no address to send the code to.

In the latest iteration of the Wallet back-end (they're no longer calling it "Wallet", either), the only way to send a message to the customer is by cancelling and refunding the order. Some irony.

Ways Google is not helping

To their credit, Google did try to provide some measure of programmatic access to my merchant data. Unfortunately, it's always too little, too late. Functional parity with the legacy Checkout is nowhere to be seen.

I can't e-mail my customers, but they're running my app, aren't they? Can't I leverage that to deliver a message to them? Yes, but how do I know the right user? All I have from the Play/Checkout/Wallet console is an order number. Here's where Google license API could help - but it doesn't. It provides no data items that I could possibly match against the Play back-end.

The only potentially useful data item is UserID (the fifth field in the signedData parameter of verifyLicense). It's a Base64 string, which decodes to a 25 byte byte array. I did some testing on a couple of live Play apps, and here are the conclusions:

  • The first 5 bytes ("the prefix") are always the same - 00D94E1D03
  • The remaining array of 20 bytes has no discernible structure - it's not a string in any reasonable charset, it doesn't look like a structure with binary integers, the bit distribution is close to random. All in all, it looks like either a hash value or a cyphertext; the former is more likely. Incidentally, MD5 hash values are exactly 20 bytes long.
  • The hash doesn't identify neither the app package nor the user - same user, different package and vice versa would yield different UserID values. The variable portion of UserID changes completely between packages and users - no bytes are the same. All the more reason to think it's a hash.
  • The UserID doesn't change if you uninstall and reinstall the app.

Play console provides sales reports, even with programmatic access via Google Storage - but those are not nearly realtime. Also, they don't list some of the very relevant information (e. g. the estimated payout amount in US dollars), and they don't list in-progress and cancelled orders.

Google Play has an API of its own, and it even has a purchase information function, but only for IAP. Paid downloads just aren't there. Also, there's no way to get the list of all transactions - only status for a specific one, and none of the financials are reported.

Ways Google is helping

All we have left is the Google payments center,  formerly known as the Wallet Merchant Center, and before that, as the Checkout console. It gives you a realtime list of orders, an order details screen, allows searching. Its most interesting feature, and a crucial one for my scenario, is the ability to search by e-mail. The customer e-mail is not exposed neither in the UI nor in the backing data, but once you have an e-mail, you can find an order (or several) by it. Without this, my business model would be dead in the water.

The e-mail search allows for partial search - lets you find orders by username without domain, even if the domain is not @gmail.com. It even allows for username fragment search - if you have an order from john.doe@hotmail.com, search for either "john" or "doe" or "john.doe" will find it. It doesn't search for arbitrary substrings, though - searching for "joh" won't work. Searching for users' listed first or last name won't work, either, unless they are a part of the e-mail username.

The payments center has no API of its own. But it's a website, and in that capacity, it's open to good old techniques of HTML scraping.

There are two ways to scrape a website - HTTP only and JavaScript-enabled. In the former approach, you issue HTTP requests to the right URLs using a client library of your choosing (CURL is good), then parse HTML and retrieve the data. In the latter, you run the target site in a full blown headless browser (or even a real one) and drive it by some kind of script.

For a while, scraping the Wallet console was possible with HTTP only, then, in early November 2016, they've redesigned the site to rely heavily on JavaScript and AJAX calls. At that point, I had to switch to a full headless browser scraper. PhantomJS was my tool of choice. I can't say its perfect, but it gets the job done.

Monday, November 7, 2016

There's an app for that

Sharing little code snippets on this blog is fine in its own right, but there's a platform out there specifically for that - GitHub Gist (https://gist.github.com/). I think from now on, rather than paste it here, I'd upload it to Gist and link.

Tuesday, September 13, 2016

JavaScript says "beep"

UPDATE: it's now a Gist.

Today's snippet: generating a simple musical tone purely from JavaScript. I saw some online samples for doing it on a server (e. g. in PHP) and sending to the client, but that's a waste of bandwidth.

The ingredients are twofold:
  • An <audio> element with a source that's populated from a data: URI
  • An ArrayBuffer object with WAV data inside
There's a concise description of the WAV format here. My implementation limits the format to 16-bit mono PCM. The following JavaScript returns a data: URI with a WAV file with a sine wave that corresponds to a single tone:


function GenerateTone(frequency, duration, volume, rate)
{
    if (!volume)
        volume = 30000;
    if (!rate)
        rate = 8000;

    var nSamples = rate * duration,
        i, w = (2 * Math.PI * frequency) / rate,
        wav = new ArrayBuffer(44 + nSamples*2);
    (new Int32Array(wav, 0, 11)).set(
        [0x46464952, 36 + nSamples*2, 0x45564157,
        0x20746d66, 16, 0x10001, rate, rate*2, 0x100002,
        0x61746164, nSamples*2], 0);
    var samples = new Int16Array(wav, 44, nSamples);
    for (i = 0; i < nSamples; i++)
        samples[i] = volume * Math.sin(w * i);
    return "data:audio/wav;base64," + ToBase64(wav);
}

The parameters are:
  • frequency in Hz
  • duration in seconds
  • volume - max is 32768, min is 0
  • rate - in samples per second, default 8000 is enough for simple beeps
That's it. All that remains is feeding that data to an <audio> element. Assuming you have an <audio id="MyAudio"> element on the page:

var audio = document.getElementById("MyAudio");
audio.src = GenerateTone(349.23, 0.5); //The F note
audio.play();

The frequencies of piano notes can be found here. For high pitches you'd want a sample rate higher than 8000.

One missing bit in the code above is the ToBase64() function that takes an ArrayBuffer and returns its Base64 representation. There any many samples for this on the Net, and I expect it to become a part of DOM eventually, but for completeness' sake, here's one implementation I've been using:

function ToBase64(arrayBuffer)
{
    var s = "", a, b, c, d, chunk;
    var alpha = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

    var bytes = new Uint8Array(arrayBuffer)
    var byteLength = bytes.byteLength;
    var byteRemainder = byteLength % 3;
    var mainLength = byteLength - byteRemainder;

    for (var i = 0; i < mainLength; i += 3)
    {
        chunk = (bytes[i] << 16) | (bytes[i + 1] << 8) | bytes[i + 2];
        a = (chunk & 16515072) >> 18;
        b = (chunk & 258048) >> 12;
        c = (chunk & 4032) >> 6;
        d = chunk & 63;
        s += alpha[a] + alpha[b] + alpha[c] + alpha[d];
    }

    if (byteRemainder == 1)
    {
        chunk = bytes[mainLength];
        a = (chunk & 252) >> 2;
        b = (chunk & 3) << 4;
        s += alpha[a] + alpha[b] + "==";
    }
    else if (byteRemainder == 2)
    {
        chunk = (bytes[mainLength] << 8) | bytes[mainLength + 1];
        a = (chunk & 64512) >> 10;
        b = (chunk & 1008) >> 4;
        c = (chunk & 15) << 2;
        s += alpha[a] + alpha[b] + alpha[c] + "=";
    }
  
    return s;
}


This code will not work on big-endian machines. JavaScript's IntArrays use the endianness of the underlying CPU, while the WAV format assumes little-endian integers.

Sunday, May 8, 2016

You can try or you can __try

Some time ago, I've outlined a technique for wrapping a C++ fragment in a Structured Exception Handling (SEH) crash catcher. It involved a rather convoluted sequence of C++ to C back to C++ calls.

I'm glad to report that this is no longer necessary. As of Visual Studio 2015 Update 2, one can freely mix and match try/catch with __try/__except in a C++ source. So the SafeCall template I've once presented simplifies to something like this:

template<typename TFunctor>
void SafeCall(const TFunctor &f)
{
    CONTEXT Ctxt;
    void *Address = 0;
    __try
    {
        f();
    }
    __except (ExcFilter(GetExceptionCode(), GetExceptionInformation(), Ctxt, Address))
    {
        OnCrash(GetExceptionCode(), Address, &Ctxt);
    }
}


Thursday, April 21, 2016

UTF-8 in Transact-SQL: no CLR necessary

UPDATE: it's now a Gist.

While we're on the subject of Transact-SQL utilities, here's one for converting a Unicode string (as NVARCHAR) to a UTF-8 binary data block. It works correctly with non-UCS2 strings encoded as UTF-16. To make sure, try it on a string with an Emoji character, for example, N'😂'. That's codepoint U+1F602, the widely smiling face.

create function [dbo].[ToUTF8](@s nvarchar(max))
returns varbinary(max)
as
begin
    declare @i int = 1, @n int = len(@s), @r varbinary(max) = 0x, @c int, @d varbinary(4)
    while @i <= @n
    begin
        set @c = unicode(substring(@s, @i, 1))
        if (@c & 0xfc00) = 0xd800
        begin
            set @i += 1
            set @c = ((@c & 0x3ff) * 0x400) | 0x10000 | (unicode(substring(@s, @i, 1)) & 0x3ff)
        end

        if @c < 0x80
            set @d = cast(@c as binary(1))
        if @c >= 0x80 and @c < 0x800 
            set @d = cast(((@c * 4) & 0xFF00) | (@c & 0x3f) | 0xc080 as binary(2))
        if @c >= 0x800 and @c < 0x10000
            set @d = cast(((@c * 0x10) & 0xFF0000) | ((@c * 4) & 0xFF00) | (@c & 0x3f) | 0xe08080 as binary(3))
        if @c >= 0x10000
            set @d = cast(((@c * 0x40) & 0xFF000000) | ((@c * 0x10) & 0xFF0000) | ((@c * 4) & 0xFF00) | (@c & 0x3f) | 0xf0808080 as binary(4))
            
        set @r += @d
        set @i += 1
    end
    return @r
end

Saturday, April 16, 2016

Your software WILL crash.

When reporting on unhandled .NET exceptions, don't use the builtin Exception.StackTrace. Instead, construct an instance of System.Diagnostics.StackTrace with the exception object as the parameter, and generate your own trace. The difference is, with the latter, you can get offset to the crash point within the method.

The StackTrace contains an array of StackFrame objects. Each of those has the following methods:

  • GetILOffset() - offset to the crash/call within method's bytecode (MSIL) body
  • GetNativeOffset() - the same in JIT-compiled, native form
  • GetMethod() - returns System.Reflection.MethodBase, which can be used to reconstruct a human readable method prototype.
The IL offset can be resolved to sources using ILDASM.

The native offset can be resolved to sources using the debugger, as long as you're debugging precisely the same DLL as the one the crash happened on. Go to the Disassembly window.

Generating the method name in the same format as the builtin StackTrace provides is rather straightforward:


static public string MethodNameWithParameters(MethodBase mb)
{
    return mb.DeclaringType.FullName + "." +
        mb.Name + "(" +
        string.Join(", ",
            mb.GetParameters().Select(
                pi => pi.ToString())) + ")";
}


Friday, April 15, 2016

Object Linking and Embedding Database

Recently, I've been messing with a database API I've never used directly before, even though I've been using it all the time. I'm talking about OLE DB, the interface behind ActiveX Data Objects (ADO).

The job is rather simple. Native C++ application, MS SQL Server database, performance is a priority, so no managed code. Massive inserts into the database, so I want to use table-valued parameters (TVPs), the most friendly option for those. The ADO interface doesn't support TVPs, while OLE DB does. So OLE DB it is.

Turns out, while the OLE DB API per se is not badly documented, the body of samples and discussion on the 'Net is rather poor, so I thought I'd share some pitfalls I've spent time in along the way.

When you're calling a stored procedure that returns both recordsets and output parameters, the latter only appear in the bound variables when the recordsets have been scrolled through and released. In fact, it's the Release() call on the recordset that triggers output parameter filling.

When you want to provide an empty table as a TVP, bind both parameter value and status via the DBBINDING structure, and provide the value DBSTATUS_S_DEFAULT in the bound status variable, as opposed to DBSTATUS_S_ISNULL. If you provide the latter, the stored procedure call will fail, and the value of the status variable on completion will be DBSTATUS_E_BADSTATUS.

A useful pattern is generating an accessor handle for a procedure, and calling it multiple times with the same accessor. When doing so, watch for parameter status values changing after the ICommand::Execute() call. In my case, a variable was too big to fit into a parameter, and the status was reset to DBSTATUS_S_TRUNCATED. There was no error on the procedure call - the Execute() method returned S_OK. However, the next call to Execute() did fail, since DBSTATUS_S_TRUNCATED is not a valid status value upon input.


MurmurHash in Transact-SQL

UPDATE: it's now a Gist.

Sometimes, one needs an intermediate strength hash. Something less collision prone than CRC32 or Adler-32, but smaller and less costly than MDx or SHAxxx. My favorite for a while has been MurmurHash64 - it produces a 64-bit value, just the right size to fit into a scalar variable in a C program, or a bigint database field.

There are several flavors of MurmurHash; I'm partial to the one called MurmurHash64B. MSSQL is capable of storing 64-bit variables, but it doesn't support unsigned integer types, and treats arithmetic overflow as an exception. So the B flavor, which was originally geared towards 32-bit machines, is a better fit than A.

The official home of MurmurHash is at Github. Over there, one can find an implementation in C.

I'll just leave it here. An implementation of MurmurHash64B in Microsoft SQL Server's Transact-SQL. Tested with MSSQL 2008 and 2014.

create function [dbo].[MurmurHash64B]
(
    @s varbinary(max),
    @seed bigint = 0
)
returns bigint
as
begin
    declare @len int = datalength(@s), @i int = 1
    declare @m bigint = 0x5bd1e995
    declare @h1 bigint = @len ^ (@seed & 0xffffffff)
    declare @h2 bigint = 0
    declare @k1 bigint, @k2 bigint

    while @len >= 8
    begin
        set @k1 = cast(convert(binary(4), reverse(substring(@s, @i, 4))) as int)
        set @k1 = (@k1 * @m) & 0xffffffff
        set @k1 ^= @k1 / 0x1000000
        set @k1 = (@k1 * @m) & 0xffffffff
        set @h1 = (@h1 * @m) & 0xffffffff
        set @h1 ^= @k1;

        set @k2 = cast(convert(binary(4), reverse(substring(@s, @i+4, 4))) as int)
        set @k2 = (@k2 * @m) & 0xffffffff
        set @k2 ^= @k2 / 0x1000000
        set @k2 = (@k2 * @m) & 0xffffffff
        set @h2 = (@h2 * @m) & 0xffffffff
        set @h2 ^= @k2
        set @len -= 8
        set @i += 8
    end

    if @len >= 4
    begin
        set @k1 = cast(convert(binary(4), reverse(substring(@s, @i, 4))) as int)
        set @k1 = (@k1 * @m) & 0xffffffff
        set @k1 ^= @k1 / 0x1000000
        set @k1 = (@k1 * @m) & 0xffffffff
        set @h1 = (@h1 * @m) & 0xffffffff
        set @h1 ^= @k1
        set @len -= 4
        set @i += 4
    end

    if @len >= 1
    begin
        if @len >= 2 
        begin
            if @len >= 3
                set @h2 ^= ascii(substring(@s, @i+2, 1)) * 0x10000
            set @h2 ^= ascii(substring(@s, @i+1, 1)) * 0x100
        end
        set @h2 ^= ascii(substring(@s, @i, 1))
        set @h2 = (@h2 * @m) & 0xffffffff
    end

    set @h1 = ((@h1 ^ (@h2 / 0x40000)) * @m) & 0xffffffff
    set @h2 = ((@h2 ^ (@h1 / 0x400000)) * @m) & 0xffffffff
    set @h1 = ((@h1 ^ (@h2 / 0x20000)) * @m) & 0xffffffff
    set @h2 = ((@h2 ^ (@h1 / 0x80000)) * @m) & 0xffffffff

    declare @h bigint
    if (@h1 & 0x80000000) <> 0
    begin
        set @h1 &= 0x7fffffff
        set @h = (@h1 * 0x100000000) | 0x8000000000000000
    end
    else
        set @h = @h1 * 0x100000000

    set @h |= @h2

    return @h
end