Sunday, December 31, 2017

Lightning fast

The last time I revisited Google Play order processing, there was a six-hour gap between order submission and the card being charged. The delay was artificial, naturally. I'm not exactly sure what the reason for such a design was; probably it was an allowance for returns and refunds: if the customer's remorse kicks in before the card is charged, Google doesn't have to pay the card transaction fee.

Effective July 2017, the gap is no longer on the order of hours; now it's 5-7 minutes.

Friday, December 15, 2017

Team Foundation Server schema

I happen to run an on-premises instance of Microsoft Team Foundation Server for a medium-sized software shop. TFS has pretty good reporting capabilities, but almost no cross-collection reporting out of the box. Fortunately, those who are blessed with admin rights in TFS get to connect to the production database server.

The schema of the TFS databases (there are multiple) is occasionally convoluted, but generally approachable. Each collection gets a database, and the server-level information is stored under Tfs_Configuration. If you've dabbled with the TFS REST API, you know that both collections and projects are identified by GUIDs in addition to their regular names.

The list of team collections is in the table Tfs_Configuration.dbo.tbl_ServiceHost; the HostId field corresponds to the collection's GUID.

The list of projects is in the table dbo.tbl_projects in each collection's database; the GUID is under project_id. The table dbo.tbl_Project doesn't have the GUID, just the dataspace ID.

The build/release definitions and queues are stored on a per-project basis, but there's no project ID there. Instead, there's an integer field, DataspaceId, which can be linked back to the project via the dbo.tbl_Dataspace table. In that table, the field DataspaceIdentifier (distinct from DataspaceId!) contains the project GUID. Dataspaces have a string type, stored in DataspaceCategory; e.g. the agent queue dataspace corresponds to the type "DistributedTask", and release definitions belong to the dataspace of type "ReleaseManagement".
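As a sketch, linking release definitions back to their projects might look like the query below. Only tbl_Dataspace, tbl_projects, and their columns mentioned above come from my notes; the definition table name (Release.tbl_Definition) and the project_name column are assumptions, so verify against your own instance. Read-only queries against a copy of the database are strongly advised.

```sql
-- Hypothetical sketch: map release definitions to their owning projects.
-- Release.tbl_Definition and project_name are assumed names; verify first.
SELECT p.project_name,
       rd.Name AS ReleaseDefinition
FROM Release.tbl_Definition rd
JOIN dbo.tbl_Dataspace ds
  ON ds.DataspaceId = rd.DataspaceId
 AND ds.DataspaceCategory = 'ReleaseManagement'   -- dataspace type, as above
JOIN dbo.tbl_projects p
  ON p.project_id = ds.DataspaceIdentifier;       -- project GUID, as above
```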

The table Release.tbl_DefinitionEnvironmentStep doesn't contain steps; it contains approvals. Same goes for Release.tbl_ReleaseEnvironmentStep. The former stores the configured approvals; the latter stores the approvals received during execution.

References to users and groups are stored in collection tables as GUIDs. The GUIDs are collection-specific; in order to resolve them, use the table dbo.tbl_IdentityMap. The field localId corresponds to the collection-specific GUID; masterId is the global GUID.

In order to resolve the masterId further to the actual group or AD user, use either Tfs_Configuration.dbo.tbl_Group or Tfs_Configuration.dbo.tbl_Identity. Even collection- and project-level groups can be found in the former; the latter stores references to the AD accounts.

The table tbl_Group contains server-, collection-, and project-level groups. In order to retrieve the collection and/or project, use the InternalScopeId field; it's a reference to Tfs_Configuration.dbo.tbl_GroupScope. The server-level scope has a hard-coded ID of 1.
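Putting the identity pieces together, resolving a collection-local identity GUID to a group's display name might look roughly like this. Everything beyond the table names and the localId/masterId fields mentioned above (the Id and DisplayName columns, the CollectionDb placeholder) is an assumption; check the actual column names on your server.

```sql
-- Hypothetical sketch: resolve a collection-local identity GUID.
-- The join column (g.Id) and DisplayName are assumed names; verify first.
SELECT g.DisplayName
FROM CollectionDb.dbo.tbl_IdentityMap im        -- your collection DB here
JOIN Tfs_Configuration.dbo.tbl_Group g
  ON g.Id = im.masterId                         -- global GUID, as above
WHERE im.localId = @LocalGuid;                  -- collection-specific GUID
```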

Friday, December 1, 2017

Who does that?

Amazing discovery of the day: Microsoft Excel respects Scroll Lock.

Thursday, November 2, 2017

Abusing COM for tightly coupled process interaction

Twice in my career, I had to deal with unreliable third-party algorithm libraries in a server setting. There's a service-type program that follows a general request/response pattern. Processing a request involves calling a third-party library that I don't control and that crashes far too often for comfort. The service must survive the crash, log it, and emit an error response.

Both the server and the library are native code, so a global try/catch around the library call is not really an option. This calls for a dispatcher/worker architecture: the service receives requests and routes them to worker processes, one request at a time. If a worker crashes, the service will know and act accordingly.

One of the projects where I had to deal with this was on Linux; that's a story for another day. The other one was on Windows, and that's what I would like to discuss.

So, dispatcher/worker communication on Windows. It all hinges on the choice of an interprocess communication mechanism. I'd like an IPC that:

  • Reliably detects server crashes
  • Is message-based as opposed to stream-based
  • Has a built-in datatype marshaling logic

Component Object Model (COM) comes to mind. The worker program would be a COM server with a single object; the dispatcher would instantiate the server object and call its methods. Each request translates into one or more COM method calls. Server crash detection - check. Built-in marshaling - check. But there's a wrinkle: a COM out-of-process server is not supposed to run multiple instances. Here's how COM usually works:

  • A server executable is listed in the registry under the CLSID
  • A client calls CoCreateInstance() with that CLSID
  • The run-time starts the server executable
  • The server executable calls CoRegisterClassObject() for the CLSID
  • The run-time calls the object factory

If a subsequent call for the same CLSID comes in, the run-time reuses the same server process rather than starting a new one.

Also, the loosely coupled nature of COM is overkill for my scenario. I never meant to expose my worker program to clients other than my dispatcher. The whole COM machinery for making servers discoverable and friendly to third-party clients is irrelevant to my case.

So, how can a COM client create multiple, identical COM objects running in different processes? The Running Object Table (ROT) to the rescue. COM servers can publish their objects in a global repository, identified by arbitrary monikers. So the idea is:

  • The dispatcher starts multiple worker processes
  • Each gets a unique integer parameter (a cookie) via the command line
  • The worker registers an object in the ROT, identified by the cookie
  • The dispatcher retrieves that object

This is different from the regular object-creation protocol. The worker program has no object factory (since it runs exactly one object). There's no need to register the server in the registry. The object needs no CLSID. As for the interface, in my case I'd use raw IDispatch, so there's no need for any marshaling code, either. My dispatcher/worker exchange protocol can be perfectly served by passing an array of VARIANTs both ways.
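The worker-side registration might be sketched roughly as below. This is not the code from my gist; it's a minimal illustration of the ROT APIs, with error handling abbreviated and the function name and moniker delimiter being my own choices. The dispatcher builds the same item moniker from the cookie and calls IRunningObjectTable::GetObject() to retrieve the worker.

```cpp
// Minimal sketch of publishing the worker's sole object in the Running
// Object Table, keyed by the cookie passed on the command line.
#include <windows.h>
#include <objbase.h>

// Returns the ROT registration cookie; pass it to
// IRunningObjectTable::Revoke() during shutdown. 0 means failure.
DWORD RegisterWorkerObject(IDispatch *pWorker, const wchar_t *cookie)
{
    IRunningObjectTable *rot = nullptr;
    IMoniker *moniker = nullptr;
    DWORD rotCookie = 0;
    if (SUCCEEDED(GetRunningObjectTable(0, &rot)))
    {
        // An item moniker is the simplest moniker kind; the dispatcher
        // constructs an identical one to look the object up.
        if (SUCCEEDED(CreateItemMoniker(L"!", cookie, &moniker)))
        {
            rot->Register(ROTFLAGS_REGISTRATIONKEEPSALIVE,
                          pWorker, moniker, &rotCookie);
            moniker->Release();
        }
        rot->Release();
    }
    return rotCookie;
}
```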

The only addition to that protocol is that the dispatcher needs to know when the worker's COM object becomes available in the ROT. I did that with a named event object, where the name contains the cookie. Once the worker starts up and registers its object, it sets the event. Maybe a short sleep on the dispatcher side would accomplish the same, but this is both faster and safer.

In my case, the worker can only process one request at a time, so the worker doesn't need to be multithreaded. So the worker would call CoInitializeEx() with COINIT_APARTMENTTHREADED (plain CoInitialize() implies the same), and then the worker's WinMain() would have to run a message loop.

Now, dealing with the worker process crashes. There can be three kinds:
  1. During process startup
  2. During the request
  3. Between the requests

The first one is rather easy. I mentioned that the dispatcher starts the worker and waits for the "I'm ready" event. Make that a wait on two objects, the event handle and the worker process handle, and see if the process terminates before the event is set. If it does, that was a startup crash.
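The two-object wait might look like the sketch below. The event naming scheme (a fixed prefix plus the cookie) is my assumption here; any scheme works as long as the worker and dispatcher agree on it.

```cpp
// Sketch of startup-crash detection on the dispatcher side. The worker
// opens/sets the same named event right after registering in the ROT.
#include <windows.h>
#include <string>

// Returns true if the worker signaled readiness, false if the worker
// process died before registering its object (a startup crash).
bool WaitForWorkerReady(HANDLE hWorkerProcess, const std::wstring &cookie)
{
    // Assumed naming convention; the worker must build the same name.
    std::wstring name = L"Local\\WorkerReady_" + cookie;
    HANDLE hEvent = CreateEventW(nullptr, TRUE, FALSE, name.c_str());
    if (!hEvent)
        return false;
    HANDLE handles[2] = { hEvent, hWorkerProcess };
    // Whichever signals first tells the story: the event means the worker
    // is up; the process handle means it terminated before registering.
    DWORD which = WaitForMultipleObjects(2, handles, FALSE, INFINITE);
    CloseHandle(hEvent);
    return which == WAIT_OBJECT_0;
}
```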

If the process crashes during the request, COM will report an error. The question is, which one? After some extensive testing with numerous crashes, I've identified a handful of HRESULT values, any of which means the server process terminated during the COM call, one way or another.

If the process terminates between requests, the next COM call would return HRESULT_FROM_WIN32(RPC_S_SERVER_UNAVAILABLE).

What are the quit conditions for the worker? One could implement a "please quit" method and tell all workers to quit during dispatcher shutdown. A worker might quit when the client disconnects (e.g. when the main object's reference count drops to zero). It might quit after a period of inactivity. In my implementation, I passed the PID of the dispatcher to the workers and made them quit if the dispatcher terminates. The message loop becomes a MsgWaitForMultipleObjects() loop, with the sole handle being the parent process handle.
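That loop might be sketched as follows. Again, this is an illustration rather than the gist's code; the function name is mine, and it assumes the dispatcher's PID arrived via the command line.

```cpp
// Sketch of a worker STA message loop that also watches the dispatcher.
// In an STA, incoming COM calls are delivered via window messages, so the
// loop must keep pumping them.
#include <windows.h>

void RunWorkerLoop(DWORD dispatcherPid)
{
    HANDLE hParent = OpenProcess(SYNCHRONIZE, FALSE, dispatcherPid);
    for (;;)
    {
        // Wake up for either a posted message or parent-process death.
        DWORD r = MsgWaitForMultipleObjects(1, &hParent, FALSE,
                                            INFINITE, QS_ALLINPUT);
        if (r == WAIT_OBJECT_0)
            break; // the dispatcher is gone; quit
        MSG msg;
        while (PeekMessage(&msg, nullptr, 0, 0, PM_REMOVE))
        {
            if (msg.message == WM_QUIT)
            {
                CloseHandle(hParent);
                return;
            }
            TranslateMessage(&msg);
            DispatchMessage(&msg);
        }
    }
    CloseHandle(hParent);
}
```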

Rather than post the code here, I've published it as a Gist. The gist contains a sample worker and a sample dispatcher that creates several threads and calls the worker on each of them. The worker crashes at random with an access violation, and the dispatcher handles that gracefully.

There's no ATL dependency in the project. There's a Native COM reference in the dispatcher, but that can easily be avoided if dependencies are a problem. I've compiled it with Visual Studio 2017. The Dispatcher is meant to be a console project (it prints some lines), while the Worker is a Windows GUI one (it has a message loop).

To summarize, this is a fun little way of making COM dance. No registration, no typelibs, no proxy/stub machinery, no ref counting. Just the bits we like: reliable interprocess communication, cross-process error handling, friendly passing of simply typed parameters as VARIANTs.

Monday, April 3, 2017

The saga continues

Remember how, some time ago, Google removed the order amount in USD from their Merchant Console? Ever since, I've used the earnings report at the end of each month to capture that data item.

Two months in, starting with the March 2017 earnings report, that's not an option either. The order amount is there, and so is the transaction fee, but the order ID isn't.

At least the support representative seemed to agree that it was a bug.

UPDATE: they've fixed it without any announcements.

Monday, February 6, 2017

Defeating the IAP emulator

A few posts ago, I mentioned a certain Android app that emulates valid in-app purchases on rooted Android devices. I also mentioned that this app goes as far as shorting out the digital signature check, so that apps that do their due diligence and check the IAP signature against the Google public key are fooled, too.

I've suspected all along that the emulator does this by tapping into the Android system library, so that the built-in signature check function returns true regardless. That seems to be the case. The emulator struck again, but this time my app had two signature checks: the system one and a homegrown one. It was the latter that correctly reported a signature mismatch.

Normally, I'd be the first to recommend against reimplementing crypto primitives. But in this case, I do feel it's justified. Here's the code. SHA1 hashing is system-provided, but the RSA signature check bits are custom. The function and its parameters are deliberately given vague names, just in case the pirate crowd goes through the trouble of introducing special-case processing for my app.

Friday, January 27, 2017

Meet the new boss, same as the old boss

Google is full of surprises, aren't they?

Less than three months after they unveiled the new, redesigned Payments Center, they've discontinued it and moved the functionality to the Play developer console. And it's not like they moved the same pages to a different host; this is a redesign: both the internals and the UI are noticeably different, with some functionality removed and some new bugs introduced.

On the brighter side, the new UI is more scrape-friendly. It's still JavaScript-driven, with explicit protection against HTTP-only scraping; on the other hand, there's a very straightforward AJAX call that returns JSON with almost everything I need to capture the order activity.

There's a glaring exception, though. The November 2016 version of the Payments Center would expose a crucial number: the estimated revenue in USD (more generally, in the payout currency). Not anymore. Unless the order was in USD to begin with, there's no way of knowing what my take is until the end of the month.