Friday, March 13, 2020

Who wants to meet Sayaka-san?


And now, for something completely different - speech synthesis in Windows (AKA SAPI).

Windows 10 comes with a built-in speech engine. In fact, it comes with two - SAPI 5.4 proper, and SAPI 5.4 OneCore (four on a 64-bit OS, counting the 32-bit and 64-bit flavors of each). They all coexist side by side. The data files are largely shared.

My specific interest is spoken Japanese. One can install the Japanese text-to-speech (TTS) voices in Settings, under Time and Language/Speech. Once you do, there are three Japanese voices in the list - Ayumi, Haruka, and Ichiro. Together with the three English voices, that makes six in total.

Those are the Windows settings, but what about SAPI applications? Turns out, SAPI 5.4 gets only Haruka, while OneCore gets all three. So going forward, my interest was limited to OneCore. The more voices, the merrier.

SAPI stores its voice list in the registry - different keys for SAPI 5.4 proper and 5.4 OneCore. Yet if you look at the registry key HKLM\SOFTWARE\Microsoft\Speech_OneCore\Voices\Tokens, you will see seven subkeys. The seventh one is Microsoft Sayaka. Why isn't she present in the list and why can't applications find her?
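To count those subkeys without opening Regedit, here is a minimal sketch using Python's standard `winreg` module. The function name is made up for illustration. It only works on Windows, and a 32-bit Python on 64-bit Windows may be redirected to the 32-bit view of the registry, so treat the result as a quick check rather than the definitive list.

```python
import sys

# Registry path quoted in the post (relative to HKLM).
TOKENS_PATH = r"SOFTWARE\Microsoft\Speech_OneCore\Voices\Tokens"


def list_onecore_voice_tokens():
    """Return the names of the subkeys under the OneCore voice token key.

    Hypothetical helper, not part of SAPI. Windows only.
    """
    if sys.platform != "win32":
        raise OSError("the registry is only available on Windows")
    import winreg  # imported lazily so the module loads on any OS

    with winreg.OpenKey(winreg.HKEY_LOCAL_MACHINE, TOKENS_PATH) as key:
        subkey_count = winreg.QueryInfoKey(key)[0]
        return [winreg.EnumKey(key, i) for i in range(subkey_count)]


if __name__ == "__main__" and sys.platform == "win32":
    for name in list_onecore_voice_tokens():
        print(name)
```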


I poked around SAPI for a while, trying to point it directly at the Sayaka registry key and getting SPERR_NOT_FOUND every time, until I ran my program under Process Monitor. It turned out my process was querying the registry in a completely different place:
HKCU\Software\Microsoft\Speech_OneCore\Isolated\7WUiMB20NMV5Y7TgZ2WJXbUw32iGZQSvSkeaf0AevtQ\HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Speech_OneCore\Voices\Tokens\

So it seems like SAPI OneCore (unlike SAPI 5.4 proper) isolates its settings. What would be the unit of isolation? I created a brand new project, copied the same code into it, and ran it again. It got a different isolation cookie - not 7WUiMB20NMV5Y7TgZ2WJXbUw32iGZQSvSkeaf0AevtQ, something else.
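The shape of that redirected path can be written down as a small helper. This is a sketch based only on the one path Process Monitor showed above; the function name is hypothetical, and the layout is an observed implementation detail, not a documented contract.

```python
# Root of the per-application isolation area (relative to HKCU),
# as seen in the Process Monitor trace.
ISOLATED_ROOT = r"Software\Microsoft\Speech_OneCore\Isolated"


def isolated_key_path(cookie: str, hklm_subpath: str) -> str:
    """Map an HKLM-relative path to its mirror under the isolation area.

    cookie       -- the per-executable isolation cookie
    hklm_subpath -- path relative to HKEY_LOCAL_MACHINE
    """
    return "\\".join(
        [ISOLATED_ROOT, cookie, "HKEY_LOCAL_MACHINE", hklm_subpath.strip("\\")]
    )


print(isolated_key_path(
    "7WUiMB20NMV5Y7TgZ2WJXbUw32iGZQSvSkeaf0AevtQ",
    r"SOFTWARE\Microsoft\Speech_OneCore\Voices\Tokens",
))
```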

So SAPI creates isolation profiles on the fly, based on the identity of the current executable. But the voices in the isolation profile must come from somewhere, from some master list. Turns out, the master list of Japanese voices for SAPI is in the XML files under C:\Windows\System32\Speech_OneCore\Common\ja-JP. On 64-bit Windows, there's another instance under SysWOW64. The one file that's present there only lists Ayumi, Haruka, and Ichiro. No Sayaka in sight. The contents of the XML pretty much match the registry settings.

My second avenue of exploration was - can we copy the registry settings for Sayaka to the isolation key? Turns out, yes. I copied the whole Sayaka key from HKLM\...\Voices to my app's isolation key, and SAPI OneCore found her and let her speak. You can access the voice by a hard-coded ID, or via enumeration. She's got a pleasant voice.

But that couldn't be the endpoint. After all, leveraging Process Monitor is not something I'd recommend to either end users or application developers, and how else would you know the location of the settings isolation area? So I've set out to find a way to figure out the isolation cookie without SAPI's help.

Since the debug symbols for sapi_onecore.dll are out on the Microsoft Symbol Server, it was easier than I thought. There's a function CSpRegistryIsolator::GenerateUniqueRegKeyName, guess what it does. First, it calls GetModuleFileNameW with a zero module handle (spoiler: that's the current executable). Then it calls a function called SHA256. Then, Base64Encode.

The only wrinkle to this straightforward scheme is - rather than passing the length of the filename buffer in bytes to the hash function, they pass the string length in wide characters. As a result, it only hashes the first half of the file name. Looks like someone forgot to multiply their wcslen() by 2.
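The arithmetic behind "the first half" is easy to check. The sketch below assumes the path sits in memory as UTF-16LE, two bytes per character, which is how Windows wide strings are laid out; the function name and the sample path are made up for illustration.

```python
def bytes_actually_hashed(path: str) -> bytes:
    """Return the prefix of the wide-character buffer that gets hashed
    when a character count is passed where a byte count is expected."""
    wide = path.encode("utf-16-le")      # 2 bytes per UTF-16 code unit
    length_in_wchars = len(wide) // 2    # what wcslen() would return
    return wide[:length_in_wchars]       # used as if it were a byte count


path = r"C:\Apps\Demo\speak.exe"         # 22 characters, 44 bytes
covered = bytes_actually_hashed(path)    # first 22 bytes = 11 characters
print(covered.decode("utf-16-le"))       # C:\Apps\Dem
```

One consequence, if the scheme is as described: two executables whose paths have the same length and share the same first half would end up with the same isolation cookie.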

Also, there's more than one flavor of Base64. SAPI uses the URL- and filename-safe Base64 - digits 62 and 63 are - and _, respectively, and there's no padding with = signs.
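Put together, the recipe described above can be sketched in a few lines of Python. This is a reconstruction from the description only, not the code from sapi_onecore.dll: the function name is hypothetical, and it assumes the path is hashed exactly as GetModuleFileNameW returns it (no case folding or other normalization), which is not confirmed here. Use at your own peril.

```python
import base64
import hashlib


def isolation_cookie(exe_path: str) -> str:
    """Approximate the SAPI OneCore isolation cookie for an executable path.

    Assumption: exe_path is exactly the string GetModuleFileNameW returns.
    """
    wide = exe_path.encode("utf-16-le")
    # The length-in-characters quirk: only half of the byte buffer is hashed.
    hashed_part = wide[: len(wide) // 2]
    digest = hashlib.sha256(hashed_part).digest()
    # Filename-safe alphabet ('-' and '_'), with the '=' padding stripped.
    return base64.urlsafe_b64encode(digest).rstrip(b"=").decode("ascii")


print(isolation_cookie(r"C:\Apps\Demo\speak.exe"))
```

A SHA-256 digest is 32 bytes, so the cookie is always 43 characters long, which matches the length of the cookie seen in Process Monitor.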

Having discovered all that, I started to ponder the wisdom of relying on undocumented implementation details, such as this half-baked hash/encode scheme. My next key insight was - SAPI has functions for creating and populating voice tokens. And the isolation area is writable without elevation - otherwise, how would SAPI be able to create it on the fly?

So here's my best solution so far: if Sayaka is not registered, create a token for her by SAPI means, and populate it with the values and attributes from Sayaka's home under HKLM. This is perfect for specific applications that want to hear her talk.


But what about that business with the master list of voices? Turns out, it's not that easy to add an XML file to that folder. It's off limits even to administrators - only TrustedInstaller can modify it. Fortunately, there are quite a few utilities that let one impersonate it. I used the one called PowerRun. Once I had copied the Sayaka XML to both master lists (32- and 64-bit), she was visible to the SAPI OneCore applications, even the ones that already had an isolation profile in place.

This approach is for people, not programs. It will work even for those SAPI apps that have no prior knowledge about Sayaka. Impersonating TrustedInstaller requires administrator permissions, naturally.


Having said all that, I think Sayaka was badly mistreated by Microsoft. I've created a petition to make her a first class citizen. Please go there and upvote! Windows 10 only.

The original investigation diary, as well as some useful code snippets, are here at StackOverflow. I've erased the SAPI isolation cookie generation sample - it worked, but it was too hacky for my taste. The cookie recipe is here, use at your own peril.
