Wednesday, March 26, 2014

SEH for fun and profit

PREAMBLE: this technique is no longer necessary. As of Visual Studio 2015 Update 2, one can mix C++ exceptions with SEH.

The built-in crash reporting of Windows Phone 8 sucks. It doesn't report registers, the stack, or which modules were loaded at crash time. Definitely not something you can debug from.

In the managed world, things aren't so bad. The Application class has an UnhandledException event that one is free to handle. There's no access to local variables (that I know of), but at least you get to dump globals and the managed stack.

Enter the native world. When native code crashes, it crashes hard. Access violations, illegal instructions: the fun stuff. Some native exceptions are reported to the managed world as instances of SEHException, but not all of them, and the relevant debugging details are lost anyway.

So far I haven't found a way to catch all native crashes in my app, but here's a recipe for catching them on a per-method basis. There are two ingredients: C++ lambdas and good old C-based structured exception handling.

In a typical mixed-mode WP8 app, the points of entry into the native world are WinRT class methods (COM methods deep inside, but that's beside the point). The idea is to wrap all of the method's functionality in a lambda with no parameters and no return value, and pass that lambda to a SEH-aware wrapper. The wrapper should be written in C, because the compiler won't let you mix SEH with C++ exception handling in the same function.

The SEH-aware wrapper calls the underlying lambda and, if it crashes, invokes some kind of native crash handler, passing it the debugging details. The handler can't do much, but at least it can dump a log into isolated storage.

On the consumption side, the method that used to be

public ref class CMyClass sealed
{
public:
    int MyMethod(int i, String ^s)
    {
        int Result = DoSomething(s);
        DoSomethingElse(i);
        return Result;
    }
};

would instead become:

int MyMethod(int i, String ^s)
{
    int Result = 0; //Stays 0 if the lambda crashes and OnCrash() returns
    SafeCall([i, s, &Result]()
    {
        Result = DoSomething(s);
        DoSomethingElse(i);
    });
    return Result;
}

The idea is that SafeCall() would catch native exceptions, if any, and log them for subsequent submission and analysis. Let's get to the implementation of SafeCall.

There are many ways to skin this particular cat. They all involve some degree of the type unsafety that C is so famous for; I've tried to keep it to a minimum. In my implementation, type unsafety is confined to the C/C++ border. Since all lambdas are distinct, unrelated types at compile time, some template magic is in order. Here's the header to include wherever SafeCall() is used (it's called SafeCall.h in my app):

extern "C"
{
    //Implemented in C; that's where the SEH is
    extern void SafeNativeCall(void *);

    //Entry point back from the C side
    extern void CallFunctorWrapper(void *pfw);
}

class CFunctorWrapperBase
{
    friend void CallFunctorWrapper(void *);
    virtual void Call() = 0;
};

template<typename TFunctor>
class CFunctorWrapper: public CFunctorWrapperBase
{
    const TFunctor &m_Functor;
    void Call()
    {
        m_Functor();
    }
public:
    //Interface for the SafeCall function
    CFunctorWrapper(const TFunctor &f)
        :m_Functor(f)
    {

    }
};

//This is the interface for callers.
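template<typename TFunctor>
void SafeCall(const TFunctor &f)
{
    //Use a named local: taking the address of a temporary
    //(&CFunctorWrapper<TFunctor>(f)) is non-standard C++ that
    //MSVC accepts only as an extension.
    CFunctorWrapper<TFunctor> fw(f);

    //Calling into the C world.
    //Typecast to make sure class pointer to void* and back
    //conversion is proper.
    SafeNativeCall(static_cast<CFunctorWrapperBase*>(&fw));
}
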
So SafeCall calls SafeNativeCall(), which is declared extern "C" and takes a void* parameter. That's where the SEH takes place. The CPU context record isn't available inside the __except block, only in the filter expression (the only place GetExceptionInformation() may be called), so we capture it there.

The following must be a C file (.c), not a C++ one.

#include <windows.h>

//The gate back into the C++ world
extern void CallFunctorWrapper(void *p);

//Crash callback; implement it in C++ as extern "C" if you wish
extern void OnCrash(DWORD Code, void *Address, CONTEXT *Ctxt);

DWORD ExcFilter(DWORD Code, LPEXCEPTION_POINTERS ep, CONTEXT *Ctxt, void **Address)
{
    if(Code == EXCEPTION_ACCESS_VIOLATION ||
       Code == EXCEPTION_DATATYPE_MISALIGNMENT ||
       Code == EXCEPTION_ARRAY_BOUNDS_EXCEEDED ||
       Code == EXCEPTION_INT_DIVIDE_BY_ZERO ||
       Code == EXCEPTION_INT_OVERFLOW ||
       Code == EXCEPTION_PRIV_INSTRUCTION ||
       Code == EXCEPTION_ILLEGAL_INSTRUCTION ||
       Code == EXCEPTION_STACK_OVERFLOW)
    {
        //Capture them now. There won't be another chance
        *Ctxt = *ep->ContextRecord;
        *Address = ep->ExceptionRecord->ExceptionAddress;
        return EXCEPTION_EXECUTE_HANDLER;
    }
    else
        return EXCEPTION_CONTINUE_SEARCH;
}

void SafeNativeCall(void *p)
{
    CONTEXT Ctxt;
    void *Address = 0;
    __try
    {
        CallFunctorWrapper(p); //Calling back into the C++ world
    }
    __except(ExcFilter(GetExceptionCode(), GetExceptionInformation(), &Ctxt, &Address))
    {
        OnCrash(GetExceptionCode(), Address, &Ctxt); //Oops...
    }
}

The final bit of the puzzle is the implementation of CallFunctorWrapper. Its only purpose is to let C call a virtual method of a C++ object. It's implemented in a C++ file like this:

#include "SafeCall.h"

extern "C" void CallFunctorWrapper(void *pfw)
{

    //Doubling back on the void* cast of above
    static_cast<CFunctorWrapperBase*>(pfw)->Call();
}
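
To sanity-check the C++ half of the plumbing on a desktop compiler, one could build it as a single translation unit with a stand-in for the C file. This is only a sketch under stated assumptions. The SafeNativeCall below is a hypothetical stub that calls straight back into C++ with no __try/__except, so it catches nothing. It only shows that the lambda survives the void* round trip and that reference captures behave as expected. MyMethod is a simplified, hypothetical version of the consumption-side example.

```cpp
#include <cassert>

extern "C"
{
    void SafeNativeCall(void *p);      // the C/SEH side in the real app
    void CallFunctorWrapper(void *pfw); // entry point back from the C side
}

class CFunctorWrapperBase
{
    friend void CallFunctorWrapper(void *);
    virtual void Call() = 0;
};

template<typename TFunctor>
class CFunctorWrapper : public CFunctorWrapperBase
{
    const TFunctor &m_Functor;
    void Call() override { m_Functor(); }
public:
    explicit CFunctorWrapper(const TFunctor &f) : m_Functor(f) {}
};

template<typename TFunctor>
void SafeCall(const TFunctor &f)
{
    CFunctorWrapper<TFunctor> fw(f); // named local, lives until SafeNativeCall returns
    SafeNativeCall(static_cast<CFunctorWrapperBase *>(&fw));
}

// Undoes the void* conversion made in SafeCall and dispatches virtually.
extern "C" void CallFunctorWrapper(void *pfw)
{
    static_cast<CFunctorWrapperBase *>(pfw)->Call();
}

// HYPOTHETICAL stand-in for the C file: no SEH, crashes are NOT caught.
extern "C" void SafeNativeCall(void *p)
{
    CallFunctorWrapper(p);
}

// HYPOTHETICAL consumer mirroring the MyMethod pattern from the article.
static int MyMethod(int i)
{
    int Result = 0;
    SafeCall([i, &Result]()
    {
        Result = i * 2; // stands in for DoSomething()/DoSomethingElse()
    });
    return Result;
}
```

In the real app, the stub is dropped and the C file above is linked in instead. Nothing on the C++ side needs to change.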


The implementation of OnCrash() is another big topic; I won't go into it here. In my app, it dumps everything it deems relevant (stack, registers, app version, module info, some globals) into a text file in isolated storage, then terminates the app. On the next run, the app asks the user whether they want to send a crash report home. Unlike the crash-catching portion above, crash reporting is somewhat architecture-dependent: the contents of the CONTEXT structure differ between Intel and ARM. The Windows Phone 8 emulator runs x86 code, so the former applies there; on a real device, it's the latter.

Naturally, on the other end of the submission there's a Web service and eventually a bug tracker.

The whole idea could've been implemented in less code (but with more sketchy pointer fiddling). Another avenue for improvement would involve parametrizing by return type, so that one doesn't have to capture the return value in a variable and may instead write:

return SafeCall([]()
{
    return 17;
}); //returns 17
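
Below is one possible shape for that return-type parametrization, sketched under assumptions that aren't in the original. It requires C++17 (if constexpr, std::is_void_v), which the compilers of the time didn't have. It also assumes the result type is a non-reference type that is default-constructible and copy-assignable. Void-returning lambdas take the original path. For other lambdas, the user's lambda is wrapped in a second lambda that stores the result, so the C side stays unchanged. SafeNativeCall is again a hypothetical stand-in without any SEH; in the real app it would be the C file above.

```cpp
#include <cassert>
#include <type_traits>

extern "C"
{
    void SafeNativeCall(void *p);
    void CallFunctorWrapper(void *pfw);
}

class CFunctorWrapperBase
{
    friend void CallFunctorWrapper(void *);
    virtual void Call() = 0;
};

template<typename TFunctor>
class CFunctorWrapper : public CFunctorWrapperBase
{
    const TFunctor &m_Functor;
    void Call() override { m_Functor(); }
public:
    explicit CFunctorWrapper(const TFunctor &f) : m_Functor(f) {}
};

// Returns whatever the functor returns (ASSUMPTION: C++17).
template<typename TFunctor>
auto SafeCall(const TFunctor &f) -> decltype(f())
{
    using TResult = decltype(f());
    if constexpr (std::is_void_v<TResult>)
    {
        CFunctorWrapper<TFunctor> fw(f);
        SafeNativeCall(static_cast<CFunctorWrapperBase *>(&fw));
    }
    else
    {
        TResult Result{}; // value-initialized; kept if the call crashes
        auto Thunk = [&f, &Result]() { Result = f(); };
        CFunctorWrapper<decltype(Thunk)> fw(Thunk);
        SafeNativeCall(static_cast<CFunctorWrapperBase *>(&fw));
        return Result;
    }
}

extern "C" void CallFunctorWrapper(void *pfw)
{
    static_cast<CFunctorWrapperBase *>(pfw)->Call();
}

// HYPOTHETICAL stand-in for the C/SEH file: crashes are NOT caught here.
extern "C" void SafeNativeCall(void *p)
{
    CallFunctorWrapper(p);
}
```

If OnCrash() ever returned instead of terminating the app, the caller would get the value-initialized TResult back. That's one more reason not to carry on after a crash.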

Please don't read this as advice to continue execution after a crash. All kinds of consequences might ensue if you do.
