Sunday, July 29, 2007

Geeky...

Those unfamiliar with the C/C++ programming language need not read any further.

So there's this legacy application, at least 12 years old, that we're trying to slightly modify and release a new version. The process involves migration from an old version of the Microsoft C compiler to the most recent one, the one that comes with Visual Studio 2005.

Now, the app in question is a database-driven app written with DB-library. DB-library is a very archaic C-based (i. e. non-object-oriented) API for MS SQL Server access - predates even ODBC, let alone ADO or OLE DB. One of its features is a callback function support for error handling - when you connect to the database, you supply a function pointer to a function with well-known prototype, whenever a database error happens, DB-library would call that function with error parameters. All nice and clear.

Our little app already has an error handler function. And that function reproducibly crashes the process (causes an access violation) on a most innocent-looking line. The line goes like this:

if(errno == 12000)
{ //..do something...


A quick trip down the memory lane reveals that errno is a well-known global variable, defined in the C run-time library (errno.h, to be precise). It's been around since forever - at least since MS-DOS days, and probably even before. Some run-time library functions - fopen for example - use errno to report extended error information.

A quick examination of Microsoft's header file discovers that errno, in this version of the compiler/RTL, is #defined like this:

#define errno (*_get_errno())

Still, no clue why would this innocently looking code cause an access violation. So it's not simple variable access anymore, it's a function call. The function can be found, the linker does not complain. The access violation happens inside that function - but any attempt to step in with the debugger fails. Usually, the Visual Studio debugger has no problem stepping through the RTL code.

The revelation came when I noticed that errno was, in this particular instance, not a reference to the RTL's global variable. It was a formal parameter of the current function! Remember I mentioned it was all inside of an error-processing callback function? It had a parameter called errno, of type int. And by the magic of #define, this would compile as:

void ErrorProc(int (*_get_errno()), /*more params...*/)
{
    if((*_get_errno()) == 12000)
    //...


See, with this #define in place, the function would have a pointer-to-function parameter called _get_errno - the name matches an externally defined function, but that's legal in C - and the variable access would compile to an indirect function call, using _get_errno as the function pointer! Due to the fact that pointers and integers are the same size on Wintel-32, whatever innocent constant was passed by the DB-Library as the errno parameter was interpreted as a function pointer (!) and dereferenced in the if() line.

It might be worth noting that the syntax above is not how everyone expects an indirect function call to look like. Following K&R, any sensible C programmer would code a call-by-pointer like this:

(*my_func_pointer)();

Still, the syntax above - without the parentheses - is considered legitimate, and apparently compiles. The run-time value of the parameter is something around 12000 - way too low an address to contain a piece of executable code. Thus the crash.

The error is no one's fault. First, errno WAS an honest global variable some time in the past - this code worked fine when compiled with Visual Studio 6; apparently, the change was made after that. Naming formal parameters with names that coincide with well-known global variables might not be a good programming style, but it's still allowed in C. Second, the #define would never break any code that worked with RTL's own errno. It's the confluence of both factors that did the trick.

And that, people, is exactly why #define is considered EVIL.