State of the Union: Application Logging

Over the last week, I've been asked to help investigate some issues with a new obfuscation process that our builds will now go through before deployment (we can save the rants about obfuscating code for some other time). Per the norm for this activity, the test group ran into issues in a section of code and needed help investigating. As I dug into the issue, I turned to the log files and found that they were absolutely worthless. While we suspected that the obfuscation processor renamed or removed a class, we couldn't find the telltale signs of such an activity - no exceptions at all.

After an hour tracing the source code, I found what seems to be a common pattern in the code:

    try {
        // do something potentially dangerous
    } catch (Exception e) {
        jobDone(567, "bad request");
    }

What's wrong here? The code does prevent an exception from killing the long-running server process, and it does fulfill the component API by returning an error code. But what was the root cause of the error? This is a simple oversight that I see far too often. The code was written on the assumption that a message would be logged before the exception was thrown, but the code that hit the error wasn't written by us. It was the Spring framework, and that code doesn't know anything about our logging configuration. Instead of being able to point at the problem and fix it, I needed to add some logging code, spin another build, and then rerun the test. We eventually saw the error (it turned out the bean definitions no longer matched the bytecode), but this highlighted some very real problems with logging in server-side application development.
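For illustration, here is a minimal sketch of what that catch block could look like if it recorded the root cause before returning the error code. It uses java.util.logging from the JDK so that it stands alone. The class name RequestHandler, the doSomethingDangerous method, and the string returned by jobDone are all made up for this example and are not the real component API.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

class RequestHandler {
    private static final Logger LOG = Logger.getLogger(RequestHandler.class.getName());

    // Hypothetical stand-in for the component API callback; the real one is not shown here.
    static String jobDone(int code, String message) {
        return code + ": " + message;
    }

    // Hypothetical stand-in for third-party code that fails without logging anything itself.
    static void doSomethingDangerous() {
        throw new IllegalStateException("bean class not found");
    }

    static String handle() {
        try {
            doSomethingDangerous();
            return jobDone(200, "ok");
        } catch (Exception e) {
            // Pass the exception to the logger so the stack trace and cause end up in the log.
            LOG.log(Level.SEVERE, "Request failed, returning error 567", e);
            return jobDone(567, "bad request");
        }
    }
}
```

The caller still gets the same 567 response, but the log now carries the exception type, message, and stack trace, which is exactly what was missing during the investigation.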

After years of working with applications, I keep running into the same problems. Here's a quick and dirty laundry list of what I've seen over the years, and I'm sure that I've contributed to this list more than I'd like to admit.

In the last 10 years, the industry has made great strides in improving web frameworks, scalable data access solutions, and large-volume data processing, but we haven't moved very far in terms of helping ourselves with better logging. What we need is a logging framework that provides a rich declarative structure for defining more complex logging rules and that doesn't depend on developers remembering to invoke it throughout their code. Stay tuned for some random thoughts on what I think that might look like in practice.