Did you say I don’t like Blue Screens?

September 3rd, 2007 - Fernando Roberto

I’m not saying I love them. A blue screen is a sign that something went wrong, but it is better that you see it than to get a call from a customer saying that he saw one. So we need to do our best to make them appear while our driver is under test. I’ve seen programmers dodge the blue screen by hiding behind an exception handler. I don’t need to tell you this is not very useful. That way, you just postpone finding mistakes that you or someone else will eventually find anyway. Computers running your drivers may then reboot suddenly here and there. Blue screens can happen to anyone, it’s all a matter of luck; however, your customers may notice that using your drivers makes them far too unlucky. In this post, I’ll give you some tips on how to see more blue screens while testing our drivers.

Going through with F8

Do not take this as “test it yourself”; take it as “test every return code you can.” Remember that if something goes wrong, it is not just a MessageBox that will appear; usually everything ends up in “blue”, sooner or later. Let’s hope for “sooner”. I often say that every piece of code deserves at least an F8 walkthrough, F8 being my Step Into key. Yes, I know that’s different from the usual one, but when I started using Visual C/C++ 1.52, that was its keyboard layout. Back to the point: once, I wrote a function that was supposed to simply read a file. Walking through it with F8, I got a return code that was not STATUS_SUCCESS but still passed the NT_SUCCESS() macro. In my haste, I ignored it. It was not an error, so I assumed it was harmless. Weeks later, the test team told me that, for some reason, the driver was returning garbage when reading the file. A bit of debugging showed me that same return code. Only after looking at its definition in ntstatus.h did I understand everything.

//
// MessageId: STATUS_PENDING
//
// MessageText:
//
//  The operation that was requested is pending completion.
//
#define STATUS_PENDING                   ((NTSTATUS)0x00000103L)    // winnt

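STATUS_PENDING means the request was accepted but has not finished yet, so the buffer may not hold the data when the call returns. During a debug session, stepping through the code gives the system enough time to complete the deferred I/O before the buffer is used; running freely, however, the story can change, and the code may read the buffer before the data is there.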
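To make the trap concrete, here is a minimal user-mode sketch. The typedef and macros are redefined locally (an assumption, so it builds outside the DDK) with the same shape as NT_SUCCESS() and the constants in ntdef.h/ntstatus.h. The READ_OUTCOME enum and ClassifyReadStatus() are hypothetical names for illustration only. The point is the order of the tests: STATUS_PENDING must be handled before NT_SUCCESS(), because it passes that macro.

```c
#include <assert.h>
#include <stdint.h>

/* User-mode stand-ins for the kernel definitions (assumption: redefined
   here so the sketch compiles without the DDK). NTSTATUS is a signed
   32-bit value, so every status with the high bit clear counts as success. */
typedef int32_t NTSTATUS;

#define NT_SUCCESS(Status)   (((NTSTATUS)(Status)) >= 0)
#define STATUS_SUCCESS       ((NTSTATUS)0x00000000L)
#define STATUS_PENDING       ((NTSTATUS)0x00000103L)
#define STATUS_ACCESS_DENIED ((NTSTATUS)0xC0000022L)

/* The trap from the story: a pending request already counts as success. */
static_assert(NT_SUCCESS(STATUS_PENDING), "STATUS_PENDING passes NT_SUCCESS()");

/* Hypothetical outcome of a read routine, for illustration only. */
typedef enum { READ_DONE, READ_MUST_WAIT, READ_FAILED } READ_OUTCOME;

static READ_OUTCOME ClassifyReadStatus(NTSTATUS status)
{
    /* Check STATUS_PENDING first: the request was accepted,
       but the buffer is not filled yet. */
    if (status == STATUS_PENDING)
        return READ_MUST_WAIT;
    if (NT_SUCCESS(status))
        return READ_DONE;
    return READ_FAILED;
}
```

In a real driver, the READ_MUST_WAIT branch is where you would wait for the I/O to complete before touching the buffer; one common way, when the read was issued with ZwReadFile and an event, is to wait on that event with KeWaitForSingleObject.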

Code that has passed its F8 test will still run in all sorts of environments, existing and imaginable. So this first phase only removes the grossest errors. After that, it is the test team’s turn to beat up the victim. Some people really have a gift for testing software. I once worked at a company where the person who tested our products must have had a personal problem with the software we produced (or with us, go figure!). I could test the software for days, but once I handed it over for testing, it took him less than half an hour to call me back with the phrase that became his trademark: “Too bad!!!”. No explanation. We used to say that his PC had been formatted over an Indian burial ground. No wonder developer testing is so frowned upon.

Using ASSERT

I imagine most of you have already heard of the ASSERT macro. This macro is meant to make sure that a certain condition is true. If the condition is false, a blue screen just pops up. Wow, what an incredibly helpful macro! Yes, I know, that’s not quite it. Actually, that is the behavior we get when no debugger is attached to the system. So this macro only makes sense in Checked builds. Must I remind you that Checked means Debug and Free means Release? Well, done. If we take a look at its definition, we will see that the obvious has already been thought of.

#if DBG
 
#define ASSERT( exp ) \
    ((!(exp)) ? \
        (RtlAssert( #exp, __FILE__, __LINE__, NULL ),FALSE) : \
        TRUE)
...
 
#else
 
#define ASSERT( exp )         ((void) 0)
 
...
 
#endif // DBG
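The two branches of that definition can be imitated in user mode to see what each build actually keeps. This is a sketch under stated assumptions: MY_ASSERT and MyRtlAssert are hypothetical names, DBG is normally set by the build environment (here it defaults to 1, as in Checked), and the stand-in only reports and counts the failure instead of doing what the real RtlAssert does.

```c
#include <assert.h>
#include <stdio.h>

/* Assumption: DBG normally comes from the build environment
   (1 for Checked, 0 for Free); default it to Checked for this sketch. */
#ifndef DBG
#define DBG 1
#endif

/* Hypothetical stand-in for RtlAssert(): report and count the failure
   instead of prompting in a debugger or bringing the system down. */
static int g_assertFailures = 0;

static void MyRtlAssert(const char *exp, const char *file, int line)
{
    ++g_assertFailures;
    fprintf(stderr, "Assertion failed: %s (%s, line %d)\n", exp, file, line);
}

#if DBG
/* Same shape as the DDK macro: 0 (FALSE) on failure, 1 (TRUE) otherwise. */
#define MY_ASSERT(exp) \
    ((!(exp)) ? (MyRtlAssert(#exp, __FILE__, __LINE__), 0) : 1)
#else
/* Free build: the whole expression, side effects included, is gone. */
#define MY_ASSERT(exp) ((void)0)
#endif
```

Note that in the Free branch the argument is never evaluated at all, which matters as soon as the condition contains a function call.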

But what happens if a debugger is attached to the system? Good question, a really interesting question; I’m glad you asked it. Has anyone ever told you that you have a knack for programming? Anyway, I have changed one of our examples to force this condition.

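extern "C"
NTSTATUS DriverEntry(IN PDRIVER_OBJECT pDriverObj,
                     IN PUNICODE_STRING pusRegistryPath)
{
    UNREFERENCED_PARAMETER(pDriverObj);
    UNREFERENCED_PARAMETER(pusRegistryPath);

    //-f--> I think I'm sure that this count is right.
    ASSERT(1 + 1 == 3);

    return STATUS_SUCCESS;
}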

If the condition fails and a kernel debugger is attached to the system, the debugger displays the failed condition in its output window and asks us to choose among four alternatives, as shown below.

As you can see, ASSERT is practical, easy and not fattening. It just requires a bit of brain as well. I’m saying this because we all have bad days, and after 11:00 pm no programmer should be held accountable for the code he writes. Once, it took me a while to figure out why the code below was not working properly. Everything worked perfectly fine when compiled as Checked, but in Free a certain function simply did not seem to be called. If you take another look at the definition of this macro, you will see that the whole condition disappears when compiled in Free, and in this case the function call disappears with it.

    //-f--> Programming is just not enough, thinking is required.
    //      << Don't copy this >>
    ASSERT(DoSomeThing() == STATUS_SUCCESS);
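The fix is to call the function unconditionally and assert only on the saved result. The sketch below shows both versions side by side in user mode, using the standard assert() as a stand-in for ASSERT and NDEBUG to simulate a Free build. DoSomeThing() here is a hypothetical stub that returns 0 for success and counts how many times it actually ran; the C standard allows <assert.h> to be included again, redefining assert() from the current state of NDEBUG.

```c
#include <assert.h>

/* Hypothetical stub for DoSomeThing(): returns 0 for success and
   counts how many times it actually ran. */
static int g_calls = 0;

static int DoSomeThing(void)
{
    ++g_calls;
    return 0;
}

/* Simulate a Free build for the two functions below. */
#ifndef NDEBUG
#define NDEBUG
#endif
#include <assert.h>

static void WrongWay(void)
{
    /* Expands to ((void)0): DoSomeThing() is never called. */
    assert(DoSomeThing() == 0);
}

static void RightWay(void)
{
    int status = DoSomeThing();  /* runs in every build */
    assert(status == 0);         /* only the check disappears */
    (void)status;                /* avoids an "unused variable" warning in Free */
}

/* Back to a Checked-style build for the rest of the file. */
#undef NDEBUG
#include <assert.h>
```

Calling WrongWay() leaves the counter untouched, while RightWay() runs the function exactly once, even though both were compiled "in Free".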

A system in Checked

Wouldn’t it be nice to have an entire operating system full of ASSERTs and checks that detect the slightest sign of trouble and, on finding one, present us with a nice blue screen? Well, you’ve probably heard of the Checked Build versions. They are exactly what I have just described: an entire operating system built as Checked. This means that all the ASSERTs in the sources were included in the final binaries, and they are checking the system for you. It may seem silly to install your driver on one of these systems, but believe me, it is worth it. I’ve had drivers that worked very well for months until my manager suggested testing them on Checked versions. From the height of my arrogance, I thought to myself: “I see no reason for that”. Well, we live and learn. The test machine did not even manage to start with my drivers. Several blue screens were shown, one after another. Checked Build versions can even check for deadlocks. Could you guess what would happen if a spinlock were held for longer than a certain time? I bet you can.

Since Checked Build versions are foolproof, can I use one as my default system? Actually you can, but everything is much slower: thousands of checks are made all the time and the code has no optimization applied. In a fresh installation, without any additional software, you can hit ASSERTs in Internet Explorer or other programs. That’s right, not just the kernel is Checked, but the entire system as well. You always have to keep a kernel debugger attached to your system, because the slightest hint of smoke is more than enough reason for one more blue screen.

If my driver has survived the Checked Build, is it perfect? Sorry to disappoint you. Remember that the performance impact of so many checks can hide problems such as race conditions. Ideally, you should test your drivers on both versions. One configuration lets you run only the kernel image and the HAL as Checked while the rest of the system stays Free. This gives you the kernel’s additional checks while the rest of the system runs the lighter version. This link explains how.

OK, is my driver beautiful now?

Actually, this is still only the minimum you should do. Another excellent tool for generating blue screens is Driver Verifier. It works together with the system to monitor the actions of a list of drivers that you choose. Since Windows 2000, Driver Verifier comes preinstalled. Try it now! It doesn’t require any skill or even practice. Type “Verifier” in the Windows “Run…” dialog and a simple wizard starts. As I intend to finish this post still alive, I won’t describe how to use this tool step by step, but there are several links on the product page that can help you with this.

OK, now I know you really love Blue Screens!

To give you an idea of how important hunting for blue screens is to a software company, Microsoft promotes the IFS Plug Fest to test the interoperability of different products that implement file systems or file system filters. The event is free, but each company must pay for the travel and lodging of its representatives. At these meetings, professionals from around the world gather to test their products against each other, get in touch, and ask questions about this complex development model. There are also sponsored seminars that discuss common problems faced by this community. I have never been to one of these, but maybe one day I will.

Stay calm, there are even more

Beyond the tools mentioned above, there are a few others I could not leave out; however, I won’t spend long explaining how each of them works.

  • Prefast – A tool that analyzes the source code as it is compiled by the DDK, looking for programming mistakes that are common in kernel drivers.
  • Static Driver Verifier – This tool performs a deeper static analysis of the driver’s source code, checking how the driver uses the kernel interfaces. The test takes longer, but it detects more defects than Prefast.
  • Hardware Compatibility Test – A test suite used to get drivers certified by Microsoft. It was discontinued a while ago.
  • Driver Test Manager – The new test software used to get your drivers certified.

After so many blue screens, only some coffee can help us!
CYA!
