Archive for February, 2007

Synchronism x Performance

Wednesday, February 28th, 2007

It is no coincidence that my last posts touched on linked lists and performance. In recent weeks, I was hired by a security company to look at one of their File System filters and reduce the delay it causes. In this post, I will talk about synchronization and CPU contention.

It should be no news to most readers that a linked list, or any other resource shared among multiple threads, needs some form of access synchronization, so that no thread reads invalid data because another thread changed it at the same time.

But I don’t have two processors. Why synchronize?

It is true that a computer with only one processor runs only one thread at any given moment, so we never have two threads executing at the same time. But remember that threads run in small time slices handed out by the scheduler, which takes each thread’s quantum and priority into account. There are ways to keep the Windows scheduler from interrupting a thread, but under normal temperature and pressure conditions we never know when a thread will be interrupted to give another thread its turn.

There are several synchronization mechanisms we can use, but in this post I will comment specifically on the ones I have been dealing with these past weeks. Before that, though, I have to mention the most exotic one I have seen in my years of experience, which I nicknamed “Mutex, pero no Mutcho.” It was a class derived from VMutex of VToolsD. Its enter() method insanely tried to acquire a mutex a few thousand times in a loop. If the mutex had still not been acquired after those thousands of iterations, the thread gave up and accessed the shared resource anyway. Do you think the system was having some trouble with deadlocks? Anyway, it was nice to fix this years ago. There are things we only believe when we see them.

As I mentioned in another post, a File System Filter is a layer placed over drivers such as FastFAT, NTFS, CDFS, Network Redirectors and any other driver that implements a file system interface. That is, all file-related operations, including File Mapping access, pass through File System filters.

If you are interested not only in File System filters but in working with File Systems themselves, then you have an obligation to read Windows NT File System Internals by Rajeev Nagar. It is the only known reference that covers the subject comprehensively. First published in 1997 by O’Reilly, the book is still a mandatory tool for File System development, even for Windows Vista. It went out of print a few years ago, and during that time I saw a single used copy being sold for more than US$ 200.00 on Amazon. Today OSR holds the copyright and is working on an updated edition, which should cover topics such as Shadow Copy, Transactional NTFS, Filter Manager, Mini Redirectors and forced volume dismount. However, that job will take some time, so in 2005 the original version was reprinted to meet demand while the new edition is being prepared.

The filter I worked on maintains several lists that are consulted on each intercepted access. The mechanism used to synchronize access to these lists was the Spin Lock: an obvious choice, but inadequate for this scenario, where the activity is too intense. The group of lists used in the IRP_MJ_READ routine is also used in the IRP_MJ_WRITE routine and in other events. The result is CPU contention. That is, when a thread acquires a Spin Lock to query a list, all other threads that need to consult the same list have to sit and wait until the Spin Lock is released (even when we have more than one processor). Given that these lists are not small, can you imagine how slow it was?

Just as with world hunger, cervical cancer, the prostate exam and other ills that afflict mankind, something had to be done about this. And so, in the relentless struggle against the forces of evil, the ERESOURCE was created. (Angelic chants as a dry-ice mist spreads across the floor.)


ERESOURCE is the native and most appropriate way to synchronize access to structures that are part of a file system, such as the FCB (File Control Block), which among other information keeps the current file size. With ERESOURCE, multiple threads can access the same linked list at the same time for reading. As long as none of those threads changes the list, all accesses can be performed simultaneously. When a change is needed, a thread can acquire exclusive access to the list.

To use ERESOURCE, you must declare an ERESOURCE variable, which must reside in non-paged memory and be 8-byte aligned, and initialize it with the ExInitializeResourceLite() function. For shared (read-only) access to the list, use the ExAcquireResourceSharedLite() function. This function checks whether a thread holds exclusive access to the resource. If so, the function either returns failure or waits until the resource is released, depending on a parameter. For exclusive access, use the ExAcquireResourceExclusiveLite() function, which analogously checks for threads with shared access to the resource and optionally waits until all of them have released it before granting exclusive access. Finally, to release the access granted, either exclusive or shared, use the ExReleaseResourceLite() function.
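To make the shared/exclusive rules easier to picture, here is a minimal sketch in plain user-mode C. It is a single-threaded toy model, not the ERESOURCE implementation: every name in it (ToyResource, toy_try_acquire_shared and so on) is hypothetical, it only mimics the "return failure instead of waiting" behavior, and it ignores details such as recursive acquisition by the same thread.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model of shared/exclusive ownership (hypothetical names).
   It only tracks who could hold the resource; it does not block
   and it is not thread-safe by itself. */
typedef struct {
    int  shared_owners;   /* number of readers currently inside */
    bool exclusive_owner; /* true while a writer holds the resource */
} ToyResource;

void toy_init(ToyResource *r)
{
    r->shared_owners = 0;
    r->exclusive_owner = false;
}

/* Shared (read-only) access: granted unless a writer is inside. */
bool toy_try_acquire_shared(ToyResource *r)
{
    if (r->exclusive_owner)
        return false;
    r->shared_owners++;
    return true;
}

/* Exclusive access: granted only when nobody else is inside. */
bool toy_try_acquire_exclusive(ToyResource *r)
{
    if (r->exclusive_owner || r->shared_owners > 0)
        return false;
    r->exclusive_owner = true;
    return true;
}

/* Releases whichever kind of access is currently held. */
void toy_release(ToyResource *r)
{
    if (r->exclusive_owner) {
        r->exclusive_owner = false;
    } else {
        assert(r->shared_owners > 0);
        r->shared_owners--;
    }
}
```

In this model two readers can be inside at once, a writer is refused while any reader is inside, and readers are refused while the writer is inside. That is exactly why the lists that are read on every intercepted access stop being a bottleneck.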

An interesting point to notice is that normal Kernel APC delivery must be disabled for threads that acquire an ERESOURCE, which is what KeEnterCriticalRegion() does. I do not know the real reason for this requirement, but at the very least it prevents the thread holding the resource from being terminated by another thread, given that TerminateThread() is implemented via Kernel APC. Thus, the most common way to use these functions is as shown below.

    //-f--> Disable normal Kernel APC delivery and
    //      get shared access (read only).
    KeEnterCriticalRegion();
    ExAcquireResourceSharedLite(&m_Resource, TRUE);
    //-f--> Query the protected values.
    //-f--> Release the resource and re-enable APC delivery.
    ExReleaseResourceLite(&m_Resource);
    KeLeaveCriticalRegion();

One downside is that most of the functions that deal with ERESOURCE, unlike Spin Locks, must be called at IRQL < DISPATCH_LEVEL. Thus, some tricks may be necessary when handling IRPs and their CompletionRoutines.

After these modifications, the system performed much better in my tests. The gain grows with the number of operations running in parallel and, of course, on computers with more than one processor.

And everybody lived happily ever after…

Linked lists on DDK

Saturday, February 24th, 2007

Since the beginning of my studies, I have always chosen to study something that would unite computer science with electronics, which led me to the Industrial Computing course at ETE Jorge Street. It is an excellent school and I learned a lot there. However, I only learned what a linked list is in a professional environment, during my internship. Years later, the university introduced me to these lists in the form of classes written in Java, where we dealt with references and garbage collectors. Even those who work with Visual C/C++ mostly deal with handles, templates and classes offered by MFC, ATL, WTL and STL, which end up hiding the real linked list implementation. In this post, I will talk a little about the resources the DDK offers on this subject.

But I am already a big boy and I already know how to build linked lists in C/C++. Why should I use DDK resources?

If you only store private data, such as a list of buffers to be sent to the device, that’s fine. But to deal with DDK structures, it is at least worth knowing how they are stored in lists, and some situations require you to know how to use DDK lists. For example, if you implement a custom IRP queue in your driver, the pIrp->Tail.Overlay.ListEntry field is free for you to use while the IRP is held by your driver.

The simplest of these resources is the SINGLE_LIST_ENTRY structure, defined in ntdef.h as shown below:

typedef struct _SINGLE_LIST_ENTRY {
  struct _SINGLE_LIST_ENTRY *Next;
} SINGLE_LIST_ENTRY, *PSINGLE_LIST_ENTRY;

A SINGLE_LIST_ENTRY variable must be set aside to be the head of our list, and its Next member must be initialized with NULL before use. The PushEntryList() and PopEntryList() routines are used respectively to add an element at the list head and to remove one from it. As in most linked lists, each node references the following one through its Next member until a NULL is found, marking the end of the chain.


But wait a minute. All this is beautiful, but isn’t something missing? In any linked list structure, one of the members of each node must point to a similar structure: the link to the next node in the list. In the example below, the pNext member is the one responsible for that.

//-f--> Structure that defines the list node
//      which will be used as an example.
typedef struct _MY_NODE
{
    //-f--> Your private data that will be defined
    //      by yourself.
    UNICODE_STRING      usSomeData;
    ULONG               ulAnotherData;
    //-f--> Link to the next node in the list.
    struct _MY_NODE     *pNext;
    PVOID               pMoreData;
} MY_NODE, *PMY_NODE;

But as we saw, the DDK structure has no useful members, that is, no fields holding the information we want to keep in the list, such as usSomeData or pMoreData. The structure offered by the DDK holds only the address of the next node.

What is the point of storing only links?

Actually, SINGLE_LIST_ENTRY, like the other DDK list structures, is meant to be used together with the CONTAINING_RECORD macro. Given the address of a known field, this macro returns the base address of the structure that contains it. Well, I have tried to rewrite this sentence three times, but I think you will only understand it when you see the example below. Suppose we use SINGLE_LIST_ENTRY in our previous example. We would then have the following structure:

//-f--> Structure that defines a record in our linked list
//      and will be used in this example.
typedef struct _MY_NODE
{
    //-f--> Record's private data on the list node.
    //      These fields are not fixed; they are defined by you.
    UNICODE_STRING      usSomeData;
    ULONG               ulAnotherData;
    //-f--> That member doesn't necessarily need
    //      to be the first or the last one at this structure;
    //      it can be at any position here inside.
    SINGLE_LIST_ENTRY   Entry;
    //-f--> More data
    PVOID               pMoreData;
} MY_NODE, *PMY_NODE;

Our structure is basically the same, except that the pNext member has been replaced by a SINGLE_LIST_ENTRY member. Notice that this field does not need to be at either the beginning or the end of our structure. The example below demonstrates how to add nodes to a list made of SINGLE_LIST_ENTRY nodes.

NTSTATUS PrepareDataAndPush(VOID)
{
    PMY_NODE            pMyNode;
    //-f--> Here we get the already allocated node
    //      with its fields correctly initialized.
    pMyNode = AllocateAndFillNode();
    //-f--> Testing is never too much.
    ASSERT(pMyNode != NULL);
    //-f--> To put the node at the list, it is necessary to use
    //      the SINGLE_LIST_ENTRY member address.
    PushEntryList(&m_ListHead, &pMyNode->Entry);
    return STATUS_SUCCESS;
}

When adding the node to the list, we pass the address of its SINGLE_LIST_ENTRY member, as the PushEntryList() prototype suggests. The next example demonstrates how to get this node back using the PopEntryList() function. This function returns the address that was passed to PushEntryList(), which is a PSINGLE_LIST_ENTRY. From this address, as I mentioned before, we can get the base address of our structure using the CONTAINING_RECORD macro. See the example below.

NTSTATUS PopAndProcessNode(VOID)
{
    PSINGLE_LIST_ENTRY  pEntry;
    PMY_NODE            pMyNode;
    //-f--> Here we get the node from the list.
    pEntry = PopEntryList(&m_ListHead);
    //-f--> Now we can check if a record really was retrieved
    //      from the list.
    if (!pEntry)
        return STATUS_NO_MORE_ENTRIES;
    //-f--> Now we use the CONTAINING_RECORD macro
    //      to get the base address we need.
    pMyNode = (PMY_NODE) CONTAINING_RECORD(pEntry, MY_NODE, Entry);
    //-f--> Now we can access the whole structure.
    return STATUS_SUCCESS;
}
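If you want to play with this idea outside the kernel, the sketch below reproduces it in plain user-mode C. The definitions are stand-ins written for illustration, assuming the usual "field address minus field offset" formulation of CONTAINING_RECORD; the node layout is simplified and DemoPushPop() is a hypothetical helper, so do not take any of it as the DDK's own code.

```c
#include <assert.h>
#include <stddef.h>

/* User-mode stand-ins for the DDK definitions (illustration only). */
typedef struct _SINGLE_LIST_ENTRY {
    struct _SINGLE_LIST_ENTRY *Next;
} SINGLE_LIST_ENTRY, *PSINGLE_LIST_ENTRY;

/* Base address = address of the field minus the field's offset. */
#define CONTAINING_RECORD(address, type, field) \
    ((type *)((char *)(address) - offsetof(type, field)))

/* Inserts the entry right after the head (LIFO order). */
void PushEntryList(PSINGLE_LIST_ENTRY head, PSINGLE_LIST_ENTRY entry)
{
    entry->Next = head->Next;
    head->Next = entry;
}

/* Removes and returns the first entry, or NULL if the list is empty. */
PSINGLE_LIST_ENTRY PopEntryList(PSINGLE_LIST_ENTRY head)
{
    PSINGLE_LIST_ENTRY first = head->Next;
    if (first != NULL)
        head->Next = first->Next;
    return first;
}

/* Simplified node: the link sits in the middle of the structure. */
typedef struct _MY_NODE {
    unsigned long     ulAnotherData;
    SINGLE_LIST_ENTRY Entry;
    void             *pMoreData;
} MY_NODE, *PMY_NODE;

/* Hypothetical demo: pushes two nodes, pops one and returns the
   ulAnotherData field of the node recovered from the popped entry. */
unsigned long DemoPushPop(void)
{
    SINGLE_LIST_ENTRY  head = { NULL };
    MY_NODE            a = { 1, { NULL }, NULL };
    MY_NODE            b = { 2, { NULL }, NULL };
    PSINGLE_LIST_ENTRY pEntry;

    PushEntryList(&head, &a.Entry);
    PushEntryList(&head, &b.Entry);   /* b is now the first node */

    pEntry = PopEntryList(&head);
    assert(pEntry == &b.Entry);

    /* From the address of Entry back to the whole MY_NODE. */
    return CONTAINING_RECORD(pEntry, MY_NODE, Entry)->ulAnotherData;
}
```

The list itself only ever sees the Entry members. It is the macro that takes us from a link back to the record that owns it, which is why the link can sit anywhere inside the structure.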

This same style is also used with the LIST_ENTRY structure, which is the one used in most drivers. This structure allows you to create doubly linked lists and, as with SINGLE_LIST_ENTRY, a LIST_ENTRY variable is set aside to be our list head. It is initialized with the InitializeListHead() function, which sets the Flink and Blink members to the address of the list head itself.

typedef struct _LIST_ENTRY {
  struct _LIST_ENTRY *Flink;
  struct _LIST_ENTRY *Blink;
} LIST_ENTRY, *PLIST_ENTRY;

The Flink member points to the next node, whereas the Blink member points to the previous node. Unlike most linked lists we know, the ends of the list are not marked by NULL in the Flink and Blink members. Instead, they point to the address of the node designated as our list head. The example below demonstrates how to walk this kind of list in search of a certain node. Before reading or removing entries from a list made of LIST_ENTRY structures, check it with the IsListEmpty() function.

//-f--> Looks for a node through a list
//      composed by nodes using LIST_ENTRY structures.
//      NOTE: the routine name, its parameter type and the last
//      two arguments of RtlEqualUnicodeString() are assumptions
//      made for this example; here the Entry member of MY_NODE
//      is a LIST_ENTRY.
BOOLEAN FindNode(PUNICODE_STRING pusLookFor)
{
    PLIST_ENTRY pEntry;
    PMY_NODE    pMyNode;
    //-f--> Before any operation, it is necessary
    //      to check for an empty list condition.
    if (IsListEmpty(&m_ListHead))
        return FALSE;
    //-f--> Get the first node address.
    pEntry = m_ListHead.Flink;
    //-f--> While pEntry is not the head list node address,
    //      it means we have a valid node address.
    while(pEntry != &m_ListHead)
    {
        //-f--> Get the node structure base address.
        pMyNode = (PMY_NODE) CONTAINING_RECORD(pEntry, MY_NODE, Entry);
        //-f--> Check if this is the node we're looking for.
        if (RtlEqualUnicodeString(pusLookFor,
                                  &pMyNode->usSomeData,
                                  TRUE))
        {
            //-f--> Returning TRUE to indicate the node was found.
            return TRUE;
        }
        //-f--> Get the next node address.
        pEntry = pEntry->Flink;
    }
    //-f--> If we got here, it means that the node we were looking for
    //      was not found. So, returning FALSE to indicate that.
    return FALSE;
}

NOTE: The example above only illustrates the search for a node. To implement a search like this in a production environment, you must keep in mind issues such as synchronizing access from multiple threads and reference counting.

To manipulate lists formed by LIST_ENTRY, you can use routines like InsertHeadList(), InsertTailList(), RemoveHeadList() and so on. I will not list all the functions here; I’m sure the DDK reference does it better than I would.
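To see why the head works as both the beginning and the end of the list, here is a small user-mode sketch of how such routines can be written. These are stand-ins with the same names as the DDK routines, written from the description above (a circular list where an empty head points to itself). CountEntries() is a hypothetical helper added only for this example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* User-mode stand-in for the DDK structure (illustration only). */
typedef struct _LIST_ENTRY {
    struct _LIST_ENTRY *Flink;
    struct _LIST_ENTRY *Blink;
} LIST_ENTRY, *PLIST_ENTRY;

/* An empty list is a head whose Flink and Blink point to itself. */
void InitializeListHead(PLIST_ENTRY head)
{
    head->Flink = head->Blink = head;
}

bool IsListEmpty(const LIST_ENTRY *head)
{
    return head->Flink == head;
}

/* Links the entry between the current last node and the head. */
void InsertTailList(PLIST_ENTRY head, PLIST_ENTRY entry)
{
    PLIST_ENTRY last = head->Blink;
    entry->Flink = head;
    entry->Blink = last;
    last->Flink = entry;
    head->Blink = entry;
}

/* Links the entry between the head and the current first node. */
void InsertHeadList(PLIST_ENTRY head, PLIST_ENTRY entry)
{
    PLIST_ENTRY first = head->Flink;
    entry->Flink = first;
    entry->Blink = head;
    first->Blink = entry;
    head->Flink = entry;
}

/* Unlinks and returns the first node; the list must not be empty. */
PLIST_ENTRY RemoveHeadList(PLIST_ENTRY head)
{
    PLIST_ENTRY first = head->Flink;
    assert(!IsListEmpty(head));
    head->Flink = first->Flink;
    first->Flink->Blink = head;
    return first;
}

/* Hypothetical helper: walks Flink until we are back at the head. */
size_t CountEntries(const LIST_ENTRY *head)
{
    size_t            count = 0;
    const LIST_ENTRY *p;
    for (p = head->Flink; p != head; p = p->Flink)
        count++;
    return count;
}
```

Because the list is circular, none of these routines needs a special case for the first or the last node. With a single node, "next" and "previous" are simply the head on both sides.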

For both types of lists, there are ExInterlockedXxxList functions that take a Spin Lock to synchronize changes to the list. Remember that if you synchronize access to lists using Spin Locks, all nodes must reside in non-paged memory. As some of you may know, acquiring a Spin Lock raises the thread’s IRQL to DISPATCH_LEVEL. At this IRQL, the Memory Manager is unable to bring back a page that was paged out to disk, so touching one results in a BugCheck.

It is also important to remember that you should not mix the use of ExInterlockedXxxList functions with unsynchronized ones. If a list is accessed by multiple threads, all access to it should be controlled.

typedef struct _SLIST_ENTRY {
  struct _SLIST_ENTRY *Next;
} SLIST_ENTRY, *PSLIST_ENTRY;

The SLIST_ENTRY structure is an alternative to SINGLE_LIST_ENTRY for lists with synchronized access. This version aims to be more efficient through functions like ExInterlockedPopEntrySList() and ExInterlockedPushEntrySList(). Unlike the other structures presented here, the list head is a different structure from the one used for its nodes: it is the SLIST_HEADER, an opaque structure that must be initialized with the ExInitializeSListHead() function.

Another advantage of this list type is that the DDK offers the ExQueryDepthSList() function, which returns the number of nodes stored in the list.

If your main problem is performance, then the ideal would be to use the group of functions that work with Generic Tables. The table head is an opaque structure called RTL_GENERIC_TABLE. This structure, along with its manipulation functions, such as RtlGetElementGenericTable(), builds self-balancing binary trees. What a chic thing, huh?

But Generic Tables are a subject that deserves a dedicated post. Until next time!

Cool, but what is an IRP?

Monday, February 12th, 2007

My friend Slug had just read the title of my last post when he laughed and said that most people who read it would ask, “What is an IRP?” Well, thinking about what he said, and considering the far-from-simple explanation the DDK reference offers us, I will write a superficial description of the IRP, get this weight off my conscience and be able to sleep again.

Let’s leave the frightening and horrifying DDK diagrams aside and try to see things in a somewhat simpler way. After that, you can turn to the sacred documentation to reinforce the concepts and clear up any questions about something you will already know, or at least have an idea of.

Let’s rely on a very practical example. File Systems such as NTFS and FAT are implemented as kernel drivers called File System Drivers. I imagine that many of you have had the opportunity to open a file and write to it.

To start talking to a driver, we first open a connection to it, which is done using the CreateFile() function. Contrary to what its name suggests, this function is not restricted to file creation. Indeed, opening a file is just a specific way of opening a connection to the File System driver (details about this next time). The CreateFile() function returns a handle that is used to interact with the driver through functions such as ReadFile(), WriteFile() and DeviceIoControl(). By the way, these are not restricted to file operations either. When an application calls the ReadFile() function, the Win32 subsystem forwards the request to the NtReadFile() native API, which in turn makes the transition to Kernel-Mode and invokes the IoManager, which in turn marshals the request parameters into a structure called IRP (I/O Request Packet).

IoManager sends this packet to the responsible driver, passing through any filters that might be installed on it. An anti-virus driver is a perfect example of this scenario. Anti-virus drivers are implemented as File System Filters. When an application writes to a file, the write IRP reaches the anti-virus before it reaches the File System driver, giving the anti-virus the opportunity it needs to check whether what is being written contains any known virus signature. For a read request, the IRP passes through the anti-virus, which installs a CompletionRoutine on it and thus gains access to the IRP’s data once the target driver has finished the read.

The IRP is basically divided into two parts, the Header and the Stack Locations. The IRP Header holds general information, such as the status, a pointer to the thread the IRP belongs to, the user’s buffer addresses, the address of the IRP’s cancellation routine and so on. The Stack Locations hold the parameters specific to the request. For a file read, these would be the offset, size and buffer alignment.

Several Stack Locations may exist within an IRP, one for each device in the chain of layers the IRP travels through until it reaches the target driver. In other words, there is a Stack Location for the target driver and an additional one for each filter driver installed on it. As the IRP goes down the layers, each driver passes the parameters received in its own Stack Location to the one belonging to the next driver using the IoCopyCurrentIrpStackLocationToNext() routine. Once they are copied, the filter can make the desired changes. If no parameters change from one layer to the next, the IoSkipCurrentIrpStackLocation() routine can be used instead.
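The difference between copying and skipping can be pictured with a toy model in plain user-mode C. Everything here is hypothetical (ToyIrp, toy_copy_current_to_next() and friends are names made up for this sketch), the real IRP layout and routines are considerably more involved, and the only thing the model tries to capture is which slot each layer ends up looking at.

```c
#include <assert.h>
#include <string.h>

#define TOY_MAX_LAYERS 4

/* Per-layer request parameters (toy version of a Stack Location). */
typedef struct {
    unsigned long long offset;
    unsigned long      length;
} ToyStackLocation;

/* Toy IRP: one slot per layer plus the index of the current one. */
typedef struct {
    int              current;
    int              count;
    ToyStackLocation stack[TOY_MAX_LAYERS];
} ToyIrp;

/* Prepares the IRP and fills the slot seen by the topmost layer. */
void toy_irp_init(ToyIrp *irp, int layers,
                  unsigned long long offset, unsigned long length)
{
    assert(layers >= 1 && layers <= TOY_MAX_LAYERS);
    memset(irp, 0, sizeof(*irp));
    irp->count = layers;
    irp->stack[0].offset = offset;
    irp->stack[0].length = length;
}

ToyStackLocation *toy_current(ToyIrp *irp)
{
    return &irp->stack[irp->current];
}

ToyStackLocation *toy_next(ToyIrp *irp)
{
    assert(irp->current + 1 < irp->count);
    return &irp->stack[irp->current + 1];
}

/* Copy: the next layer gets its own slot, which may then be changed. */
void toy_copy_current_to_next(ToyIrp *irp)
{
    *toy_next(irp) = *toy_current(irp);
}

/* Skip: step back so that the next call lands on the same slot. */
void toy_skip_current(ToyIrp *irp)
{
    irp->current--;
}

/* Call: advance to the next slot and hand it to the lower layer. */
ToyStackLocation *toy_call_driver(ToyIrp *irp)
{
    irp->current++;
    assert(irp->current >= 0 && irp->current < irp->count);
    return toy_current(irp);
}

/* Filter A copies and shrinks the length, filter B skips.
   Returns the length that the target driver finally sees. */
unsigned long toy_demo(void)
{
    ToyIrp irp;
    toy_irp_init(&irp, 3, 4096, 512);

    toy_copy_current_to_next(&irp);     /* filter A */
    toy_next(&irp)->length = 256;
    (void)toy_call_driver(&irp);        /* now in filter B */

    toy_skip_current(&irp);             /* filter B changes nothing */
    return toy_call_driver(&irp)->length;   /* target driver */
}
```

In this model, the filter that copies leaves its own slot untouched while the layers below see the modified length, and the filter that skips simply lets the layer below reuse the slot it was given.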

An IRP goes from one layer to the next when the filter that received it forwards it to the device it is attached to, using the IoCallDriver() function. When the IRP reaches the target driver, the driver has basically two options. If the request can be answered immediately, the driver performs the desired action and completes the IRP using the IoCompleteRequest() routine. But if the driver needs to communicate with a device, or even with another driver, the IRP is placed in a queue and marked as pending, and it will only be completed when all processing is done.

Returning to our example, when a File System driver receives a write IRP, depending on many other circumstances, it marks the current IRP as pending and creates a new IRP for the volume device that represents the partition, which in turn passes the request on to the storage devices. Thus, it is easy to understand that disk drivers know nothing about NTFS or FAT. Storage drivers may have additional storage filters, such as RAID for disk mirroring, among other things. These new IRPs are also created by the IoManager, and the whole cycle is repeated for each new request.

Watching IRPs

IrpTracker is a tool capable of monitoring IRP activity for a particular driver or device. See the figure below, where I’m selecting all the devices from the Kbdclass driver, which is responsible for reading the keyboard. Notice that my machine has many keyboards. That’s because I’ve been working on anti-key-logger solutions. With all these devices selected, it is easy to see the IRP activity generated by the keys you press on the keyboard.

For each key you hit on the keyboard, four lines are shown in IrpTracker. Each keystroke consists of two movements, the press of the key and its release. Each movement is reported by an IRP, which comes with its Completion. That means one line is identified as Call and the other as Comp, representing, respectively, the IRP going down and its return. Notice that in this example, the Comp line always comes before the Call line.

Does this mean that the IRP had its completion before being sent to the keyboard driver?

Actually, the IRP responsible for reading the keyboard stays pending until a key is received. The system sends an IRP to the driver, which waits for keyboard events. When such an event occurs, the driver receives the key and completes the IRP, and the system receives its completion. Soon after, the system launches a new IRP to wait for the next event. Thus, we always see lines in pairs: the Comp of the previous IRP and the Call of the next one.

If you double click one of these lines, you can see the details of each field in the IRP as shown in the figure below.

I am preparing an evolution of the Useless.sys driver, which was used as a starting point in a previous post. The changes will make it not so Useless and able to demonstrate IRP handling in practice, but that will be for another time.

See you there…

Who owns this IRP? (Process ID)

Monday, February 5th, 2007

There are cases in which you need to know which process launched a given IRP. This is very common in firewalls and other security programs, which intercept I/O operations to check in their databases whether a certain process has access to a particular resource or service. But how can I know which process an IRP belongs to?

Well, I could start by thinking it’s very easy to do. As we know, IoManager hands us IRPs in the context of the process that made the request. Thus, using the PsGetCurrentProcessId() API, we can get the ID of the process that launched the IRP. See how simple it is:

/****
***     OnDispatchProc
**
**      A generic name that tells absolutely nothing...
*/
NTSTATUS OnDispatchProc(PDEVICE_OBJECT pDeviceObject,
                        PIRP           pIrp)
{
    //-f--> I'm thinking it is easy, don't copy.
    HANDLE hProcessID;
    //-f--> Get the current process ID.
    hProcessID = PsGetCurrentProcessId();
    //-f--> ...
}

Can you see how simple it is to think that everything is right and still be mistaken?

In fact, IoManager delivers IRPs in the context of the process that is doing the I/O. However, what if a third-party filter is attached to your driver? Yes, it is possible, and when such a driver receives these IRPs, there is no guarantee that it will pass them on to our driver in the same process context. Suppose we write a driver for a device:

  • An application requests a write operation on the device.
  • IoManager creates the IRP and sends it to our driver.
  • A filter attached to our device receives the IRP.
  • The filter performs an asynchronous task, possibly using ExQueueWorkItem().
  • While this task has not finished, the filter marks the IRP as pending and returns STATUS_PENDING.
  • The asynchronous operation, in this case, is performed by a system thread, and at the end of the task the filter forwards the IRP to the driver below it, which in this case is our driver.
  • Our driver then receives the IRP in the system context and not in the context of the process that originated the IRP.

When the filter keeps the IRP pending and returns from its Dispatch function, the thread that originated the IRP moves on to other tasks. The original thread can even return to the process that started the whole operation if an OVERLAPPED structure was used when calling the driver.

The IRP will now be processed in the context of the process that called IoCallDriver() passing the pended IRP as a parameter. In our example, that is the System process. Using the code above, we would obtain the System PID instead of the PID of the process that actually initiated the IRP.

To properly obtain the information we’re looking for, we’ll have to take a slightly longer walk. When an IRP is created, it is placed in the pending IRP list of the thread that created it. To get this thread from the IRP pointer, we use the pIrp->Tail.Overlay.Thread field. This field holds a pointer to the ETHREAD structure of the thread that created the IRP. To get the process from the thread, we can use the IoThreadToProcess() API. See the excerpt below.

/****
***     OnDispatchProc
**
**      A generic name that tells absolutely nothing...
*/
NTSTATUS OnDispatchProc(PDEVICE_OBJECT pDeviceObject,
                        PIRP           pIrp)
{
    PEPROCESS   pEProcess;
    PETHREAD    pEThread;
    //-f--> Here we get the thread pointer.
    //      The one responsible for creating this IRP.
    pEThread = pIrp->Tail.Overlay.Thread;
    //-f--> Now we get the process that owns
    //      this thread.
    pEProcess = IoThreadToProcess(pEThread);
    //-f--> ...
}

Okay, behold that wonder. You now hold an EPROCESS structure, which according to the Microsoft documentation is an opaque structure used internally by the operating system.

“The EPROCESS structure is an opaque data structure used internally by the operating system.”

Now what can I do with this?

Although I’m sure you’ve thought of something I could do with this structure, I have a better suggestion and, why not say it, a much more appropriate one, not least because there might be children reading this. Although EPROCESS by itself tells us nothing, it can still lead us to some useful information. We can get a handle to the process identified by this structure and, through it, additional information such as its PID. See the example below.

//-f--> ZwQueryInformationProcess from
//      Windows NT/2000 Native API Reference
//      ISBN-10: 1578701996
//      ISBN-13: 978-1578701995
NTSYSAPI
NTSTATUS
NTAPI
ZwQueryInformationProcess(
    IN  HANDLE            ProcessHandle,
    IN  PROCESSINFOCLASS  ProcessInformationClass,
    OUT PVOID             ProcessInformation,
    IN  ULONG             ProcessInformationLength,
    OUT PULONG            ReturnLength OPTIONAL
);
/****
***     MyGetProcessID
**
**      Get the process ID from its EPROCESS structure.
**      Please, use your imagination to create a better name
**      for this routine.
*/
NTSTATUS
MyGetProcessID(IN  PEPROCESS    pEProcess,
               OUT PHANDLE      phProcessId)
{
    NTSTATUS                    nts = STATUS_SUCCESS;
    HANDLE                      hProcess = NULL;
    PROCESS_BASIC_INFORMATION   ProcessInfo;
    ULONG                       ulSize;
    //-f--> Zw routines are usually called
    //      from User-Mode, so to call them
    //      from Kernel, we'll need, at least,
    //      to be at PASSIVE_LEVEL.
    ASSERT(KeGetCurrentIrql() == PASSIVE_LEVEL);
    __try
    {
        //-f--> Initialize the output parameter.
        *phProcessId = 0;
        //-f--> Now we get the handle of the process
        //      identified by this EPROCESS pointer.
        //      (The arguments after pEProcess are an
        //      assumption; check them against the DDK.)
        nts = ObOpenObjectByPointer(pEProcess,
                                    OBJ_KERNEL_HANDLE,
                                    NULL,
                                    0,
                                    *PsProcessType,
                                    KernelMode,
                                    &hProcess);
        if (!NT_SUCCESS(nts))
            ExRaiseStatus(nts);
        //-f--> To use non-documented API, it is enough to
        //      declare its prototype as it was made in the
        //      beginning of this example
        nts = ZwQueryInformationProcess(hProcess,
                                        ProcessBasicInformation,
                                        &ProcessInfo,
                                        sizeof(ProcessInfo),
                                        &ulSize);
        if (!NT_SUCCESS(nts))
            ExRaiseStatus(nts);
        //-f--> Everybody is alive so far; now just set the output
        //      parameter with the correct information.
        *phProcessId = (HANDLE)ProcessInfo.UniqueProcessId;
    }
    __except(EXCEPTION_EXECUTE_HANDLER)
    {
        //-f--> Oops... Something went wrong!
        nts = GetExceptionCode();
    }
    //-f--> Release the process handle that we got.
    //      In this way, your manager will not want to kill you
    //      when, although the processes have finished,
    //      there will still be EPROCESS structures spread in RAM.
    if (hProcess)
        ZwClose(hProcess);
    //-f--> And all lived happily ever after.
    //      (including your manager)
    return nts;
}

You can get extensive information from the process handle. There are even ways to get the full process path from its handle, but let’s leave that game for the next post.

See you… 🙂

Prog2Svc – A jobless service

Friday, February 2nd, 2007

Nowadays, contradictory as it may seem, creating a service does not require much work. A few clicks on a Visual Studio 2005 Wizard and that’s it: we have an application that provides ATL COM interfaces and is indeed a Windows service. In this post, I will talk about a little tool that I’m offering as a gift to those who have the patience to read this Blog.

At the first company I worked for as a programmer, about 11 years ago, I was responsible for maintaining the software for Provectus’s data collector communication network. At first, an MFC program was responsible for interacting with the driver that controls the SS140 board, the interface that made the PC part of this data collector network. Once the program had stabilized, we modified it to become a Windows service.

But what is a Service?

A service is an executable module registered to run on the system even if nobody logs on to Windows. Today, many applications and Windows components are implemented as services. A service is not just an ordinary executable registered to start automatically. It must provide an interface of CallBack routines to the system, in order to respond to commands that start, stop and pause its execution. This interface is not a COM interface, as some of you might have guessed. Take a look at the StartServiceCtrlDispatcher function to get an idea of the kind of interface I am referring to. This is just one of the functions needed to build a service. Take a walk through the reference for details on how to implement a service from scratch. Services typically run under the system account, but may optionally use a given user account. More details about services can be found on the MSDN site.

Back in the ’90s, we needed to make the Supervisor, one of our major programs at Provectus, work as a service. Supervisor was a giant program built in Visual Basic 4.0. It was responsible for receiving commands from the collector network and, according to them, executing Stored Procedures in SQL Server. Visual Basic 4.0 did not implement the AddressOf operator, which appeared only in version 5.0. This operator can be used to obtain pointers to functions written in VB, which allows such programs to register CallBack routines with the system. Anyway, with this operator, a few pounds of patience and a gun to your head, you could build a service in Visual Basic. We considered rewriting the entire Supervisor in C so we could then turn it into a service. But luckily someone had an outbreak of sanity and said:

“Why don’t you write an empty service that simply calls the Supervisor?”

Yeah, this really works perfectly! The need to have a program behave as a service is more common than you might think, especially in business environments. It didn’t take long for tools that made this possible to appear, but nothing prevents me from having my own version.

Prog2Svc is a service that does just that. If you run it without any parameters, you will receive a usage message that lists the parameters you can use.

Thus, to register the Windows calculator as a service, we use the following command line. Remember that this is an operation that requires administrative rights, so in the case of Windows Vista, this command should be run from a Command Prompt that was started as an administrator.

Prog2Svc -add Calculadora c:\Windows\System32\calc.exe

Once the success message has been shown, you can see your new service in the Windows service manager. To open the service manager, type “services.msc” without the quotes in the Run… window.

You can start your service either from the service manager or with the good old command “net start calculadora” in the Run… window.

When we start the service, we see no sign that it is really working. That’s because calc.exe runs on another desktop, but we can confirm that it is running using Process Explorer.

The additional -interactive parameter lets your new service interact with the desktop. Note: Windows Vista deserves special attention here. The -auto parameter sets the service to start automatically when the system boots. Finally, the -silent parameter makes the installation run silently. That is, no success or error message is displayed. Note: In this case, success or failure can be verified through the Prog2Svc exit code. Another possibility is the use of environment variables in the path of the application to be run. See this more complete example.

Prog2Svc -add -silent -auto BlocoDeNotas %SystemRoot%\System32\notepad.exe
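To make the environment variable idea concrete, here is a minimal Python sketch of how a %NAME% reference in a path could be expanded. This is only an illustration: the post does not say how or when Prog2Svc performs the expansion, and the function name expand_percent_vars is hypothetical.

```python
import os
import re


def expand_percent_vars(path, env=None):
    """Replace %NAME% references in a path with values from an environment.

    Hypothetical helper for illustration; not taken from Prog2Svc.
    Names are matched case-insensitively, as Windows does, and
    unknown variables are left untouched.
    """
    env = os.environ if env is None else env
    # Build a case-insensitive lookup table of the environment.
    lookup = {name.upper(): value for name, value in env.items()}

    def replace(match):
        # Fall back to the original text when the variable is not defined.
        return lookup.get(match.group(1).upper(), match.group(0))

    return re.sub(r"%([^%]+)%", replace, path)
```

Called with an environment where SystemRoot is C:\Windows, the path from the command above would expand to C:\Windows\System32\notepad.exe.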

Removing the service is very simple. See the example below. Note that silent mode is also available when removing services.

Prog2Svc -remove Calculadora
Prog2Svc -remove -silent BlocoDeNotas
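Since silent mode shows no messages, a script that drives Prog2Svc has to rely on the exit code. The Python sketch below shows one way to do that. It assumes the usual convention that an exit code of zero means success, which the post does not confirm for Prog2Svc, and the helper name run_silently is hypothetical.

```python
import subprocess


def run_silently(command):
    """Run a command, discard its output, and report (succeeded, exit_code).

    Hypothetical helper for illustration.
    """
    completed = subprocess.run(
        command,
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    # ASSUMPTION: zero means success. Check the tool's own documentation.
    return completed.returncode == 0, completed.returncode
```

A call such as run_silently(["Prog2Svc", "-remove", "-silent", "BlocoDeNotas"]) would then return a pair that a script can check before moving on.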

How does the pseudo-service finish?

When we ask to stop the service, Prog2Svc receives a notification via a callback routine. At that point, we could land a flying kick on the process’s chest, but that wouldn’t be good manners. Instead, a routine identifies the main window created by the process and sends a WM_CLOSE message to it. After that, Prog2Svc waits 30 seconds so the process has the opportunity to finish its tasks and release system resources. If this time expires and the process is still running, well, it’s like my friend Thiago always says: “Only violence builds”, and the process is killed via TerminateProcess().
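The "ask politely, wait, then use force" sequence can be sketched in portable Python as follows. This is an analogue of the idea, not Prog2Svc's code: the function name stop_process is hypothetical, and the polite request here is Popen.terminate() rather than a WM_CLOSE message.

```python
import subprocess


def stop_process(proc, grace_seconds=30.0):
    """Ask a child process to exit, then force it if it does not comply.

    Returns "already exited", "graceful" or "forced".
    Hypothetical helper for illustration.
    """
    if proc.poll() is not None:
        return "already exited"

    # Polite request. On POSIX this sends SIGTERM, which a process may
    # handle to clean up. NOTE: on Windows, Popen.terminate() already
    # calls TerminateProcess, so a real Windows tool would post WM_CLOSE
    # to the main window at this step instead.
    proc.terminate()
    try:
        # Give the process time to finish its work and free resources.
        proc.wait(timeout=grace_seconds)
        return "graceful"
    except subprocess.TimeoutExpired:
        # Grace period expired: drop the process unconditionally.
        proc.kill()
        proc.wait()
        return "forced"
```

The 30-second default mirrors the grace period described above. A shorter value is handy when experimenting.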

If you still cannot imagine how this tool might be useful, consider that you could create a service that runs the Command Prompt and thus have a shell running under the system account. This would help you test and find out what you can do with this account’s privileges.

Prog2Svc -add -interactive SysCmd %SystemRoot%\System32\cmd.exe

To install this program, just put a copy of the executable in the System32 directory. Technically, you could put it in any directory, but remember that the folder that holds it must be accessible to the system account. So, do not put it in directories like “My Documents” or any other personal folder.

Since the Internet is not the safest place to get an executable from, make sure the program you are downloading has a valid signature.

Have fun!

  • Version:
  • Windows: 2000, XP, 2003, Vista

Prog2Svc.exe – x86 (58.7 KB)
Prog2Svc.exe – x64 (59.2 KB)
Prog2Svc.exe – IA64 (113 KB)