What Is Multithreading?

Before beginning, it is necessary to define precisely what is meant by the term multithreading. Multithreading is a specialized form of multitasking. In general, there are two types of multitasking: process-based and thread-based. A process is, in essence, a program that is executing. Thus, process-based multitasking is the feature that allows your computer to run two or more programs concurrently. For example, it is process-based multitasking that allows you to run a word processor at the same time you are using a spreadsheet or browsing the Internet. In process-based multitasking, a program is the smallest unit of code that can be dispatched by the scheduler.

A thread is a dispatchable unit of executable code. The name comes from the concept of a “thread of execution.” In a thread-based multitasking environment, all processes have at least one thread, but they can have more. This means that a single program can perform two or more tasks concurrently. For instance, a text editor can be formatting text at the same time that it is printing, as long as these two actions are being performed by two separate threads. The differences between process-based and thread-based multitasking can be summarized like this: Process-based multitasking handles the concurrent execution of programs. Thread-based multitasking deals with the concurrent execution of pieces of the same program.

One point in the preceding discussion needs clarification: true concurrent execution is possible only in a multiple-CPU system in which each process or thread has unrestricted access to a CPU. For single-CPU systems, which constitute the vast majority of systems in use today, only the appearance of simultaneous execution is achieved. In a single-CPU system, each process or thread receives a portion of the CPU’s time, with the amount of time determined by several factors, including the priority of the process or thread. Although truly concurrent execution does not exist on most computers, when writing multithreaded programs, you should assume that it does. This is because you can’t know the precise order in which separate threads will be executed, or whether they will execute in the same sequence twice. Thus, it’s best to program as if true concurrent execution is the case.

Multithreading Changes the Architecture of a Program

Multithreading changes the fundamental architecture of a program. Unlike a single-threaded program that executes in a strictly linear fashion, a multithreaded program executes portions of itself concurrently. Thus, all multithreaded programs include an element of parallelism. Consequently, a major issue in multithreaded programs is managing the interaction of the threads.

As explained earlier, all processes have at least one thread of execution, which is called the main thread. The main thread is created when your program begins. In a multithreaded program, the main thread creates one or more child threads. Thus, each multithreaded process starts with one thread of execution and then creates one or more additional threads. In a properly designed program, each thread represents a single logical unit of activity.

The principal advantage of multithreading is that it enables you to write very efficient programs because it lets you utilize the idle time that is present in most programs. Most I/O devices, whether they are network ports, disk drives, or the keyboard, are much slower than the CPU. Often, a program will spend a majority of its execution time waiting to send or receive data. With the careful use of multithreading, your program can execute another task during this idle time. For example, while one part of your program is sending a file over the Internet, another part can be reading keyboard input, and still another can be buffering the next block of data to send.

Why Doesn’t C++ Contain Built-In Support for Multithreading?

C++ does not contain any built-in support for multithreaded applications. Instead, it relies entirely upon the operating system to provide this feature. Given that both Java and C# provide built-in support for multithreading, it is natural to ask why this isn’t also the case for C++. The answers are efficiency, control, and the range of applications to which C++ is applied. Let’s examine each.

By not building in support for multithreading, C++ does not attempt to define a “one size fits all” solution. Instead, C++ allows you to directly utilize the multithreading features provided by the operating system. This approach means that your programs can be multithreaded in the most efficient means supported by the execution environment. Because many multitasking environments offer rich support for multithreading, being able to access that support is crucial to the creation of high-performance, multithreaded programs.

Using operating system functions to support multithreading gives you access to the full range of control offered by the execution environment. Consider Windows. It defines a rich set of thread-related functions that enable finely grained control over the creation and management of a thread. For example, Windows has several ways to control access to a shared resource, including semaphores, mutexes, event objects, waitable timers, and critical sections. This level of flexibility cannot be easily designed into a language because the capabilities of operating systems differ. Thus, language-level support for multithreading usually means offering only a “lowest common denominator” of features. With C++, you gain access to all the features that the operating system provides. This is a major advantage when writing high-performance code.

C++ was designed for all types of programming, from embedded systems in which there is no operating system in the execution environment to highly distributed, GUI-based end-user applications and everything in between. Therefore, C++ cannot place significant constraints on its execution environment. Building in support for multithreading would have inherently limited C++ to only those environments that supported it and thus prevented C++ from being used to create software for nonthreaded environments.

In the final analysis, not building in support of multithreading is a major advantage for C++ because it enables programs to be written in the most efficient way possible for the target execution environment. Remember, C++ is all about power. In the case of multithreading, it is definitely a situation in which “less is more.”

What Operating System and Compiler?

Because C++ relies on the operating system to provide support for multithreaded programming, it is necessary to choose an operating system as the target for the multithreaded applications in this chapter. Because Windows is the most widely used operating system in the world, it is the operating system used in this chapter. However, much of the information can be generalized to any OS that supports multithreading.

Because Visual C++ is arguably the most widely used compiler for producing Windows programs, it is the compiler required by the examples in this chapter. The importance of this is made apparent in the following section. However, if you are using another compiler, the code can be easily adapted to accommodate it.

Windows offers a wide array of Application Programming Interface (API) functions that support multithreading. Many readers will be at least somewhat familiar with the multithreading functions offered by Windows, but for those who are not, an overview of those used in this chapter is presented here. Keep in mind that Windows provides many other multithreading-based functions that you might want to explore on your own.

To use Windows’ multithreading functions, you must include <windows.h> in your program.

Creating and Terminating a Thread

To create a thread, the Windows API supplies the CreateThread( ) function. Its prototype is shown here:

HANDLE CreateThread(LPSECURITY_ATTRIBUTES secAttr,
SIZE_T stackSize,
LPTHREAD_START_ROUTINE threadFunc,
LPVOID param,
DWORD flags,
LPDWORD threadID);

Here, secAttr is a pointer to a set of security attributes pertaining to the thread. However, if secAttr is NULL, then the default security descriptor is used.

Each thread has its own stack. You can specify the size of the new thread’s stack in bytes using the stackSize parameter. If this integer value is zero, then the thread will be given a stack that is the same size as the creating thread. In this case, the stack will be expanded, if necessary. (Specifying zero is the common approach taken to thread stack size.)

Each thread of execution begins with a call to a function, called the thread function, within the creating process. Execution of the thread continues until the thread function returns. The address of this function (that is, the entry point to the thread) is specified in threadFunc. All thread functions must have this prototype:

DWORD WINAPI threadfunc(LPVOID param);

Any argument that you need to pass to the new thread is specified in CreateThread( )’s param. This pointer-sized value is received by the thread function in its parameter and may be used for any purpose. The thread function returns its exit status.

The flags parameter determines the execution state of the thread. If it is zero, the thread begins execution immediately. If it is CREATE_SUSPENDED, the thread is created in a suspended state, awaiting execution. (It may be started using a call to ResumeThread( ), discussed later.)

The identifier associated with a thread is returned in the long integer pointed to by threadID.

The function returns a handle to the thread if successful or NULL if a failure occurs. The thread handle can be explicitly destroyed by calling CloseHandle( ). Otherwise, it will be destroyed automatically when the parent process ends.
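
The following skeleton shows how these pieces fit together. It is only a sketch (the thread function’s name and the value passed to it are arbitrary), and, as explained shortly, the examples in this chapter will actually use a runtime library alternative to CreateThread( ). WaitForSingleObject( ), described later in this chapter, is used here simply to wait for the thread to end.

#include <windows.h>
#include <stdio.h>

// An arbitrary thread function. Its name is for illustration only.
DWORD WINAPI MyThreadFunc(LPVOID param)
{
  printf("In the thread; param is %d\n", (int)(INT_PTR) param);
  return 0; // the thread's exit status
}

int main()
{
  DWORD tid;

  HANDLE hThread = CreateThread(NULL,          // default security attributes
                                0,             // default stack size
                                MyThreadFunc,  // thread entry point
                                (LPVOID) 1,    // argument passed to the thread
                                0,             // begin execution immediately
                                &tid);         // receives the thread identifier
  if(hThread == NULL) return 1; // thread could not be created

  WaitForSingleObject(hThread, INFINITE); // wait for the thread to end
  CloseHandle(hThread);                   // destroy the thread handle
  return 0;
}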

As just explained, a thread of execution terminates when its entry function returns. The process may also terminate the thread manually, using either TerminateThread( ) or ExitThread( ), whose prototypes are shown here:

BOOL TerminateThread(HANDLE thread, DWORD status);
VOID ExitThread(DWORD status);

For TerminateThread( ), thread is the handle of the thread to be terminated. ExitThread( ) can only be used to terminate the thread that calls ExitThread( ). For both functions, status is the termination status. TerminateThread( ) returns nonzero if successful and zero otherwise.

Calling ExitThread( ) is functionally equivalent to allowing a thread function to return normally. This means that the stack is properly reset. When a thread is terminated using TerminateThread( ), it is stopped immediately and does not perform any special cleanup activities. Also, TerminateThread( ) may stop a thread during an important operation. For these reasons, it is usually best (and easiest) to let a thread terminate normally when its entry function returns.

The Visual C++ Alternatives to CreateThread( ) and ExitThread( )

Although CreateThread( ) and ExitThread( ) are the Windows API functions used to create and terminate a thread, we won’t be using them in this chapter! The reason is that when these functions are used with Visual C++ (and possibly other Windows-compatible compilers), they can result in memory leaks, the loss of a small amount of memory. For Visual C++, if a multithreaded program utilizes C/C++ standard library functions and uses CreateThread( ) and ExitThread( ), then small amounts of memory are lost. (If your program does not use the C/C++ standard library, then no such losses will occur.) To eliminate this problem, you must use functions defined by the C/C++ runtime library to start and stop threads rather than those specified by the Win32 API. These functions parallel CreateThread( ) and ExitThread( ), but do not generate a memory leak.

The Visual C++ alternatives to CreateThread( ) and ExitThread( ) are _beginthreadex( ) and _endthreadex( ). Both require the header file <process.h>. Here is the prototype for _beginthreadex( ):

uintptr_t _beginthreadex(void *secAttr, unsigned stackSize,
unsigned (__stdcall *threadFunc)(void *),
void *param, unsigned flags,
unsigned *threadID);

As you can see, the parameters to _beginthreadex( ) parallel those to CreateThread( ). Furthermore, they have the same meaning as those specified by CreateThread( ). secAttr is a pointer to a set of security attributes pertaining to the thread. However, if secAttr is NULL, then the default security descriptor is used. The size of the new thread’s stack, in bytes, is passed in the stackSize parameter. If this value is zero, then the thread will be given a stack that is the same size as the main thread of the process that creates it.

The address of the thread function (that is, the entry point to the thread) is specified in threadFunc. For _beginthreadex( ), a thread function must have this prototype:

unsigned __stdcall threadfunc(void * param);

This prototype is functionally equivalent to the one for CreateThread( ), but it uses different type names. Any argument that you need to pass to the new thread is specified in the param parameter.

The flags parameter determines the execution state of the thread. If it is zero, the thread begins execution immediately. If it is CREATE_SUSPENDED, the thread is created in a suspended state, awaiting execution. (It may be started using a call to ResumeThread( ).) The identifier associated with a thread is returned in the double word pointed to by threadID.

The function returns a handle to the thread if successful or zero if a failure occurs. The type uintptr_t specifies a Visual C++ type capable of holding a pointer or handle.

The prototype for _endthreadex( ) is shown here:

void _endthreadex(unsigned status);

It functions just like ExitThread( ) by stopping the thread and returning the exit code specified in status.
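
Here is a minimal sketch of _beginthreadex( ) in action. The thread function’s name is arbitrary; notice that the return value must be cast to HANDLE because _beginthreadex( ) returns a uintptr_t.

#include <windows.h>
#include <process.h>
#include <stdio.h>

// Arbitrary thread function, using the prototype required by _beginthreadex().
unsigned __stdcall MyThread(void *param)
{
  printf("In the thread; param is %d\n", (int)(INT_PTR) param);
  return 0; // equivalent to calling _endthreadex(0)
}

int main()
{
  unsigned tid;

  HANDLE hThread = (HANDLE) _beginthreadex(NULL, 0, MyThread,
                                           (void *) 10, 0, &tid);
  if(hThread == 0) return 1; // thread could not be created

  WaitForSingleObject(hThread, INFINITE); // wait for the thread to end
  CloseHandle(hThread);
  return 0;
}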

Because the most widely used compiler for Windows is Visual C++, the examples in this chapter will use _beginthreadex( ) and _endthreadex( ) rather than their equivalent API functions. If you are using a compiler other than Visual C++, simply substitute CreateThread( ) and ExitThread( ).

When using _beginthreadex( ) and _endthreadex( ), you must remember to link in the multithreaded library. This will vary from compiler to compiler. Here are some examples. When using the Visual C++ command-line compiler, include the -MT option. To use the multithreaded library from the Visual C++ 6 IDE, first activate the Project | Settings property sheet. Then, select the C/C++ tab. Next, select Code Generation from the Category list box and then choose Multithreaded in the Use Runtime Library list box. For the Visual C++ 7 .NET IDE, select Project | Properties. Next, select the C/C++ entry and highlight Code Generation. Finally, choose Multi-threaded as the runtime library.

Suspending and Resuming a Thread

A thread of execution can be suspended by calling SuspendThread( ). It can be resumed by calling ResumeThread( ). The prototypes for these functions are shown here:

DWORD SuspendThread(HANDLE hThread);
DWORD ResumeThread(HANDLE hThread);

For both functions, the handle to the thread is passed in hThread.

Each thread of execution has associated with it a suspend count. If this count is zero, then the thread is not suspended. If it is nonzero, the thread is in a suspended state. Each call to SuspendThread( ) increments the suspend count. Each call to ResumeThread( ) decrements the suspend count. A suspended thread will resume only when its suspend count has reached zero. Therefore, to resume a suspended thread implies that there must be the same number of calls to ResumeThread( ) as there have been calls to SuspendThread( ).

Both functions return the thread’s previous suspend count or –1 if an error occurs.
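
To see how the suspend count works, consider the following fragment, which assumes that hThread is a valid handle to a running thread. Because the thread is suspended twice, two calls to ResumeThread( ) are required before it runs again.

// hThread is assumed to be a valid handle to a running thread.
SuspendThread(hThread); // suspend count goes from 0 to 1; thread is suspended
SuspendThread(hThread); // suspend count goes from 1 to 2

ResumeThread(hThread);  // suspend count goes from 2 to 1; thread is still suspended
ResumeThread(hThread);  // suspend count goes from 1 to 0; thread resumes execution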

Changing the Priority of a Thread

In Windows, each thread has associated with it a priority setting. A thread’s priority determines how much CPU time a thread receives. Low priority threads receive little time. High priority threads receive a lot. Of course, how much CPU time a thread receives has a profound impact on its execution characteristics and its interaction with other threads currently executing in the system.

In Windows, a thread’s priority setting is the combination of two values: the overall priority class of the process and the priority setting of the individual thread relative to that priority class. That is, a thread’s actual priority is determined by combining the process’s priority class with the thread’s individual priority level. Each is examined next.

Priority Classes

By default, a process is given a priority class of normal, and most programs remain in the normal priority class throughout their execution lifetime. Although neither of the examples in this chapter changes the priority class, a brief overview of the thread priority classes is given here in the interest of completeness.

Windows defines six priority classes, which correspond to the values shown here, in order of highest to lowest priority:

REALTIME_PRIORITY_CLASS

HIGH_PRIORITY_CLASS

ABOVE_NORMAL_PRIORITY_CLASS

NORMAL_PRIORITY_CLASS

BELOW_NORMAL_PRIORITY_CLASS

IDLE_PRIORITY_CLASS

Programs are given the NORMAL_PRIORITY_CLASS by default. Usually, you won’t need to alter the priority class of your program. In fact, changing a process’ priority class can have negative consequences on the overall performance of the computer system. For example, if you increase a program’s priority class to REALTIME_PRIORITY_CLASS, it will dominate the CPU. For some specialized applications, you may need to increase an application’s priority class, but usually you won’t. As mentioned, neither of the applications in this chapter changes the priority class.

In the event that you do want to change the priority class of a program, you can do so by calling SetPriorityClass( ). You can obtain the current priority class by calling GetPriorityClass( ). The prototypes for these functions are shown here:

DWORD GetPriorityClass(HANDLE hApp);
BOOL SetPriorityClass(HANDLE hApp, DWORD priority);

Here, hApp is the handle of the process. GetPriorityClass( ) returns the priority class of the application or zero on failure. For SetPriorityClass( ), priority specifies the process’s new priority class.
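
For example, this fragment (shown only as an illustration) temporarily raises the priority class of the calling process and then restores the original class. GetCurrentProcess( ) returns a pseudohandle to the calling process.

HANDLE hProc = GetCurrentProcess();        // pseudohandle to this process
DWORD oldClass = GetPriorityClass(hProc);  // save the current priority class

if(oldClass != 0) {
  SetPriorityClass(hProc, HIGH_PRIORITY_CLASS); // raise the class
  // ... perform time-sensitive work ...
  SetPriorityClass(hProc, oldClass);            // restore the original class
}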

Thread Priorities

For any given priority class, each individual thread’s priority determines how much CPU time it receives within its process. When a thread is first created, it is given normal priority, but you can change a thread’s priority—even while it is executing.

You can obtain a thread’s priority setting by calling GetThreadPriority( ). You can increase or decrease a thread’s priority using SetThreadPriority( ). The prototypes for these functions are shown here:

BOOL SetThreadPriority(HANDLE hThread, int priority);
int GetThreadPriority(HANDLE hThread);

For both functions, hThread is the handle of the thread. For SetThreadPriority( ), priority is the new priority setting. If an error occurs, SetThreadPriority( ) returns zero. It returns nonzero otherwise. For GetThreadPriority( ), the current priority setting is returned. The priority settings are shown here, in order of highest to lowest:

Thread Priority                          Value

THREAD_PRIORITY_TIME_CRITICAL             15
THREAD_PRIORITY_HIGHEST                    2
THREAD_PRIORITY_ABOVE_NORMAL               1
THREAD_PRIORITY_NORMAL                     0
THREAD_PRIORITY_BELOW_NORMAL              -1
THREAD_PRIORITY_LOWEST                    -2
THREAD_PRIORITY_IDLE                     -15

These values are increments or decrements that are applied relative to the priority class of the process. Through the combination of a process’ priority class and thread priority, Windows supports 31 different priority settings for application programs.

GetThreadPriority( ) returns THREAD_PRIORITY_ERROR_RETURN if an error occurs.
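
For instance, the following fragment lowers the priority of a worker thread and then confirms the change. It assumes that hThread is a valid handle to the thread.

// hThread is assumed to be a valid thread handle.
if(SetThreadPriority(hThread, THREAD_PRIORITY_BELOW_NORMAL)) {
  int p = GetThreadPriority(hThread);
  if(p != THREAD_PRIORITY_ERROR_RETURN &&
     p == THREAD_PRIORITY_BELOW_NORMAL) {
    // the new priority setting is in effect
  }
}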

For the most part, if a thread belongs to a process that uses the NORMAL_PRIORITY_CLASS priority class, you can freely experiment with changing the thread’s priority setting without fear of catastrophically affecting overall system performance. As you will see, the thread control panel developed in the next section allows you to alter the priority setting of a thread within a process (but does not change the process’s priority class).

Obtaining the Handle of the Main Thread

It is possible to control the execution of the main thread. To do so, you will need to acquire its handle. The easiest way to do this is to call GetCurrentThread( ), whose prototype is shown here:

HANDLE GetCurrentThread(void);

This function returns a pseudohandle to the current thread. It is called a pseudohandle because it is a predefined value that always refers to whichever thread is currently executing rather than uniquely identifying a particular thread. It can, however, be used any place that a normal thread handle can.
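
For example, this statement raises the priority of whichever thread executes it by passing the pseudohandle to SetThreadPriority( ). (The thread control panel’s constructor, shown later in this chapter, uses exactly this technique to keep its user interface responsive.)

// Raise the priority of the currently executing thread.
SetThreadPriority(GetCurrentThread(), THREAD_PRIORITY_ABOVE_NORMAL);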

Synchronization

When using multiple threads or processes, it is sometimes necessary to coordinate the activities of two or more. This process is called synchronization. The most common use of synchronization occurs when two or more threads need access to a shared resource that must be used by only one thread at a time. For example, when one thread is writing to a file, a second thread must be prevented from doing so at the same time. Another reason for synchronization is when one thread is waiting for an event that is caused by another thread. In this case, there must be some means by which the first thread is held in a suspended state until the event has occurred. Then the waiting thread must resume execution.

There are two general states that a task may be in. First, it may be executing (or ready to execute as soon as it obtains its time slice). Second, a task may be blocked, awaiting some resource or event, in which case its execution is suspended until the needed resource is available or the event occurs.

If you are not familiar with the synchronization problem or its most common solution, the semaphore, the next section discusses it.

Understanding the Synchronization Problem

Windows must provide special services that allow access to a shared resource to be synchronized, because without help from the operating system, there is no way for one process or thread to know that it has sole access to a resource. To understand this, imagine that you are writing programs for a multitasking operating system that does not provide any synchronization support. Further imagine that you have two concurrently executing threads, A and B, both of which, from time to time, require access to some resource R (such as a disk file) that must be accessed by only one thread at a time. As a means of preventing one thread from accessing R while the other is using it, you try the following solution. First, you establish a variable called flag that is initialized to zero and can be accessed by both threads. Then, before using each piece of code that accesses R, you wait for flag to be cleared, then set flag, access R, and finally, clear flag. That is, before either thread accesses R, it executes this piece of code:

while(flag) ; // wait for flag to be cleared
flag = 1; // set flag
// ... access resource R ...
flag = 0; // clear the flag

The idea behind this code is that neither thread will access R if flag is set. Conceptually, this approach is in the spirit of the correct solution. However, in actual fact it leaves much to be desired for one simple reason: it won’t always work! Let’s see why.

Using the code just given, it is possible for both threads to access R at the same time. The while loop is, in essence, performing repeated load and compare instructions on flag or, in other words, it is testing flag’s value. When flag is cleared, the next line of code sets flag’s value. The trouble is that it is possible for these two operations to be performed in two different time slices. Between the two time slices, the value of flag might have been accessed by the other thread, thus allowing R to be used by both threads at the same time. To understand this, imagine that thread A enters the while loop and finds that flag is zero, which is the green light to access R. However, before it can set flag to 1, its time slice expires and thread B resumes execution. If B executes its while loop, it too will find that flag is not set and assume that it is safe to access R. However, when A resumes, it will also begin accessing R. The crucial aspect of the problem is that the testing and setting of flag do not comprise one uninterruptible operation. Rather, as just illustrated, they can be separated by a time slice. No matter how you try, there is no way, using only application-level code, that you can absolutely guarantee that one and only one thread will access R at one time.

The solution to the synchronization problem is as elegant as it is simple. The operating system (in this case Windows) provides a routine that in one uninterrupted operation, tests and, if possible, sets a flag. In the language of operating systems engineers, this is called a test and set operation. For historical reasons, the flags used to control access to a shared resource and provide synchronization between threads (and processes) are called semaphores. The semaphore is at the core of the Windows synchronization system.

Windows supports several types of synchronization objects. The first type is the classic semaphore. When using a semaphore, a resource can be completely synchronized, in which case one and only one thread or process can access it at any one time, or the semaphore can allow no more than a small number of processes or threads access at any one time. Semaphores are implemented using a counter that is decremented when a task is granted the semaphore and incremented when the task releases it.

The second synchronization object is the mutex semaphore, or just mutex, for short. A mutex synchronizes a resource such that one and only one thread or process can access it at any one time. In essence, a mutex is a special case version of a standard semaphore.

The third synchronization object is the event object. It can be used to block access to a resource until some other thread or process signals that it can be used. (That is, an event object signals that a specified event has occurred.)
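
Event objects are not used by the examples in this chapter, but the following short sketch (with arbitrary names) shows the basic idea: the main thread blocks on the event until the worker thread signals it by calling SetEvent( ).

#include <windows.h>
#include <process.h>
#include <stdio.h>

HANDLE hEvent; // signaled when the worker has finished its preparation

unsigned __stdcall Worker(void *param)
{
  printf("Worker: preparing data\n");
  Sleep(1000);      // simulate some work
  SetEvent(hEvent); // signal that the event has occurred
  return 0;
}

int main()
{
  unsigned tid;

  hEvent = CreateEvent(NULL, FALSE, FALSE, NULL); // auto-reset, nonsignaled, unnamed
  HANDLE hThread = (HANDLE) _beginthreadex(NULL, 0, Worker, NULL, 0, &tid);

  WaitForSingleObject(hEvent, INFINITE); // block until the worker signals the event
  printf("Main: event received\n");

  WaitForSingleObject(hThread, INFINITE); // wait for the worker thread to end
  CloseHandle(hThread);
  CloseHandle(hEvent);
  return 0;
}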

The fourth synchronization object is the waitable timer. A waitable timer blocks a thread’s execution until a specific time. You can also create timer queues, which are lists of timers.

You can prevent a section of code from being used by more than one thread at a time by making it into a critical section using a critical section object. Once a critical section is entered by one thread, no other thread may use it until the first thread has left the critical section.
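
Critical sections are likewise not used in this chapter, but here is a brief sketch of the typical pattern, in which a critical section protects a shared counter:

CRITICAL_SECTION cs;  // guards the shared counter
long sharedCount = 0; // the shared resource

// Once, before any threads use the counter:
InitializeCriticalSection(&cs);

// In each thread that updates the counter:
EnterCriticalSection(&cs); // only one thread at a time gets past this point
sharedCount++;
LeaveCriticalSection(&cs);

// Once, after all threads are finished with the counter:
DeleteCriticalSection(&cs);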

The only synchronization object used in this chapter is the mutex, which is described in the following section. However, all synchronization objects defined by Windows are available to the C++ programmer. As explained, this is one of the major advantages that results from C++’s reliance on the operating system to handle multithreading: all multithreading features are at your command.

Using a Mutex to Synchronize Threads

As explained, a mutex is a special-case semaphore that allows only one thread to access a resource at any given time. Before you can use a mutex, you must create one using CreateMutex( ), whose prototype is shown here:

HANDLE CreateMutex(LPSECURITY_ATTRIBUTES secAttr,
BOOL acquire,
LPCSTR name);

Here, secAttr is a pointer to the security attributes. If secAttr is NULL, the default security descriptor is used.

If the creating thread desires control of the mutex, then acquire must be true. Otherwise, pass false.

The name parameter points to a string that becomes the name of the mutex object. Mutexes are global objects, which may be used by other processes. As such, when two processes each open a mutex using the same name, both are referring to the same mutex. In this way, two processes can be synchronized. The name may also be NULL, in which case the mutex is localized to one process.

The CreateMutex( ) function returns a handle to the mutex if successful or NULL on failure. A mutex handle is automatically closed when the main process ends. You can explicitly close a mutex handle when it is no longer needed by calling CloseHandle( ).

Once you have created a mutex, you use it by calling two related functions: WaitForSingleObject( ) and ReleaseMutex( ). The prototypes for these functions are shown here:

DWORD WaitForSingleObject(HANDLE hObject, DWORD howLong);
BOOL ReleaseMutex(HANDLE hMutex);

WaitForSingleObject( ) waits on a synchronization object. It does not return until the object becomes available or a time-out occurs. For use with mutexes, hObject will be the handle of a mutex. The howLong parameter specifies, in milliseconds, how long the calling routine will wait. Once that time has elapsed, a time-out error will be returned. To wait indefinitely, use the value INFINITE. The function returns WAIT_OBJECT_0 when successful (that is, when access is granted). It returns WAIT_TIMEOUT when time-out is reached.

ReleaseMutex( ) releases the mutex and allows another thread to acquire it. Here, hMutex is the handle to the mutex. The function returns nonzero if successful and zero on failure.

To use a mutex to control access to a shared resource, wrap the code that accesses that resource between a call to WaitForSingleObject( ) and ReleaseMutex( ), as shown in this skeleton. (Of course, the time-out period will differ from application to application.)

if(WaitForSingleObject(hMutex, 10000)==WAIT_TIMEOUT) {
// handle time-out error
}
// access the resource

ReleaseMutex(hMutex);

Generally, you will want to choose a time-out period that will be more than enough to accommodate the actions of your program. If you get repeated time-out errors when developing a multithreaded application, it usually means that you have created a deadlock condition. Deadlock occurs when one thread is waiting on a mutex that another thread never releases.
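
Putting these pieces together, here is a small, self-contained sketch in which two threads increment a shared counter that is protected by a mutex. The names, the loop count, and the ten-second time-out are arbitrary.

#include <windows.h>
#include <process.h>
#include <stdio.h>

HANDLE hMutex;    // protects the shared counter
long counter = 0; // the shared resource

unsigned __stdcall Worker(void *param)
{
  for(int i = 0; i < 100000; i++) {
    if(WaitForSingleObject(hMutex, 10000) == WAIT_TIMEOUT)
      return 1;          // handle the time-out error
    counter++;           // access the shared resource
    ReleaseMutex(hMutex);
  }
  return 0;
}

int main()
{
  unsigned tid1, tid2;

  hMutex = CreateMutex(NULL, FALSE, NULL); // unnamed, not initially owned

  HANDLE h1 = (HANDLE) _beginthreadex(NULL, 0, Worker, NULL, 0, &tid1);
  HANDLE h2 = (HANDLE) _beginthreadex(NULL, 0, Worker, NULL, 0, &tid2);

  WaitForSingleObject(h1, INFINITE);
  WaitForSingleObject(h2, INFINITE);

  printf("Final count: %ld\n", counter); // always 200000, thanks to the mutex

  CloseHandle(h1);
  CloseHandle(h2);
  CloseHandle(hMutex);
  return 0;
}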

Creating a Thread Control Panel

When developing multithreaded programs, it is often useful to experiment with various priority settings. It is also useful to be able to dynamically suspend and resume a thread, or even terminate a thread. As you will see, it is quite easy, using the thread functions just described, to create a thread control panel that allows you to accomplish these things. Further, you can use the control panel while your multithreaded program is running. The dynamic nature of the thread control panel allows you to easily change the execution profile of a thread and observe the results.

The thread control panel developed in this section is capable of controlling one thread. However, you can create as many panels as needed, with each controlling a different thread. For the sake of simplicity, the control panel is implemented as a modeless dialog box that is owned by the desktop, not the application whose thread it controls.

The thread control panel is capable of performing the following actions:

  • Setting a thread’s priority
  • Suspending a thread
  • Resuming a thread
  • Terminating a thread

As stated, the control panel is implemented as a modeless dialog box. As you know, when a modeless dialog box is activated, the rest of the application is still active. Thus, the control panel runs independently of the application for which it is being used.

The code for the thread control panel is shown here. This file is called tcp.cpp.

// A thread control panel.
#include <windows.h>
#include <map>
#include <cstring>  // for strcat()
#include <cstdlib>  // for _itoa()
#include "panel.h"
using namespace std;
const int NUMPRIORITIES = 5;
const int OFFSET = 2;
// Array of strings for priority list box.
char priorities[NUMPRIORITIES][80] = {
"Lowest",
"Below Normal",
"Normal",
"Above Normal",
"Highest"
};
// A Thread Control Panel Class.
class ThrdCtrlPanel {
// Information about the thread under control.
struct ThreadInfo {
HANDLE hThread; // handle of thread
int priority; // current priority
bool suspended; // true if suspended
ThreadInfo(HANDLE ht, int p, bool s) {
hThread = ht;
priority = p;
suspended = s;
}
};
// This map holds a ThreadInfo for each
// active thread control panel.
static map<HWND, ThreadInfo> dialogmap;
public:
// Construct a control panel.
ThrdCtrlPanel(HINSTANCE hInst, HANDLE hThrd);
// The control panel's callback function.
static LRESULT CALLBACK ThreadPanel(HWND hwnd, UINT message,
WPARAM wParam, LPARAM lParam);
};
// Define static member dialogmap.
map<HWND, ThrdCtrlPanel::ThreadInfo>
ThrdCtrlPanel::dialogmap;
// Create a thread control panel.
ThrdCtrlPanel::ThrdCtrlPanel(HINSTANCE hInst,
HANDLE hThrd)
{
ThreadInfo ti(hThrd,
GetThreadPriority(hThrd)+OFFSET,
false);
// Owner window is desktop.
HWND hDialog = CreateDialog(hInst, "ThreadPanelDB",
NULL,
(DLGPROC) ThreadPanel);
// Put info about this dialog box in the map.
dialogmap.insert(pair<HWND, ThreadInfo>(hDialog, ti));
// Set the control panel's title.
char str[80] = "Control Panel for Thread ";
char str2[4];
_itoa(dialogmap.size(), str2, 10);
strcat(str, str2);
SetWindowText(hDialog, str);
// Offset each dialog box instance.
MoveWindow(hDialog, 30*dialogmap.size(),
30*dialogmap.size(),
300, 250, 1);
// Update priority setting in the list box.
SendDlgItemMessage(hDialog, IDD_LB, LB_SETCURSEL,
(WPARAM) ti.priority, 0);
// Increase priority to ensure control. You can
// change or remove this statement based on your
// execution environment.
SetThreadPriority(GetCurrentThread(),
THREAD_PRIORITY_ABOVE_NORMAL);
}
// Thread control panel dialog box callback function.
LRESULT CALLBACK ThrdCtrlPanel::ThreadPanel(HWND hwnd,
UINT message,
WPARAM wParam,
LPARAM lParam)
{
int i;
HWND hpbRes, hpbSus, hpbTerm;
switch(message) {
case WM_INITDIALOG:
// Initialize priority list box.
for(i=0; i < NUMPRIORITIES; i++) {
SendDlgItemMessage(hwnd, IDD_LB,
LB_ADDSTRING, 0, (LPARAM) priorities[i]);
}
// Set suspend and resume buttons for thread.
hpbSus = GetDlgItem(hwnd, IDD_SUSPEND);
hpbRes = GetDlgItem(hwnd, IDD_RESUME);
EnableWindow(hpbSus, true); // enable Suspend
EnableWindow(hpbRes, false); // disable Resume
return 1;
case WM_COMMAND:
map<HWND, ThreadInfo>::iterator p = dialogmap.find(hwnd);
switch(LOWORD(wParam)) {
case IDD_TERMINATE:
TerminateThread(p->second.hThread, 0);
// Disable Terminate button.
hpbTerm = GetDlgItem(hwnd, IDD_TERMINATE);
EnableWindow(hpbTerm, false); // disable
// Disable Suspend and Resume buttons.
hpbSus = GetDlgItem(hwnd, IDD_SUSPEND);
hpbRes = GetDlgItem(hwnd, IDD_RESUME);
EnableWindow(hpbSus, false); // disable Suspend
EnableWindow(hpbRes, false); // disable Resume
return 1;
case IDD_SUSPEND:
SuspendThread(p->second.hThread);
// Set state of the Suspend and Resume buttons.
hpbSus = GetDlgItem(hwnd, IDD_SUSPEND);
hpbRes = GetDlgItem(hwnd, IDD_RESUME);
EnableWindow(hpbSus, false); // disable Suspend
EnableWindow(hpbRes, true); // enable Resume
p->second.suspended = true;
return 1;
case IDD_RESUME:
ResumeThread(p->second.hThread);
// Set state of the Suspend and Resume buttons.
hpbSus = GetDlgItem(hwnd, IDD_SUSPEND);
hpbRes = GetDlgItem(hwnd, IDD_RESUME);
EnableWindow(hpbSus, true); // enable Suspend
EnableWindow(hpbRes, false); // disable Resume
p->second.suspended = false;
return 1;
case IDD_LB:
// If a list box entry was double-clicked,
// then change the priority.
if(HIWORD(wParam)==LBN_DBLCLK) {
p->second.priority = SendDlgItemMessage(hwnd,
IDD_LB, LB_GETCURSEL,
0, 0);
SetThreadPriority(p->second.hThread,
p->second.priority-OFFSET);
}
return 1;
case IDCANCEL:
// If thread is suspended when panel is closed,
// then resume thread to prevent deadlock.
if(p->second.suspended) {
ResumeThread(p->second.hThread);
p->second.suspended = false;
}
// Remove this thread from the list.
dialogmap.erase(hwnd);
// Close the panel.
DestroyWindow(hwnd);
return 1;
}
}
return 0;
}

The control panel requires the following resource file, called tcp.rc:

#include <windows.h>
#include "panel.h"
ThreadPanelDB DIALOGEX 20, 20, 140, 110
CAPTION "Thread Control Panel"
STYLE WS_BORDER | WS_VISIBLE | WS_POPUP | WS_CAPTION | WS_SYSMENU
{
DEFPUSHBUTTON "Done", IDCANCEL, 55, 80, 33, 14
PUSHBUTTON "Terminate", IDD_TERMINATE, 10, 20, 42, 12
PUSHBUTTON "Suspend", IDD_SUSPEND, 10, 35, 42, 12
PUSHBUTTON "Resume", IDD_RESUME, 10, 50, 42, 12
LISTBOX IDD_LB, 65, 20, 63, 42, LBS_NOTIFY | WS_VISIBLE |
WS_BORDER | WS_VSCROLL | WS_TABSTOP
CTEXT "Thread Priority", IDD_TEXT1, 65, 8, 64, 10
CTEXT "Change State", IDD_TEXT2, 0, 8, 64, 10
}

The control panel uses the following header file called panel.h:

#define IDD_LB 200
#define IDD_TERMINATE 202
#define IDD_SUSPEND 204
#define IDD_RESUME 206
#define IDD_TEXT1 208
#define IDD_TEXT2 209

To use the thread control panel, follow these steps:

1. Include tcp.cpp in your program.

2. Include tcp.rc in your program’s resource file.

3. Create the thread or threads that you want to control.

4. Instantiate a ThrdCtrlPanel object for each thread.

Each ThrdCtrlPanel object links a thread with a dialog box that controls it. For large projects in which multiple files need access to ThrdCtrlPanel, you will need to use a header file called tcp.h that contains the declaration for ThrdCtrlPanel. Here is tcp.h:

// A header file for the ThrdCtrlPanel class.
class ThrdCtrlPanel {
public:
// Construct a control panel.
ThrdCtrlPanel(HINSTANCE hInst, HANDLE hThrd);
// The control panel's callback function.
static LRESULT CALLBACK ThreadPanel(HWND hwnd, UINT message,
WPARAM wParam, LPARAM lParam);
};

Let’s take a closer look at the thread control panel. It begins by defining the following global definitions:

const int NUMPRIORITIES = 5;
const int OFFSET = 2;
// Array of strings for priority list box.
char priorities[NUMPRIORITIES][80] = {
"Lowest",
"Below Normal",
"Normal",
"Above Normal",
"Highest"
};

The priorities array holds strings that correspond to a thread’s priority setting. It initializes the list box inside the control panel that displays the current thread priority. The number of priorities is specified by NUMPRIORITIES, which is 5 for Windows. Thus, NUMPRIORITIES defines the number of different priorities that a thread may have. (If you adapt the code for use with another operating system, a different value might be required.) Using the control panel, you can set a thread to one of the following priorities:

THREAD_PRIORITY_HIGHEST

THREAD_PRIORITY_ABOVE_NORMAL

THREAD_PRIORITY_NORMAL

THREAD_PRIORITY_BELOW_NORMAL

THREAD_PRIORITY_LOWEST

The other two thread priority settings:

THREAD_PRIORITY_TIME_CRITICAL

THREAD_PRIORITY_IDLE

are not supported because, relative to the control panel, they are of little practical value. For example, if you want to create a time-critical application, you are better off making its priority class time-critical.

OFFSET defines an offset that will be used to translate between list box indexes and thread priorities. You should recall that normal priority has the value zero. In this example, the highest priority is THREAD_PRIORITY_HIGHEST, which is 2. The lowest priority is THREAD_PRIORITY_LOWEST, which is –2. Because list box indexes begin at zero, the offset is used to convert between indexes and priority settings.

Next, the ThrdCtrlPanel class is declared. It begins as shown here:

// A Thread Control Panel Class.
class ThrdCtrlPanel {
// Information about the thread under control.
struct ThreadInfo {
HANDLE hThread; // handle of thread
int priority; // current priority
bool suspended; // true if suspended
ThreadInfo(HANDLE ht, int p, bool s) {
hThread = ht;
priority = p;
suspended = s;
}
};
// This map holds a ThreadInfo for each
// active thread control panel.
static map<HWND, ThreadInfo> dialogmap;

Information about the thread under control is contained within a structure of type ThreadInfo. The handle of the thread is stored in hThread. Its priority is stored in priority. If the thread is suspended, then suspended will be true. Otherwise, suspended will be false.

The static member dialogmap is an STL map that links the thread information with the handle of the dialog box used to control that thread. Because there can be more than one thread control panel active at any given time, there must be some way to determine which thread is associated with which panel. It is dialogmap that provides this linkage.

The ThrdCtrlPanel Constructor

The ThrdCtrlPanel constructor is shown here. The constructor is passed the instance handle of the application and the handle of the thread being controlled. The instance handle is needed to create the control panel dialog box.

// Create a thread control panel.
ThrdCtrlPanel::ThrdCtrlPanel(HINSTANCE hInst,
HANDLE hThrd)
{
ThreadInfo ti(hThrd,
GetThreadPriority(hThrd)+OFFSET,
false);
// Owner window is desktop.
HWND hDialog = CreateDialog(hInst, "ThreadPanelDB",
NULL,
(DLGPROC) ThreadPanel);
// Put info about this dialog box in the map.
dialogmap.insert(pair<HWND, ThreadInfo>(hDialog, ti));
// Set the control panel's title.
char str[80] = "Control Panel for Thread ";
char str2[4];
_itoa(dialogmap.size(), str2, 10);
strcat(str, str2);
SetWindowText(hDialog, str);
// Offset each dialog box instance.
MoveWindow(hDialog, 30*dialogmap.size(),
30*dialogmap.size(),
300, 250, 1);
// Update priority setting in the list box.
SendDlgItemMessage(hDialog, IDD_LB, LB_SETCURSEL,
(WPARAM) ti.priority, 0);
// Increase priority to ensure control. You can
// change or remove this statement based on your
// execution environment.
SetThreadPriority(GetCurrentThread(),
THREAD_PRIORITY_ABOVE_NORMAL);
}

The constructor begins by creating a ThreadInfo instance called ti that contains the initial settings for the thread. Notice that the priority is obtained by calling GetThreadPriority( ) for the thread being controlled. Next, the control panel dialog box is created by calling CreateDialog( ). CreateDialog( ) is a Windows API function that creates a modeless dialog box, which makes it independent of the application that creates it. The handle of this dialog box is returned and stored in hDialog. Next, hDialog and the thread information contained in ti are stored in dialogmap. Thus, the thread is linked with the dialog box that controls it.

Next, the title of the dialog box is set to reflect the number of the thread. The number of the thread is obtained based on the number of entries in dialogmap. An alternative that you might want to try implementing is to explicitly pass a name for each thread to the ThrdCtrlPanel constructor. For the purposes of this chapter, simply numbering each thread is sufficient.

Next, the control panel’s position on the screen is offset a bit by calling MoveWindow( ), another Windows API function. This enables multiple panels to be displayed without each one fully covering the one before it. The thread’s priority setting is then displayed in the priority list box by calling the Windows API function SendDlgItemMessage( ).

Finally, the current thread has its priority increased to above normal. This ensures that the application receives enough CPU time to be responsive to user input, no matter what the priority level of the thread under control is. This step may not be needed in all cases. You can experiment to find out.

The ThreadPanel( ) Function

ThreadPanel( ) is the Windows callback function that responds to user interaction with the thread control panel. Like all dialog box callback functions, it receives a message each time the user changes the state of a control. It is passed the handle of the dialog box in which the action occurred, the message, and any additional information required by the message. Its general mode of operation is the same as that for any other callback function used by a dialog box. The following discussion describes what happens for each message.

When the thread control panel dialog box is first created, it receives a WM_INITDIALOG message, which is handled by this case sequence:

case WM_INITDIALOG:
// Initialize priority list box.
for(i=0; i < NUMPRIORITIES; i++) {
SendDlgItemMessage(hwnd, IDD_LB,
LB_ADDSTRING, 0, (LPARAM) priorities[i]);
}
// Set Suspend and Resume buttons for thread.
hpbSus = GetDlgItem(hwnd, IDD_SUSPEND);
hpbRes = GetDlgItem(hwnd, IDD_RESUME);
EnableWindow(hpbSus, true); // enable Suspend
EnableWindow(hpbRes, false); // disable Resume
return 1;

This initializes the priority list box and sets the Suspend and Resume buttons to their initial states, which are Suspend enabled and Resume disabled.

Each user interaction generates a WM_COMMAND message. Each time this message is received, an iterator to this dialog box’s entry in dialogmap is retrieved, as shown here:

case WM_COMMAND:
map<HWND, ThreadInfo>::iterator p = dialogmap.find(hwnd);

The information pointed to by p will be used to properly process each action. Because p is an iterator for a map, it points to an object of type pair, which is a structure defined by the STL. This structure contains two fields: first and second. These fields correspond to the information that comprises the key and the value, respectively. In this case, the handle is the key and the thread information is the value.

A code indicating precisely what action has occurred is contained in the low-order word of wParam, which is used to control a switch statement that handles the remaining messages. Each is described next.

When the user presses the Terminate button, the thread under control is stopped. This is handled by this case sequence:

case IDD_TERMINATE:
TerminateThread(p->second.hThread, 0);
// Disable Terminate button.
hpbTerm = GetDlgItem(hwnd, IDD_TERMINATE);
EnableWindow(hpbTerm, false); // disable
// Disable Suspend and Resume buttons.
hpbSus = GetDlgItem(hwnd, IDD_SUSPEND);
hpbRes = GetDlgItem(hwnd, IDD_RESUME);
EnableWindow(hpbSus, false); // disable Suspend
EnableWindow(hpbRes, false); // disable Resume
return 1;

The thread is stopped with a call to TerminateThread( ). Notice how the handle for the thread is obtained. As explained, because p is an iterator for a map, it points to an object of type pair that contains the key in its first field and the value in its second field. This is why the thread handle is obtained by the expression p->second.hThread. After the thread is stopped, the Terminate button is disabled.

Once a thread has been terminated, it cannot be resumed. Notice that the control panel uses TerminateThread( ) to halt execution of a thread. As mentioned earlier, this function must be used with care. If you use the control panel to experiment with threads of your own, you will want to make sure that no harmful side effects are possible.

When the user presses the Suspend button, the thread is suspended. This is accomplished by the following sequence:

case IDD_SUSPEND:
SuspendThread(p->second.hThread);
// Set state of the Suspend and Resume buttons.
hpbSus = GetDlgItem(hwnd, IDD_SUSPEND);
hpbRes = GetDlgItem(hwnd, IDD_RESUME);
EnableWindow(hpbSus, false); // disable Suspend
EnableWindow(hpbRes, true); // enable Resume
p->second.suspended = true;
return 1;

The thread is suspended by a call to SuspendThread( ). Next, the state of the Suspend and Resume buttons are updated such that Resume is enabled and Suspend is disabled. This prevents the user from attempting to suspend a thread twice.

A suspended thread is resumed when the Resume button is pressed. It is handled by this code:

case IDD_RESUME:
ResumeThread(p->second.hThread);
// Set state of the Suspend and Resume buttons.
hpbSus = GetDlgItem(hwnd, IDD_SUSPEND);
hpbRes = GetDlgItem(hwnd, IDD_RESUME);
EnableWindow(hpbSus, true); // enable Suspend
EnableWindow(hpbRes, false); // disable Resume
p->second.suspended = false;
return 1;

The thread is resumed by a call to ResumeThread( ), and the Suspend and Resume buttons are set appropriately.

To change a thread’s priority, the user double-clicks an entry in the Priority list box. This event is handled as shown next:

case IDD_LB:
// If a list box entry was double-clicked,
// then change the priority.
if(HIWORD(wParam)==LBN_DBLCLK) {
p->second.priority = SendDlgItemMessage(hwnd,
IDD_LB, LB_GETCURSEL,
0, 0);
SetThreadPriority(p->second.hThread,
p->second.priority-OFFSET);
}
return 1;

List boxes generate various types of notification messages that describe the precise type of event that occurred. Notification messages are contained in the high-order word of wParam. One of these messages is LBN_DBLCLK, which means that the user double-clicked an entry in the box. When this notification is received, the index of the entry is retrieved by calling the Windows API function SendDlgItemMessage( ), requesting the current selection. This value is then used to set the thread’s priority. Notice that OFFSET is subtracted to normalize the value of the index.

Finally, when the user closes the thread control panel dialog box, the IDCANCEL message is sent. It is handled by the following sequence:

case IDCANCEL:
// If thread is suspended when panel is closed,
// then resume thread to prevent deadlock.
if(p->second.suspended) {
ResumeThread(p->second.hThread);
p->second.suspended = false;
}
// Remove this thread from the list.
dialogmap.erase(hwnd);
// Close the panel.
DestroyWindow(hwnd);
return 1;

If the thread was suspended, it is restarted. This is necessary to avoid accidentally deadlocking the thread. Next, this dialog box’s entry in dialogmap is removed. Finally, the dialog box is removed by calling the Windows API function DestroyWindow( ).

Here is a program that includes the thread control panel and demonstrates its use. Sample output is shown in Figure 3-2. The program creates a main window and defines two child threads. When started, these threads simply count from 0 to 50,000, displaying the count in the main window. These threads can be controlled by activating a thread control panel.

To use the program, first begin execution of the threads by selecting Start Threads from the Threads menu (or by pressing F2) and then activate the thread control panels by selecting Control Panels from the Threads menu (or by pressing F3). Once the control panels are active, you can experiment with different priority settings and so on.

// Demonstrate the thread control panel.
#include <windows.h>
#include <process.h>
#include "thrdapp.h"
#include "tcp.cpp"
const int MAX = 500000;
LRESULT CALLBACK WindowFunc(HWND, UINT, WPARAM, LPARAM);
unsigned __stdcall MyThread1(void * param);
unsigned __stdcall MyThread2(void * param);
char str[255]; // holds output strings
unsigned tid1, tid2; // thread IDs
HANDLE hThread1, hThread2; // thread handles
HINSTANCE hInst; // instance handle
int WINAPI WinMain(HINSTANCE hThisInst, HINSTANCE hPrevInst,
LPSTR args, int winMode)
{
HWND hwnd;
MSG msg;
WNDCLASSEX wcl;
HACCEL hAccel;
// Define a window class.
wcl.cbSize = sizeof(WNDCLASSEX);
wcl.hInstance = hThisInst; // handle to this instance
wcl.lpszClassName = "MyWin"; // window class name
wcl.lpfnWndProc = WindowFunc; // window function
wcl.style = 0; // default style
wcl.hIcon = LoadIcon(NULL, IDI_APPLICATION); // large icon
wcl.hIconSm = NULL; // use small version of large icon
wcl.hCursor = LoadCursor(NULL, IDC_ARROW); // cursor style
wcl.lpszMenuName = "ThreadAppMenu"; // main menu
wcl.cbClsExtra = 0; // no extra memory needed
wcl.cbWndExtra = 0;
// Make the window background white.
wcl.hbrBackground = (HBRUSH) GetStockObject(WHITE_BRUSH);
// Register the window class.
if(!RegisterClassEx(&wcl)) return 0;
/* Now that a window class has been registered, a window
can be created. */
hwnd = CreateWindow(
wcl.lpszClassName, // name of window class
"Using a Thread Control Panel", // title
WS_OVERLAPPEDWINDOW, // window style - normal
CW_USEDEFAULT, // X coordinate - let Windows decide
CW_USEDEFAULT, // Y coordinate - let Windows decide
260, // width
200, // height
NULL, // no parent window
NULL, // no override of class menu
hThisInst, // instance handle
NULL // no additional arguments
);
hInst = hThisInst; // save instance handle
// Load the keyboard accelerators.
hAccel = LoadAccelerators(hThisInst, "ThreadAppMenu");
// Display the window.
ShowWindow(hwnd, winMode);
UpdateWindow(hwnd);
// Create the message loop.
while(GetMessage(&msg, NULL, 0, 0))
{
if(!TranslateAccelerator(hwnd, hAccel, &msg)) {
TranslateMessage(&msg); // translate keyboard messages
DispatchMessage(&msg); // return control to Windows
}
}
return msg.wParam;
}
/* This function is called by Windows and is passed
messages from the message queue.
*/
LRESULT CALLBACK WindowFunc(HWND hwnd, UINT message,
WPARAM wParam, LPARAM lParam)
{
int response;
switch(message) {
case WM_COMMAND:
switch(LOWORD(wParam)) {
case IDM_THREAD: // create the threads
hThread1 = (HANDLE) _beginthreadex(NULL, 0,
MyThread1, (void *) hwnd,
0, &tid1);
hThread2 = (HANDLE) _beginthreadex(NULL, 0,
MyThread2, (void *) hwnd,
0, &tid2);
break;
case IDM_PANEL: // activate control panel
ThrdCtrlPanel(hInst, hThread1);
ThrdCtrlPanel(hInst, hThread2);
break;
case IDM_EXIT:
response = MessageBox(hwnd, "Quit the Program?",
"Exit", MB_YESNO);
if(response == IDYES) PostQuitMessage(0);
break;
case IDM_HELP:
MessageBox(hwnd,
"F1: Help\nF2: Start Threads\nF3: Panel",
"Help", MB_OK);
break;
}
break;
case WM_DESTROY: // terminate the program
PostQuitMessage(0);
break;
default:
return DefWindowProc(hwnd, message, wParam, lParam);
}
return 0;
}
// First thread.
unsigned __stdcall MyThread1(void * param)
{
int i;
HDC hdc;
for(i=0; i < MAX; i++) {
wsprintf(str, "Thread 1: loop # %5d ", i);
hdc = GetDC((HWND) param);
TextOut(hdc, 1, 1, str, lstrlen(str));
ReleaseDC((HWND) param, hdc);
}
return 0;
}
// Second thread.
unsigned __stdcall MyThread2(void * param)
{
int i;
HDC hdc;
for(i=0; i < MAX; i++) {
wsprintf(str, "Thread 2: loop # %5d ", i);
hdc = GetDC((HWND) param);
TextOut(hdc, 1, 20, str, lstrlen(str));
ReleaseDC((HWND) param, hdc);
}
return 0;
}

This program requires the header file thrdapp.h, shown here:

#define IDM_THREAD 100
#define IDM_HELP 101
#define IDM_PANEL 102
#define IDM_EXIT 103

The resource file required by the program is shown here:

#include <windows.h>
#include "thrdapp.h"
#include "tcp.rc"
ThreadAppMenu MENU
{
POPUP "&Threads" {
MENUITEM "&Start Threads\tF2", IDM_THREAD
MENUITEM "&Control Panels\tF3", IDM_PANEL
MENUITEM "E&xit\tCtrl+X", IDM_EXIT
}
MENUITEM "&Help", IDM_HELP
}
ThreadAppMenu ACCELERATORS
{
VK_F1, IDM_HELP, VIRTKEY
VK_F2, IDM_THREAD, VIRTKEY
VK_F3, IDM_PANEL, VIRTKEY
"^X", IDM_EXIT
}


Although controlling threads using the thread control panel is useful when developing multithreaded programs, ultimately it is the application of multithreading to real work that makes threads important. Toward this end, this chapter shows a multithreaded version of the GCPtr garbage collector class originally developed in Chapter 2. Recall that the version of GCPtr shown in Chapter 2 collected unused memory each time a GCPtr object went out of scope. Although this approach is fine for some applications, often a better alternative is to have the garbage collector run as a background task, recycling memory whenever free CPU cycles are available. The implementation developed here is designed for Windows, but the same basic techniques apply to other multithreaded environments.

To convert GCPtr into a background task is actually fairly easy, but it does involve a number of changes. Here are the main ones:

  • Member variables that support the thread must be added to GCPtr. These variables include the thread handle, the mutex handle, and an instance counter that keeps track of the number of GCPtr objects in existence.
  • The constructor for GCPtr must begin the garbage collection thread. The constructor must also create the mutex that controls synchronization. This must happen only once, when the first GCPtr object is created.
  • Another exception must be defined that will be used to indicate a time-out condition.
  • The GCPtr destructor must no longer call collect( ). Garbage collection is handled by the garbage collection thread.
  • A function called gc( ) that serves as the thread entry point for the garbage collector must be defined.
  • A function called isRunning( ) must be defined. It returns true if the garbage collection is in use.
  • The member functions of GCPtr that access the garbage collection list contained in gclist must be synchronized so that only one thread at a time can access the list.

The following sections show the changes.

The Additional Member Variables

The multithreaded version of GCPtr requires that the following member variables be added:

// These support multithreading.
unsigned tid; // thread id
static HANDLE hThrd; // thread handle
static HANDLE hMutex; // handle of mutex
static int instCount; // counter of GCPtr objects

The ID of the thread used by the garbage collector is stored in tid. This member is unused except in the call to _beginthreadex( ). The handle to the thread is stored in hThrd. The handle of the mutex used to synchronize access to GCPtr is stored in hMutex. A count of GCPtr objects in existence is maintained in instCount. The last three are static because they are shared by all instances of GCPtr. They are defined like this, outside of GCPtr:

template <class T, int size>
int GCPtr<T, size>::instCount = 0;
template <class T, int size>
HANDLE GCPtr<T, size>::hMutex = 0;
template <class T, int size>
HANDLE GCPtr<T, size>::hThrd = 0;

The Multithreaded GCPtr Constructor

In addition to its original duties, the multithreaded GCPtr( ) must create the mutex, start the garbage collector thread, and update the instance counter. Here is the updated version:

// Construct both initialized and uninitialized objects.
GCPtr(T *t=NULL) {
// When first object is created, create the mutex
// and register shutdown().
if(hMutex==0) {
hMutex = CreateMutex(NULL, 0, NULL);
atexit(shutdown);
}
if(WaitForSingleObject(hMutex, 10000)==WAIT_TIMEOUT)
throw TimeOutExc();
list<GCInfo<T> >::iterator p;
p = findPtrInfo(t);
// If t is already in gclist, then
// increment its reference count.
// Otherwise, add it to the list.
if(p != gclist.end())
p->refcount++; // increment ref count
else {
// Create and store this entry.
GCInfo<T> gcObj(t, size);
gclist.push_front(gcObj);
}
addr = t;
arraySize = size;
if(size > 0) isArray = true;
else isArray = false;
// Increment instance counter for each new object.
instCount++;
// If the garbage collection thread is not
// currently running, start it running.
if(hThrd==0) {
hThrd = (HANDLE) _beginthreadex(NULL, 0, gc,
(void *) 0, 0, (unsigned *) &tid);
// For some applications, it will be better
// to lower the priority of the garbage collector
// as shown here:
//
// SetThreadPriority(hThrd,
// THREAD_PRIORITY_BELOW_NORMAL);
}
ReleaseMutex(hMutex);
}

Let’s examine this code closely. First, if hMutex is zero, it means that this is the first GCPtr object to be created and no mutex has yet been created for the garbage collector. If this is the case, the mutex is created and its handle is assigned to hMutex. At the same time, the function shutdown( ) is registered as a termination function by calling atexit( ).

It is important to note that in the multithreaded garbage collector, shutdown( ) serves two purposes. First, as in the original version of GCPtr, shutdown( ) frees any unused memory that has not been released because of a circular reference. Second, when a program using the multithreaded garbage collector ends, it stops the garbage collection thread. This means that there might still be dynamically allocated objects that haven’t been freed. This is important because these objects might have destructors that need to be called. Because shutdown( ) releases all remaining objects, it also releases these objects.

Next, the mutex is acquired by calling WaitForSingleObject( ). This is necessary to prevent two threads from accessing gclist at the same time. Once the mutex has been acquired, a search of gclist is made, looking for any preexisting entry that matches the address in t. If one is found, its reference count is incremented. If no preexisting entry matches t, a new GCInfo object is created that contains this address, and this object is added to gclist. Then, addr, arraySize, and isArray are set. These actions are the same as in the original version of GCPtr.

Next, instCount is incremented. Recall that instCount is initialized to zero. Incrementing it each time an object is created keeps track of how many GCPtr objects are in existence. As long as this count is above zero, the garbage collector will continue to execute.

Next, if hThrd is zero (as it is initially), then no thread has yet been created for the garbage collector. In this case, _beginthreadex( ) is called to begin the thread. A handle to the thread is then assigned to hThrd. The thread entry function is called gc( ), and it is examined shortly.

Finally, the mutex is released and the constructor returns. It is important to point out that each call to WaitForSingleObject( ) must be balanced by a call to ReleaseMutex( ), as shown in the GCPtr constructor. Failure to release the mutex will cause deadlock.
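
The pattern of acquiring hMutex on entry and releasing it on every exit path is easy to get wrong as functions grow. As an alternative (not used in the code in this chapter), you could wrap the mutex in a small guard object so that ReleaseMutex( ) is called automatically, even if an exception is thrown. Here is a minimal sketch; the class name MutexLock is hypothetical:

// Hypothetical helper that acquires hMutex in its constructor and
// releases it in its destructor.
class MutexLock {
    HANDLE hMtx; // handle of the mutex being held
public:
    MutexLock(HANDLE h) : hMtx(h) {
        if(WaitForSingleObject(hMtx, 10000)==WAIT_TIMEOUT)
            throw TimeOutExc();
    }
    ~MutexLock() { ReleaseMutex(hMtx); }
};

With such a guard, a member function would simply declare MutexLock lock(hMutex); at the top and omit the explicit ReleaseMutex( ) call.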

The TimeOutExc Exception

As you probably noticed in the code for GCPtr( ) described in the preceding section, if the mutex cannot be acquired after 10 seconds, then a TimeOutExc is thrown. Frankly, 10 seconds is a very long time, so a time-out shouldn’t ever happen unless something disrupts the task scheduler of the operating system. However, in the event it does occur, your application code may want to catch this exception. The TimeOutExc class is shown here:

// Exception thrown when a time-out occurs
// when waiting for access to hMutex.
//
class TimeOutExc {
// Add functionality if needed by your application.
};

Notice that it contains no members. Its existence as a unique type is sufficient for the purposes of this chapter. Of course, you can add functionality if desired.
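
Application code that creates or copies GCPtr objects can guard against the time-out like any other exception. A minimal sketch (the int type and the message are just examples):

try {
    GCPtr<int> p = new int(19); // any operation that must lock hMutex
    // ... use p ...
} catch(TimeOutExc) {
    cout << "Timed out waiting for the garbage collector mutex.\n";
}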

The Multithreaded GCPtr Destructor

Unlike the single-threaded version of the GCPtr destructor, the multithreaded version of ~GCPtr( ) does not call collect( ). Instead, it simply decrements the reference count of the memory pointed to by the GCPtr that is going out of scope. The actual collection of garbage (if any exists) is handled by the garbage collection thread. The destructor also decrements the instance counter, instCount.

The multithreaded version of ~GCPtr( ) is shown here:

// Destructor for GCPtr.
template <class T, int size>
GCPtr<T, size>::~GCPtr() {
if(WaitForSingleObject(hMutex, 10000)==WAIT_TIMEOUT)
throw TimeOutExc();
list<GCInfo<T> >::iterator p;
p = findPtrInfo(addr);
if(p->refcount) p->refcount--; // decrement ref count
// Decrement instance counter for each object
// that is destroyed.
instCount--;
ReleaseMutex(hMutex);
}

The gc( ) Function

The entry function for the garbage collector is called gc( ), and it is shown here:

// Entry point for garbage collector thread.
template <class T, int size>
unsigned __stdcall GCPtr<T, size>::gc(void * param) {
#ifdef DISPLAY
cout << "Garbage collection started.\n";
#endif
while(isRunning()) {
collect();
}
collect(); // collect garbage on way out
// Release and reset the thread handle so
// that the garbage collection thread can
// be restarted if necessary.
CloseHandle(hThrd);
hThrd = 0;
#ifdef DISPLAY
cout << "Garbage collection terminated for "
<< typeid(T).name() << "\n";
#endif
return 0;
}

The gc( ) function is quite simple: it runs as long as the garbage collector is in use. The isRunning( ) function returns true if instCount is greater than zero (which means that the garbage collector is still needed) and false otherwise. Inside the loop, collect( ) is called continuously. This approach is suitable for demonstrating the multithreaded garbage collector, but it is probably too inefficient for real-world use. You might want to experiment with calling collect( ) less often, such as only when memory runs low. You could also experiment by calling the Windows API function Sleep( ) after each call to collect( ). Sleep( ) pauses the execution of the calling thread for a specified number of milliseconds. While sleeping, a thread does not consume CPU time.
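
For example, here is one way the loop in gc( ) might look with a pause added; the 100-millisecond value is arbitrary and should be tuned for your application:

while(isRunning()) {
    collect();
    Sleep(100); // yield the CPU between collection passes
}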

When isRunning( ) returns false, the loop ends, causing gc( ) to eventually end, which stops the garbage collection thread. Because of the multithreading, it is possible that there will still be an entry on gclist that has not yet been freed even though isRunning( ) returns false. To handle this case, a final call to collect( ) is made before gc( ) ends.

Finally, the thread handle is released via a call to the Windows API function CloseHandle( ), and its value is set to zero. Setting hThrd to zero enables the GCPtr constructor to restart the thread if later in the program new GCPtr objects are created.

The isRunning( ) Function

The isRunning( ) function is shown here:

// Returns true if the collector is still in use.
static bool isRunning() { return instCount > 0; }

It simply compares instCount to zero. As long as instCount is greater than 0, at least one GCPtr pointer is still in existence and the garbage collector is still needed.

Many of the functions in GCPtr access gclist, which holds the garbage collection list. Access to gclist must be synchronized to prevent two or more threads from attempting to use it at the same time. The reason for this is easy to understand. If access were not synchronized, then, for example, one thread might be obtaining an iterator to the end of the list at the same time that another thread is adding or deleting an element from the list. In this case, the iterator would be invalid. To prevent such problems, each sequence of code that accesses gclist must be guarded by a mutex. The copy constructor for GCPtr shown here is one example:

// Copy constructor.
GCPtr(const GCPtr &ob) {
if(WaitForSingleObject(hMutex, 10000)==WAIT_TIMEOUT)
throw TimeOutExc();
list<GCInfo<T> >::iterator p;
p = findPtrInfo(ob.addr);
p->refcount++; // increment ref count
addr = ob.addr;
arraySize = ob.arraySize;
if(arraySize > 0) isArray = true;
else isArray = false;
instCount++; // increase instance count for copy
ReleaseMutex(hMutex);
}

Notice that the first thing that the copy constructor does is acquire the mutex. Once acquired, it creates a copy of the object and adjusts the reference count for the memory being pointed to. On its way out, the copy constructor releases the mutex. This same basic method is applied to all functions that access gclist.

Two Other Changes

There are two other changes that you must make to the original version of the garbage collector. First, recall that the original version of GCPtr defined a static variable called first that indicated when the first GCPtr was created. This variable is no longer needed because hMutex now performs this function. Thus, remove first from GCPtr. Because it is a static variable, you will also need to remove its definition outside of GCPtr.

In the original, single-threaded version of the garbage collector, if you defined the DISPLAY macro, you could watch the garbage collector in action. Most of that code has been removed in the multithreaded version because multithreading causes the output to be scrambled and unintelligible in most cases. For the multithreaded version, defining DISPLAY simply lets you know when the garbage collector has started and when it has stopped.

The entire multithreaded version of the garbage collector is shown here. Call this file gcthrd.h.

// A garbage collector that runs as a background task.
#include <iostream>
#include <list>
#include <typeinfo>
#include <cstdlib>
#include <windows.h>
#include <process.h>
using namespace std;
// To watch the action of the garbage collector, define DISPLAY.
// #define DISPLAY
// Exception thrown when an attempt is made to
// use an Iter that exceeds the range of the
// underlying object.
//
class OutOfRangeExc {
// Add functionality if needed by your application.
};
// Exception thrown when a time-out occurs
// when waiting for access to hMutex.
//
class TimeOutExc {
// Add functionality if needed by your application.
};
// An iterator-like class for cycling through arrays
// that are pointed to by GCPtrs. Iter pointers
// ** do not ** participate in or affect garbage
// collection. Thus, an Iter pointing to
// some object does not prevent that object
// from being recycled.
//
template <class T> class Iter {
T *ptr; // current pointer value
T *end; // points to element one past end
T *begin; // points to start of allocated array
unsigned length; // length of sequence
public:
Iter() {
ptr = end = begin = NULL;
length = 0;
}
Iter(T *p, T *first, T *last) {
ptr = p;
end = last;
begin = first;
length = last - first;
}
// Return length of sequence to which this
// Iter points.
unsigned size() { return length; }
// Return value pointed to by ptr.
// Do not allow out-of-bounds access.
T &operator*() {
if( (ptr >= end) || (ptr < begin) )
throw OutOfRangeExc();
return *ptr;
}
// Return address contained in ptr.
// Do not allow out-of-bounds access.
T *operator->() {
if( (ptr >= end) || (ptr < begin) )
throw OutOfRangeExc();
return ptr;
}
// Prefix ++.
Iter operator++() {
ptr++;
return *this;
}
// Prefix --.
Iter operator--() {
ptr--;
return *this;
}
// Postfix ++.
Iter operator++(int notused) {
T *tmp = ptr;
ptr++;
return Iter(tmp, begin, end);
}
// Postfix --.
Iter operator--(int notused) {
T *tmp = ptr;
ptr--;
return Iter(tmp, begin, end);
}
// Return a reference to the object at the
// specified index. Do not allow out-of-bounds
// access.
T &operator[](int i) {
if( (i < 0) || (i >= (end-begin)) )
throw OutOfRangeExc();
return ptr[i];
}
// Define the relational operators.
bool operator==(Iter op2) {
return ptr == op2.ptr;
}
bool operator!=(Iter op2) {
return ptr != op2.ptr;
}
bool operator<(Iter op2) {
return ptr < op2.ptr;
}
bool operator<=(Iter op2) {
return ptr <= op2.ptr;
}
bool operator>(Iter op2) {
return ptr > op2.ptr;
}
bool operator>=(Iter op2) {
return ptr >= op2.ptr;
}
// Subtract an integer from an Iter.
Iter operator-(int n) {
ptr -= n;
return *this;
}
// Add an integer to an Iter.
Iter operator+(int n) {
ptr += n;
return *this;
}
// Return number of elements between two Iters.
int operator-(Iter &itr2) {
return ptr - itr2.ptr;
}
};
// This class defines an element that is stored
// in the garbage collection information list.
//
template <class T> class GCInfo {
public:
unsigned refcount; // current reference count
T *memPtr; // pointer to allocated memory
/* isArray is true if memPtr points
to an allocated array. It is false
otherwise. */
bool isArray; // true if pointing to array
/* If memPtr is pointing to an allocated
array, then arraySize contains its size */
unsigned arraySize; // size of array
// Here, mPtr points to the allocated memory.
// If this is an array, then size specifies
// the size of the array.
GCInfo(T *mPtr, unsigned size=0) {
refcount = 1;
memPtr = mPtr;
if(size != 0)
isArray = true;
else
isArray = false;
arraySize = size;
}
};
// Overloading operator== allows GCInfos to be compared.
// This is needed by the STL list class.
template <class T> bool operator==(const GCInfo<T> &ob1,
const GCInfo<T> &ob2) {
return (ob1.memPtr == ob2.memPtr);
}
// GCPtr implements a pointer type that uses
// garbage collection to release unused memory.
// A GCPtr must only be used to point to memory
// that was dynamically allocated using new.
// When used to refer to an allocated array,
// specify the array size.
//
template <class T, int size=0> class GCPtr {
// gclist maintains the garbage collection list.
static list<GCInfo<T> > gclist;
// addr points to the allocated memory to which
// this GCPtr pointer currently points.
T *addr;
/* isArray is true if this GCPtr points
to an allocated array. It is false
otherwise. */
bool isArray; // true if pointing to array
// If this GCPtr is pointing to an allocated
// array, then arraySize contains its size.
unsigned arraySize; // size of the array
// These support multithreading.
unsigned tid; // thread id
static HANDLE hThrd; // thread handle
static HANDLE hMutex; // handle of mutex
static int instCount; // counter of GCPtr objects
// Return an iterator to pointer info in gclist.
typename list<GCInfo<T> >::iterator findPtrInfo(T *ptr);
public:
// Define an iterator type for GCPtr.
typedef Iter<T> GCiterator;
// Construct both initialized and uninitialized objects.
GCPtr(T *t=NULL) {
// When first object is created, create the mutex
// and register shutdown().
if(hMutex==0) {
hMutex = CreateMutex(NULL, 0, NULL);
atexit(shutdown);
}
if(WaitForSingleObject(hMutex, 10000)==WAIT_TIMEOUT)
throw TimeOutExc();
list<GCInfo<T> >::iterator p;
p = findPtrInfo(t);
// If t is already in gclist, then
// increment its reference count.
// Otherwise, add it to the list.
if(p != gclist.end())
p->refcount++; // increment ref count
else {
// Create and store this entry.
GCInfo<T> gcObj(t, size);
gclist.push_front(gcObj);
}
addr = t;
arraySize = size;
if(size > 0) isArray = true;
else isArray = false;
// Increment instance counter for each new object.
instCount++;
// If the garbage collection thread is not
// currently running, start it running.
if(hThrd==0) {
hThrd = (HANDLE) _beginthreadex(NULL, 0, gc,
(void *) 0, 0, (unsigned *) &tid);
// For some applications, it will be better
// to lower the priority of the garbage collector
// as shown here:
//
// SetThreadPriority(hThrd,
// THREAD_PRIORITY_BELOW_NORMAL);
}
ReleaseMutex(hMutex);
}
// Copy constructor.
GCPtr(const GCPtr &ob) {
if(WaitForSingleObject(hMutex, 10000)==WAIT_TIMEOUT)
throw TimeOutExc();
list<GCInfo<T> >::iterator p;
p = findPtrInfo(ob.addr);
p->refcount++; // increment ref count
addr = ob.addr;
arraySize = ob.arraySize;
if(arraySize > 0) isArray = true;
else isArray = false;
instCount++; // increase instance count for copy
ReleaseMutex(hMutex);
}
// Destructor for GCPtr.
~GCPtr();
// Collect garbage. Returns true if at least
// one object was freed.
static bool collect();
// Overload assignment of pointer to GCPtr.
T *operator=(T *t);
// Overload assignment of GCPtr to GCPtr.
GCPtr &operator=(GCPtr &rv);
// Return a reference to the object pointed
// to by this GCPtr.
T &operator*() {
return *addr;
}
// Return the address being pointed to.
T *operator->() { return addr; }
// Return a reference to the object at the
// index specified by i.
T &operator[](int i) {
return addr[i];
}
// Conversion function to T *.
operator T *() { return addr; }
// Return an Iter to the start of the allocated memory.
Iter<T> begin() {
int size;
if(isArray) size = arraySize;
else size = 1;
return Iter<T>(addr, addr, addr + size);
}
// Return an Iter to one past the end of an allocated array.
Iter<T> end() {
int size;
if(isArray) size = arraySize;
else size = 1;
return Iter<T>(addr + size, addr, addr + size);
}
// Return the size of gclist for this type
// of GCPtr.
static int gclistSize() {
if(WaitForSingleObject(hMutex, 10000)==WAIT_TIMEOUT)
throw TimeOutExc();
unsigned sz = gclist.size();
ReleaseMutex(hMutex);
return sz;
}
// A utility function that displays gclist.
static void showlist();
// The following functions support multithreading.
//
// Returns true if the collector is still in use.
static bool isRunning() { return instCount > 0; }
// Clear gclist when program exits.
static void shutdown();
// Entry point for garbage collector thread.
static unsigned __stdcall gc(void * param);
};
// Create storage for the static variables.
template <class T, int size>
list<GCInfo<T> > GCPtr<T, size>::gclist;
template <class T, int size>
int GCPtr<T, size>::instCount = 0;
template <class T, int size>
HANDLE GCPtr<T, size>::hMutex = 0;
template <class T, int size>
HANDLE GCPtr<T, size>::hThrd = 0;
// Destructor for GCPtr.
template <class T, int size>
GCPtr<T, size>::~GCPtr() {
if(WaitForSingleObject(hMutex, 10000)==WAIT_TIMEOUT)
throw TimeOutExc();
list<GCInfo<T> >::iterator p;
p = findPtrInfo(addr);
if(p->refcount) p->refcount--; // decrement ref count
// Decrement instance counter for each object
// that is destroyed.
instCount--;
ReleaseMutex(hMutex);
}
// Collect garbage. Returns true if at least
// one object was freed.
template <class T, int size>
bool GCPtr<T, size>::collect() {
if(WaitForSingleObject(hMutex, 10000)==WAIT_TIMEOUT)
throw TimeOutExc();
bool memfreed = false;
list<GCInfo<T> >::iterator p;
do {
// Scan gclist looking for unreferenced pointers.
for(p = gclist.begin(); p != gclist.end(); p++) {
// If in-use, skip.
if(p->refcount > 0) continue;
memfreed = true;
// Remove unused entry from gclist.
gclist.remove(*p);
// Free memory unless the GCPtr is null.
if(p->memPtr) {
if(p->isArray) {
delete[] p->memPtr; // delete array
}
else {
delete p->memPtr; // delete single element
}
}
// Restart the search.
break;
}
} while(p != gclist.end());
ReleaseMutex(hMutex);
return memfreed;
}
// Overload assignment of pointer to GCPtr.
template <class T, int size>
T * GCPtr<T, size>::operator=(T *t) {
if(WaitForSingleObject(hMutex, 10000)==WAIT_TIMEOUT)
throw TimeOutExc();
list<GCInfo<T> >::iterator p;
// First, decrement the reference count
// for the memory currently being pointed to.
p = findPtrInfo(addr);
p->refcount--;
// Next, if the new address is already
// existent in the system, increment its
// count. Otherwise, create a new entry
// for gclist.
p = findPtrInfo(t);
if(p != gclist.end())
p->refcount++;
else {
// Create and store this entry.
GCInfo<T> gcObj(t, size);
gclist.push_front(gcObj);
}
addr = t; // store the address.
ReleaseMutex(hMutex);
return t;
}
// Overload assignment of GCPtr to GCPtr.
template <class T, int size>
GCPtr<T, size> & GCPtr<T, size>::operator=(GCPtr &rv) {
if(WaitForSingleObject(hMutex, 10000)==WAIT_TIMEOUT)
throw TimeOutExc();
list<GCInfo<T> >::iterator p;
// First, decrement the reference count
// for the memory currently being pointed to.
p = findPtrInfo(addr);
p->refcount--;
// Next, increment the reference count
// of the new object.
p = findPtrInfo(rv.addr);
p->refcount++; // increment ref count
addr = rv.addr;// store the address.
ReleaseMutex(hMutex);
return rv;
}
// A utility function that displays gclist.
template <class T, int size>
void GCPtr<T, size>::showlist() {
if(WaitForSingleObject(hMutex, 10000)==WAIT_TIMEOUT)
throw TimeOutExc();
list<GCInfo<T> >::iterator p;
cout << "gclist<" << typeid(T).name() << ", "
<< size << ">:\n";
cout << "memPtr refcount value\n";
if(gclist.begin() == gclist.end()) {
cout << " -- Empty --\n\n";
return;
}
for(p = gclist.begin(); p != gclist.end(); p++) {
cout << "[" << (void *)p->memPtr << "]"
<< " " << p->refcount << " ";
if(p->memPtr) cout << " " << *p->memPtr;
else cout << " ---";
cout << endl;
}
cout << endl;
ReleaseMutex(hMutex);
}
// Find a pointer in gclist.
template <class T, int size>
typename list<GCInfo<T> >::iterator
GCPtr<T, size>::findPtrInfo(T *ptr) {
list<GCInfo<T> >::iterator p;
// Find ptr in gclist.
for(p = gclist.begin(); p != gclist.end(); p++)
if(p->memPtr == ptr)
return p;
return p;
}
// Entry point for garbage collector thread.
template <class T, int size>
unsigned __stdcall GCPtr<T, size>::gc(void * param) {
#ifdef DISPLAY
cout << "Garbage collection started.\n";
#endif
while(isRunning()) {
collect();
}
collect(); // collect garbage on way out
// Release and reset the thread handle so
// that the garbage collection thread can
// be restarted if necessary.
CloseHandle(hThrd);
hThrd = 0;
#ifdef DISPLAY
cout << "Garbage collection terminated for "
<< typeid(T).name() << "\n";
#endif
return 0;
}
// Clear gclist when program exits.
template <class T, int size>
void GCPtr<T, size>::shutdown() {
if(gclistSize() == 0) return; // list is empty
list<GCInfo<T> >::iterator p;
#ifdef DISPLAY
cout << "Before collecting for shutdown() for "
<< typeid(T).name() << "\n";
#endif
for(p = gclist.begin(); p != gclist.end(); p++) {
// Set all remaining reference counts to zero.
p->refcount = 0;
}
collect();
#ifdef DISPLAY
cout << "After collecting for shutdown() for "
<< typeid(T).name() << "\n";
#endif
}

To use the multithreaded garbage collector, include gcthrd.h in your program. Then, use GCPtr in the same way as described in Chapter 2. When you compile the program, remember to link in the multithreaded libraries, as explained earlier in this chapter in the section describing _beginthreadex( ) and _endthreadex( ).
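
For example, with the Microsoft compiler the multithreaded run-time library is selected with the /MT option (or /MD for the DLL version), so a command-line build might look something like this (the file name is just an example):

cl /MT loadtest.cpp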

To see the effects of the multithreaded garbage collector, try this version of the load test program originally shown in Chapter 2:

// Demonstrate the multithreaded garbage collector.
#include <iostream>
#include <new>
#include "gcthrd.h"
using namespace std;
// A simple class for load testing GCPtr.
class LoadTest {
int a, b;
public:
double n[100000]; // just to take-up memory
double val;
LoadTest() { a = b = 0; }
LoadTest(int x, int y) {
a = x;
b = y;
val = 0.0;
}

friend ostream &operator<<(ostream &strm, LoadTest &obj);
};
// Create an insertor for LoadTest.
ostream &operator<<(ostream &strm, LoadTest &obj) {
strm << "(" << obj.a << " " << obj.b << ")";
return strm;
}
int main() {
GCPtr<LoadTest> mp;
int i;
for(i = 1; i < 2000; i++) {
try {
mp = new LoadTest(i, i);
if(!(i%100))
cout << "gclist contains " << mp.gclistSize()
<< " entries.\n";
} catch(bad_alloc xa) {
// For most users, this exception won't
// ever occur.
cout << "Last object: " << *mp << endl;
cout << "Length of gclist: "
<< mp.gclistSize() << endl;
}
}
return 0;
}

Here is a sample run. (Of course, your output may vary.) This output was produced with the display option turned on by defining DISPLAY within gcthrd.h.

Garbage collection started.
gclist contains 42 entries.
gclist contains 35 entries.
gclist contains 29 entries.
gclist contains 22 entries.
gclist contains 18 entries.
gclist contains 11 entries.
gclist contains 4 entries.
gclist contains 51 entries.
gclist contains 47 entries.
gclist contains 40 entries.
gclist contains 33 entries.
gclist contains 26 entries.
gclist contains 19 entries.
gclist contains 15 entries.
gclist contains 10 entries.
gclist contains 3 entries.
gclist contains 53 entries.
gclist contains 46 entries.
gclist contains 42 entries.
Before collecting for shutdown() for class LoadTest
After collecting for shutdown() for class LoadTest

As you can see, because collect( ) is running in the background, gclist never gets very large, even though thousands of objects are being allocated and abandoned.

Some Things to Try

Creating successful multithreaded programs can be quite challenging. One reason for this is the fact that multithreading requires that you think of programs in parallel rather than linear terms. Furthermore, at runtime, threads interact in ways that are often difficult to anticipate. Thus, you might be surprised (or even bewildered) by the actions of a multithreaded program. The best way to get good at multithreading is to play with it. Toward this end, here are some ideas that you might want to try.

Try adding another list box to the thread control panel that lets the user adjust the priority class of the thread in addition to its priority value. Try adding various synchronization objects to the control panel that can be turned on or off under user control. This will let you experiment with different synchronization options.

For the multithreaded garbage collector, try collecting garbage less often, such as when gclist reaches a certain size or after free memory drops to a predetermined point. Alternatively, you could use a waitable timer to activate garbage collection on a regular basis. Finally, you might want to experiment with the garbage collector’s priority class and settings to find which level is optimal for your use.
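
For instance, here is a sketch of how the loop inside gc( ) could be driven by a waitable timer instead of running continuously. It assumes a 250-millisecond collection interval; CreateWaitableTimer( ) and SetWaitableTimer( ) are standard Windows API functions:

HANDLE hTimer = CreateWaitableTimer(NULL, FALSE, NULL);
LARGE_INTEGER due;
due.QuadPart = -2500000; // first collection in 250 ms (100-ns units)
SetWaitableTimer(hTimer, &due, 250, NULL, NULL, FALSE);
while(isRunning()) {
    WaitForSingleObject(hTimer, INFINITE); // sleep until the timer fires
    collect();
}
CloseHandle(hTimer);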

What is DirectX?

Before the release of Windows 95, most games were released for the DOS platform, usually using something like DOS4GW or some other 32-bit DOS extender to obtain access to 32-bit protected mode. Windows 95, however, seemed to signal the beginning of the end of the DOS prompt. Game developers began to wonder how they were going to write games that would run optimally under Windows 95 - games typically need to run in full-screen mode, and need to get as close as possible to your hardware. Windows 95 seemed to be "getting in the way" of this. DOS had allowed them to program as "close to the metal" as possible, that is, get straight to the hardware, without going through layers of abstraction and encapsulation. In those days, the extra overhead of a generic API would have made games too slow.

DirectX is comprised of application programming interfaces (APIs) that are grouped into two classes: the DirectX Foundation layer, and the DirectX Media layer. These APIs enable programs to directly access many of your computer's hardware devices.

The DirectX Foundation layer automatically determines the hardware capabilities of your computer and then sets your programs' parameters to match. This allows multimedia applications to run on any Windows-based computer and at the same time ensures that the multimedia applications take full advantage of high-performance hardware.

The DirectX Foundation layer contains a single set of APIs that provide improved access to the advanced features of high-performance hardware, such as 3-D graphics acceleration chips and sound cards. These APIs control low-level functions, including 2-D graphics acceleration; support for input devices such as joysticks, keyboards, and mice; and control of sound mixing and sound output. The low-level functions are supported by the components that make up the DirectX Foundation layer:

Microsoft DirectDraw

The Microsoft DirectDraw API supports extremely fast, direct access to the accelerated hardware capabilities of a computer's video adapter. It supports standard methods of displaying graphics on all video adapters, and faster, more direct access when using accelerated drivers. DirectDraw provides a device-independent way for programs, such as games and two-dimensional (2-D) graphics packages, and Windows system components, such as digital video codecs, to gain access to the features of specific display devices without requiring any additional information from the user about the device's capabilities.

Microsoft Direct3D Immediate Mode

The Microsoft Direct3D Immediate Mode API (Direct3D) provides an interface to the 3-D rendering functions built into most new video cards. Direct3D is a low-level 3-D API that provides a device-independent way for applications to communicate with accelerator hardware efficiently and powerfully.

Direct3D provides application developers with many advanced features, such as:

  • Switchable depth buffering (using z-buffers or w-buffers)
  • Flat and Gouraud shading
  • Multiple lights and light types
  • Full material and texture support
  • Robust software emulation drivers
  • Transformation and clipping
  • Hardware independence
  • Full hardware acceleration on Windows 2000 (when the appropriate device drivers are available)
  • Built-in support for the specialized CPU instruction sets, including Intel's MMX and Pentium III architectures, and the 3DNow! architecture

Microsoft DirectSound

The Microsoft DirectSound API provides a link between programs and an audio adapter's sound mixing and playback capabilities. It also enables wave sound capture and playback. DirectSound provides multimedia applications with low-latency mixing, hardware acceleration, and direct access to the sound device. It provides this feature while maintaining compatibility with existing device drivers.

Microsoft DirectMusic

The Microsoft DirectMusic API is the musical component of DirectX. Unlike the DirectSound API, which captures and plays digital sound samples, DirectMusic works with message-based musical data that is converted to digital audio either by your sound card or by its built-in software synthesizer. As well as supporting input in Musical Instrument Digital Interface (MIDI) format, DirectMusic provides application developers the ability to create immersive, dynamic soundtracks that respond to user input.

Microsoft DirectInput

The Microsoft DirectInput API provides advanced input for games and processes input from joysticks as well as other related devices including the mouse, keyboard, and other game controllers, such as force-feedback game controllers.

The DirectX Media layer works with the DirectX Foundation layer to provide high-level services that support animation, media streaming (transmission and viewing of audio and video as it is downloaded over the Internet), and interactivity. Like the DirectX Foundation layer, the DirectX Media layer is comprised of several integrated components that include:

Microsoft Direct3D Retained Mode

The Microsoft Direct3D Retained Mode API provides higher-level support for advanced, real-time, three-dimensional (3-D) graphics. Direct3D Retained Mode provides built-in support for graphics techniques like hierarchies and animation. Direct3D Retained Mode is built on top of Direct3D Immediate Mode.

Microsoft DirectAnimation

The Microsoft DirectAnimation API provides integration and animation for different types of media, such as two-dimensional images, three-dimensional objects, sounds, movies, text, and vector graphics.

Microsoft DirectPlay

The Microsoft DirectPlay API supports game connections over a modem, the Internet, or LAN. DirectPlay simplifies access to communication services and provides a way for games to communicate with each other, independent of the underlying protocol, or online service.

Microsoft DirectShow

The Microsoft DirectShow API plays multimedia files located in local files or on Internet servers, and captures multimedia streams from devices, such as video capture cards. DirectShow plays video and audio content compressed in various formats, including MPEG, audio-video interleaved (AVI), and WAV.

Microsoft DirectX Transform

The Microsoft DirectX Transform API enables application developers to create, animate, and edit digital images. DirectX Transform works with both two-dimensional (2-D) images and three-dimensional (3-D) images, which can be used to create stand-alone programs or dynamic plug-ins for Web graphics.

CH 1: What is Direct X and its Components


So Microsoft's answer to this problem was a Software Development Kit (SDK) called DirectX. DirectX is a horrible, clunky, poorly-designed, poorly-documented, bloated, ugly, confusing beast (*) of an API (Application Programming Interface) that has driven many a programmer to drink. It was originally purchased from a London company called RenderMorphics, and quietly released more or less as is as DirectX 2. DirectX 3 was probably the first "serious" release by Microsoft, who had now begun to actively push it as the games programming API of the future. Being the biggest software company on the planet, and being the developers of the Operating System that some 90% of desktop users were using, they succeeded. Hardware vendors quickly realised that following the Microsoft lead was the prudent thing to do, and everyone began to produce DirectX drivers for their hardware. In many ways this was a good thing for game developers.

A lot of improvements have been made to the original DirectX. For example, the documentation doesn't suck as much as it originally did. Some of the poorly designed sections of the original API have been cleaned up and improved. Some of the really poorly designed sections of the original API have been removed.

One of the main purposes of DirectX is to provide a standard way of accessing many different proprietary hardware devices. For example, Direct3D provides a "standard" programming interface that can be used to access the 3D hardware acceleration features of almost all 3D cards on the market which have Direct3D drivers written for them. In theory, this is supposed to make it possible for one application to run as intended across a wide variety of different hardware configurations. In practice, it usually isn't this simple.

One of the reasons it isn't that simple is that hardware (such as a 3D graphics accelerator) normally supports only a subset of the features available in DirectX, and you don't really want to use a feature if it isn't available in some sort of hardware-accelerated form. To find out which features are available, you have to query a device for a list of capabilities, and there can be many of these.
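
To give a feel for what such a query looks like, here is a minimal sketch using DirectDraw's GetCaps( ) call. It assumes pDD is a pointer to an already-created DirectDraw object (creating one is covered in Chapter 3):

// Query the hardware (HAL) and emulation (HEL) capabilities.
DDCAPS halCaps, helCaps;
ZeroMemory(&halCaps, sizeof(halCaps));
ZeroMemory(&helCaps, sizeof(helCaps));
halCaps.dwSize = sizeof(halCaps);
helCaps.dwSize = sizeof(helCaps);
if(SUCCEEDED(pDD->GetCaps(&halCaps, &helCaps)))
{
    if(halCaps.dwCaps & DDCAPS_3D)
    {
        // the card reports some form of 3D acceleration in hardware
    }
}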

The DirectX API is designed primarily for writing games, but can be used in other types of applications as well. The API at the moment has five main sections:

DirectX Components

DirectDraw

2-dimensional graphics capabilities: surfaces, double buffering, etc.

Direct3D

A relatively full-featured 3D graphics programming API.

DirectSound

Sound; 3D sound

DirectPlay

Simplifies network game development

DirectInput

Handles input from various peripherals

Additionally, DirectX 6 introduces something called DirectMusic, which is supposed to make it easier for game developers to include music in their games so that the mood of the music changes depending on what type of action is going on in the game.

DirectX performance and hardware acceleration

Although the performance of software-only Direct3D is not too shabby, it doesn't quite cut it for serious games. DirectX is designed with hardware acceleration in mind. It tries to provide the lowest possible level of access to hardware, while still remaining a generic interface. Allowing functions such as 3D triangle drawing to be performed on the graphics card frees the CPU (Central Processing Unit) to do other things. A typical Direct3D hardware accelerator would also have at least 4, or preferably 16 or more, megabytes of onboard RAM to store texture maps (bitmapped images made up of small dots called "pixels"), sprites, overlays and more.

DirectDraw and Direct3D are built as a relatively thin layer above the hardware, using what is called the DirectDraw "hardware abstraction layer" (HAL). For functionality not provided by a certain card, an equivalent software implementation would be provided through the "hardware emulation layer" (HEL).


Diagram illustrating where the DirectDraw/Direct3D architecture fits in, hopefully reasonably accurately.

DirectX and COM

The set of DirectX modules are built as COM (Component Object Model) objects. COM is yet another ugly broken interface from Microsoft - although newer versions of COM don't suck as much as the earlier incarnations. Don't get me wrong, I'm not against the existence of something that does what COM does - but the implementation leaves much to be desired. Anyway, a COM object is a bit like a C++ class, in that it encapsulates a set of methods and attributes in a single module, and in that it provides a kludgy sort of inheritance model, whereby one COM object can be built to support all the methods of its parent object, and then add some more.

You don't need to know much about COM to use DirectX, so don't worry too much about it. You do a little bit of COM stuff when initializing objects and cleaning them up, and when checking return values of function calls, but that's more or less it.

CH 2 - Palettes, Gaming concepts, double buffering

Video Modes

Screen modes come in several flavours, based on how many bits are used to store the color of each pixel on the screen. Naturally, the more bits you use per pixel, the more colours you can display at once; but there is more data to move into graphics memory to update the screen.

  • 1,2,4 and 8 bit "indexed" modes (8 bit is the most popular and is better known as "256-color mode").
  • 16-bit (64K colors) "high-color" modes
  • 24-bit (16.7M colors) "true-color" modes
  • 32-bit RGBA modes. The first 3 bytes are used the same as in 24-bit modes; the A byte is for an "alpha-channel", which provides information about the opacity (transparency) of the pixel.

These modes are available, typically, in the following resolutions:

  • 320x200
  • 320x240
  • 640x400
  • 640x480
  • 800x600
  • 1024x768
  • 1280x1024
  • 1600x1200 (drool)

with 640x480 being probably the most common mode for running games in at the moment.

Monitors generally have a width that is 4/3 times their height (called the aspect ratio); so with modes where the number of pixels along the width is 4/3 times the number of pixels along the height, the pixels will have an aspect ratio of 1, and thus be physically square. That is to say, 100 pixels in one direction should then be the same physical length as 100 pixels in a perpendicular direction. Note that 320x200 does not have this property, so in 320x200 pixels are actually stretched to be taller than they are wide.

Color theory

There are a number of different ways that colors can be represented, known as "color models". The most common one is probably RGB (Red,Green,Blue). Nearly all possible visible colors can be produced by combining, in various proportions, the three primary colors red, green and blue. These are commonly stored as three bytes - each byte represents the relative intensity of each primary color as a value from 0 to 255 inclusive. Pure bright red, for example, would be RGB(255,0,0). Purple would be RGB(255,0,255), grey would be RGB(150,150,150), and so on.

Here is an example of some C code that you might use for representing RGB colors.

struct SColor
{
    int r;
    int g;
    int b;
};
 
SColor make_rgb( int r, int g, int b )
{
    SColor ret;
    ret.r = r;
    ret.g = g;
    ret.b = b;
    return ret;
}

Alternatively you may want to store an RGB color in an unsigned 32-bit integer. Bits 0 to 7 are used to store the blue value, bits 8 to 15 for the green and so on.

typedef unsigned int rgb_color;
 
#define MAKE_RGB(r,g,b) ( ((r) << 16) | ((g) << 8) | (b) )
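
For completeness, here are hypothetical companion macros (not part of any DirectX header) for pulling the components back out of a color packed with MAKE_RGB:

#define GET_R(c) ( ((c) >> 16) & 0xFF )
#define GET_G(c) ( ((c) >> 8) & 0xFF )
#define GET_B(c) ( (c) & 0xFF )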

Anyway, I'm rambling now.

There are other color models, such as HSV (Hue, Saturation, Value), but I won't be going into them here. The book "Computer Graphics: Principles and Practice" by Foley & van Dam (often referred to as The Computer Graphics Bible) explains color models in some detail, and how to convert between them.

High-color and true-color modes

In high-color and true-color modes, the pixels on the screen are stored in video memory as their corresponding RGB make-up values. For example, if the top left pixel on the screen was green, then (in true-color mode) the first three bytes in video memory would be 0, 255 and 0.

In high-color modes the RGB values are specified using (if I remember correctly) 5, 6 and 5 bits for red, green and blue respectively, so in the above example the first two bytes in video memory would be, in binary: 00000111 11100000.

Palette-based, or "indexed" modes

Indexed color modes use the notion of a color "look up table" (LUT). The most common of these modes is 8-bit, better known as 256-color mode. Each pixel on the screen is represented by a single byte, which means that up to 2^8 = 256 colors can be displayed on the screen at once. The colors assigned to each of these 256 indexes are stored as 3-byte RGB values in the LUT, and these colors are used by the graphics hardware to determine what color to display on the screen.

Creating an application using indexed modes can be a pain, especially for the graphics artist, but there are sometimes advantages to using indexed modes:

  • Less memory is required to store the information in bitmaps and on the screen.
  • Because less memory is required, drawing routines can be made faster, since there are fewer bytes to transfer.
  • Some interesting "palette animation" tricks, that would be quite difficult to do in a normal mode, can be done quite easily in indexed modes. By changing the values in the LUT, you can change the colors on the screen without modifying screen memory at all. For example, a fade-out can be done by fading the RGB values in the LUT to zero (see the sketch after this list).
  • Some 3D accelerators support indexed modes for textures, which can be useful if (for example) you have a very large texture that takes up a lot of memory.
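
Here is a rough sketch of the fade-out trick mentioned in the list above. It assumes pal[] holds the current 256 LUT entries and pDDPal points to an IDirectDrawPalette attached to the primary surface; SetEntries( ) is the DirectDraw call that pushes new values into the hardware LUT:

PALETTEENTRY pal[256];
// ... pal[] filled in when the palette was created ...
for(int step = 0; step < 32; step++)
{
    for(int i = 0; i < 256; i++)
    {
        // move each component 8 steps closer to black
        pal[i].peRed   = (pal[i].peRed   > 8) ? pal[i].peRed   - 8 : 0;
        pal[i].peGreen = (pal[i].peGreen > 8) ? pal[i].peGreen - 8 : 0;
        pal[i].peBlue  = (pal[i].peBlue  > 8) ? pal[i].peBlue  - 8 : 0;
    }
    pDDPal->SetEntries(0, 0, 256, pal); // screen memory is untouched
}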

ModeX

ModeX is a special type of VGA 256 color mode in which the contents of graphics memory (i.e. what appears on the screen) is stored in a somewhat complex planar format. The resolution of ModeX modes isn't very high. DirectDraw knows how to write to ModeX surfaces, but the Windows GDI doesn't, so be careful when trying to mix GDI and DirectDraw ModeX surfaces. When setting the DirectDraw fullscreen mode, it is possible to choose whether or not DirectDraw is allowed to create ModeX surfaces. These days you probably want to avoid ModeX.

Pitch/Stride

Even though the screen resolution might be, say, 640x480x32, this does not necessarily mean that each row of pixels will take up 640*4 bytes in memory. For speed reasons, graphics cards often store surfaces wider than their logical width (a trade-off of memory for speed.) For example, a graphics card that supports a maximum of 1024x768 might store all modes from 320x200 up to 1024x768 as 1024x768 internally. This leaves a "margin" on the right side of a surface. This actual allocated width for a surface is known as the pitch or stride of the surface. It is important to know the pitch of any surface whose memory you are going to write into, whether it is a 2D DirectDraw surface or a texture map. The pitch of a surface can be queried using DirectDraw.

Text diagram illustrating pitch:

 
 Display memory:
+--------------------------+-------------+
|                          |             |
| -- screen width -------- |             |
|                          |             |
| -- pitch/stride ---------------------- |
|                          |             |
|                          |             |
|                          |             |
|                          |             |
|                          |             |
+--------------------------+-------------+
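
To make the idea concrete, here is a sketch of plotting a single pixel into a locked 32-bit surface, stepping down the rows by the pitch rather than the width. The surface pointer pSurf and the coordinates x and y are assumed to exist already:

DDSURFACEDESC ddsd;
ZeroMemory(&ddsd, sizeof(ddsd));
ddsd.dwSize = sizeof(ddsd);
if(SUCCEEDED(pSurf->Lock(NULL, &ddsd, DDLOCK_WAIT, NULL)))
{
    BYTE *pixels = (BYTE *)ddsd.lpSurface;
    // advance y rows using the pitch, NOT the logical width
    DWORD *row = (DWORD *)(pixels + y * ddsd.lPitch);
    row[x] = 0x00FF0000; // a red pixel in 32-bit mode
    pSurf->Unlock(NULL);
}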

A few gaming concepts you'll need to know to write games

Bitmaps and sprites

A bitmap is an image on the computer that is stored as an array of pixel values. That's a pretty crappy description. Basically, a bitmap is any picture on the computer, normally a rectangular block of 'pixels'. A sprite is the same thing as a bitmap, except normally it refers to a bitmap that has transparent areas (exact definitions of sprite may vary from programmer to programmer.) Sprites are an extremely important component of games. They have a million and one uses. For example, your mouse cursor qualifies as a sprite. The monsters in DOOM are also sprites. They are flat images with transparent areas that are programmed to always face you. Note that the sprite always faces you - this doesn't mean the monster is facing you. Anyway, enough said about bitmaps and sprites, I think.

Double buffering and page flipping

If your game did all its drawing straight to the current display, the user would notice horribly flickery artefacts as the elements of the game got drawn onto the screen. The solution to this is to have two graphics buffers, a "front buffer" and a "back buffer". The front buffer is visible to the user, the back buffer is not. You do all your drawing to the back buffer, and then when you have finished drawing everything on the screen, you copy (or flip) the contents of the back buffer into the front buffer. This is known as double buffering, and some sort of double buffering scheme is used in virtually every game.

There are generally two ways to perform the transfer of the back buffer to the front buffer: copying or page-flipping.

  • Copying: The contents of the back buffer are simply copied over into the front buffer. The back buffer can be in system memory or be another video memory surface.
  • Page-flipping: With this technique, no actual copying is done. Both buffers must exist in video memory. For each frame of your game you alternate which of these two surfaces you draw to. You always draw to the currently invisible one, and at the end of rendering the frame, you instruct the graphics hardware to use that frame as the visible one. Thus the front buffer becomes the back buffer (and vice versa) each frame.

A problem that can arise from this technique is "tearing". Your monitor redraws the image on the screen fairly frequently, normally at around 70 times per second (or 70 Hertz). It normally draws from top to bottom. Now, it can happen that the screen has only drawn half of its image when you instruct it to start drawing something else, using either of the two techniques described above. When you do this, the bottom half of the screen is drawn using the new image, while the top half still has the old image. The visual effect this produces is called tearing, or shearing. A solution exists, however: it is possible to time your page flipping to coincide with the end of a screen refresh. I'll stop here though, having let you know that it is possible. (fixme: i think DirectDraw handles this for you, check this)
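
Using the variable names from the sample in the next chapter, the end-of-frame step might be sketched like this - a real page flip in full-screen mode, and a blit from the back surface to the window's area of the primary surface in windowed mode:

if(g_bFullScreen)
{
    g_pDDSPrimary->Flip(NULL, DDFLIP_WAIT); // page flip
}
else
{
    RECT rcDest;
    GetClientRect(g_hWnd, &rcDest);
    ClientToScreen(g_hWnd, (LPPOINT)&rcDest);     // top-left corner
    ClientToScreen(g_hWnd, (LPPOINT)&rcDest + 1); // bottom-right corner
    g_pDDSPrimary->Blt(&rcDest, g_pDDSBack, NULL, DDBLT_WAIT, NULL);
}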

Clipping and DirectDraw clippers

Clipping is the name given to the technique of preventing drawing routines from drawing off the edge of the screen or other rectangular bounding area, such as a window. If not performed, the general result could best be described as a mess. In DirectDraw, for example, when using windowed mode, Windows basically gives DirectDraw the right to draw anywhere on the screen that it wants to. However, a well-behaved DirectDraw application would normally only draw into its own window. DirectX has an object called a "clipper" that can be attached to a DirectDraw surface to prevent it drawing outside of the window.
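
Creating and attaching a clipper takes only a few calls. Here is a minimal sketch, assuming pDD, pPrimary, and hWnd already exist:

LPDIRECTDRAWCLIPPER pClipper = NULL;
if(SUCCEEDED(pDD->CreateClipper(0, &pClipper, NULL)))
{
    pClipper->SetHWnd(0, hWnd);     // clip to this window
    pPrimary->SetClipper(pClipper); // attach to the primary surface
}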

DirectDraw surfaces

DirectDraw uses "surfaces" to access any section of memory, either video memory or system memory, that is used to store (normally) bitmaps, texture maps, sprites, and the current contents of the screen or a window.

DirectDraw also provides support for "overlays"; a special type of sprite. An overlay is normally a surface containing a bitmap with transparent sections that will be "overlaid" on the entire screen. For example, a racing car game might use an overlay for the image of the cockpit controls and window frame.

The memory a DirectDraw surface uses can be lost in some circumstances, because DirectDraw has to share resources with the GDI. It is necessary for your application to check regularly that this hasn't happened, and to restore the surfaces if it has.
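
The check itself is straightforward. A sketch, using the primary surface from the sample:

if(g_pDDSPrimary->IsLost() == DDERR_SURFACELOST)
{
    g_pDDSPrimary->Restore();
    // The surface memory is back, but its contents are undefined,
    // so everything must be redrawn on the next frame.
}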

DirectX return values and error-checking

All DirectX functions return an HRESULT as an error code. Since DirectX objects are based on the COM architecture, the correct way to check whether a DirectX function has failed is to use the macros SUCCEEDED() and FAILED(), with the HRESULT as the parameter. It is not sufficient merely to check whether, for example, your DirectDraw HRESULT is equal to DD_OK, since it is possible for COM methods to have multiple success values. Your code will probably still work, but technically it is the wrong thing to do.
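
For example, a call might be checked like this (using the variables from the sample in the next chapter):

HRESULT hr = g_pDD->SetCooperativeLevel(g_hWnd, DDSCL_NORMAL);
if(FAILED(hr))
{
    // report or log the error; a routine such as DDErrorString(),
    // shown in the next chapter, turns the code into a readable name
}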

Something to be on the lookout for is that some DirectX functions return failure codes when they succeed. For example, IDirectPlay::GetPlayerData will "fail" with DPERR_BUFFERTOOSMALL when you are merely asking for the data size. This behaviour isn't documented either, which is incredibly frustrating. There aren't many of these, but be on the lookout.

DirectX debugging

When you install the DirectX SDK you get a choice of whether to install the retail version of the libraries, or the debug version. The debug version will actually write diagnostic OutputDebugString messages to your debugger. This can be very useful. However, it slows things down a LOT - if you have anything less than a Pentium 166, rather choose the release libraries. Also, if you want to mainly play DirectX games, install the retail version. If you want to do mainly DirectX development, and your computer is quite fast, install the debug version. If you want to do both, then you should probably use the retail libraries, unless you have a very fast computer that can handle it. I normally install the retail version, but the debug versions can probably be quite useful for people starting out.

CH 3: A simple DirectDraw sample

This is a very simple DirectDraw sample. The source code for this sample is included here.

Setting up DirectX under Visual C/C++

I most likely won't be doing DirectX development under Watcom or Borland C/C++ or Delphi or VisualBasic etc; so if you want such info included here, you'll have to send it to me.

Firstly, the directories must be set up so that Visual C/C++ can find the DirectX include files and libraries:

1. Access the Tools/Options/Directories tabbed dialog.

2. Select "library directories" from the drop-down list, and add the directory of the DX SDK libraries, e.g. "d:\dxsdk\sdk\lib"

3. Select "include directories" from the drop-down list, and add the directory of the DX SDK header files, e.g. "d:\dxsdk\sdk\inc".

4. If you are going to be using some of the DX utility headers used in the samples, then also add the samples\misc directory, e.g. "d:\dxsdk\sdk\samples\misc" to your includes path.

Note that the version of DirectX that normally ships with Visual C++ isn't usually the latest, so to make sure that the compiler doesn't find the older version located in its own directories, add the include and library paths for the SDK in front of the default include and library paths.

For each application that uses DirectX, you must also explicitly add the required libraries to the project. Do this in Project/Settings (Alt+F7), under the "Link" tab for each configuration of your project. For DirectDraw, add ddraw.lib in the Object/Library modules box. You also need to add dxguid.lib here if your application uses any of the DirectX COM interface IDs, e.g. IID_IDirectDraw7.

The DirectDraw sample

Here is a screenshot of the application:

[Screenshot 1]

The general outline of our sample DirectDraw application is as follows:

1. Create a normal Windows window

2. Set up our DirectX variables

3. Initialize a DirectDraw object

4. Set the "cooperative level" and display modes as necessary (explained later)

5. Create front and back surfaces

6. If in windowed mode, create and attach a clipper

7. Render to the back buffer

8. Perform the flipping. If in full-screen mode, just flip. If in windowed mode, you need to blit from the back surface to the primary surface each frame.

9. Repeat from step 7 until we exit

10. Clean up

Setting up

We are going to need a number of variables for our DirectDraw application. These can be global variables or class members; that's up to you. The same goes for functions. Here are the variables we're going to use:

LPDIRECTDRAW        g_pDD;         // DirectDraw object
LPDIRECTDRAWSURFACE g_pDDSPrimary; // DirectDraw primary surface
LPDIRECTDRAWSURFACE g_pDDSBack;    // DirectDraw back surface
LPDIRECTDRAWCLIPPER g_pClipper;    // Clipper for windowed mode
HWND                g_hWnd;        // Handle of window
bool                g_bFullScreen; // are we in fullscreen mode?

I place all of these variables and functions in a separate file, which can be called anything you want, although you should not use file names that already exist, such as "ddraw.h"; the compiler is likely to get confused about which one you want. I've used dd.h and dd.cpp in the sample.

Remember to ensure that these variables are initialized to NULL before we begin. If you were creating classes, you could do this in the constructor of the class.

Here is the general layout of my dd.h and dd.cpp files:

dd.h

#ifndef _DD_H_
#define _DD_H_
 
#include <ddraw.h>
 
extern LPDIRECTDRAW g_pDD;
...
 
extern void DirectXFunction();
...
 
#endif

dd.cpp

#include "stdafx.h"
#include "dd.h"
#include <ddraw.h>
 
LPDIRECTDRAW g_pDD=NULL;
...
 
void DirectXFunction()
{
         g_pDD = NULL;
...
}

DirectDraw error checking

Before we begin, we should define a "clean" way of checking and debugging error codes from DirectX functions.

We create some functions to help us return and report error strings from HRESULT error codes.

A function that returns a string with the name of an HRESULT code:

char *DDErrorString(HRESULT hr)
{
         switch (hr)
         {
         case DDERR_ALREADYINITIALIZED:           return "DDERR_ALREADYINITIALIZED";
         case DDERR_CANNOTATTACHSURFACE:          return "DDERR_CANNOTATTACHSURFACE";
         case DDERR_CANNOTDETACHSURFACE:          return "DDERR_CANNOTDETACHSURFACE";
         case DDERR_CURRENTLYNOTAVAIL:            return "DDERR_CURRENTLYNOTAVAIL";
         case DDERR_EXCEPTION:                    return "DDERR_EXCEPTION";
         case DDERR_GENERIC:                      return "DDERR_GENERIC";
         case DDERR_HEIGHTALIGN:                  return "DDERR_HEIGHTALIGN";
         case DDERR_INCOMPATIBLEPRIMARY:          return "DDERR_INCOMPATIBLEPRIMARY";
         case DDERR_INVALIDCAPS:                  return "DDERR_INVALIDCAPS";
         case DDERR_INVALIDCLIPLIST:              return "DDERR_INVALIDCLIPLIST";
         case DDERR_INVALIDMODE:                  return "DDERR_INVALIDMODE";
         case DDERR_INVALIDOBJECT:                return "DDERR_INVALIDOBJECT";
         case DDERR_INVALIDPARAMS:                return "DDERR_INVALIDPARAMS";
         case DDERR_INVALIDPIXELFORMAT:           return "DDERR_INVALIDPIXELFORMAT";
         case DDERR_INVALIDRECT:                  return "DDERR_INVALIDRECT";
         case DDERR_LOCKEDSURFACES:               return "DDERR_LOCKEDSURFACES";
         case DDERR_NO3D:                         return "DDERR_NO3D";
         case DDERR_NOALPHAHW:                    return "DDERR_NOALPHAHW";
         case DDERR_NOCLIPLIST:                   return "DDERR_NOCLIPLIST";
         case DDERR_NOCOLORCONVHW:                return "DDERR_NOCOLORCONVHW";
         case DDERR_NOCOOPERATIVELEVELSET:        return "DDERR_NOCOOPERATIVELEVELSET";
         case DDERR_NOCOLORKEY:                   return "DDERR_NOCOLORKEY";
         case DDERR_NOCOLORKEYHW:                 return "DDERR_NOCOLORKEYHW";
         case DDERR_NODIRECTDRAWSUPPORT:          return "DDERR_NODIRECTDRAWSUPPORT";
         case DDERR_NOEXCLUSIVEMODE:              return "DDERR_NOEXCLUSIVEMODE";
         case DDERR_NOFLIPHW:                     return "DDERR_NOFLIPHW";
         case DDERR_NOGDI:                        return "DDERR_NOGDI";
         case DDERR_NOMIRRORHW:                   return "DDERR_NOMIRRORHW";
         case DDERR_NOTFOUND:                     return "DDERR_NOTFOUND";
         case DDERR_NOOVERLAYHW:                  return "DDERR_NOOVERLAYHW";
         case DDERR_NORASTEROPHW:                 return "DDERR_NORASTEROPHW";
         case DDERR_NOROTATIONHW:                 return "DDERR_NOROTATIONHW";
         case DDERR_NOSTRETCHHW:                  return "DDERR_NOSTRETCHHW";
         case DDERR_NOT4BITCOLOR:                 return "DDERR_NOT4BITCOLOR";
         case DDERR_NOT4BITCOLORINDEX:            return "DDERR_NOT4BITCOLORINDEX";
         case DDERR_NOT8BITCOLOR:                 return "DDERR_NOT8BITCOLOR";
         case DDERR_NOTEXTUREHW:                  return "DDERR_NOTEXTUREHW";
         case DDERR_NOVSYNCHW:                    return "DDERR_NOVSYNCHW";
         case DDERR_NOZBUFFERHW:                  return "DDERR_NOZBUFFERHW";
         case DDERR_NOZOVERLAYHW:                 return "DDERR_NOZOVERLAYHW";
         case DDERR_OUTOFCAPS:                    return "DDERR_OUTOFCAPS";
         case DDERR_OUTOFMEMORY:                  return "DDERR_OUTOFMEMORY";
         case DDERR_OUTOFVIDEOMEMORY:             return "DDERR_OUTOFVIDEOMEMORY";
         case DDERR_OVERLAYCANTCLIP:              return "DDERR_OVERLAYCANTCLIP";
         case DDERR_OVERLAYCOLORKEYONLYONEACTIVE: return "DDERR_OVERLAYCOLORKEYONLYONEACTIVE";
         case DDERR_PALETTEBUSY:                  return "DDERR_PALETTEBUSY";
         case DDERR_COLORKEYNOTSET:               return "DDERR_COLORKEYNOTSET";
         case DDERR_SURFACEALREADYATTACHED:       return "DDERR_SURFACEALREADYATTACHED";
         case DDERR_SURFACEALREADYDEPENDENT:      return "DDERR_SURFACEALREADYDEPENDENT";
         case DDERR_SURFACEBUSY:                  return "DDERR_SURFACEBUSY";
         case DDERR_CANTLOCKSURFACE:              return "DDERR_CANTLOCKSURFACE";
         case DDERR_SURFACEISOBSCURED:            return "DDERR_SURFACEISOBSCURED";
         case DDERR_SURFACELOST:                  return "DDERR_SURFACELOST";
         case DDERR_SURFACENOTATTACHED:           return "DDERR_SURFACENOTATTACHED";
         case DDERR_TOOBIGHEIGHT:                 return "DDERR_TOOBIGHEIGHT";
         case DDERR_TOOBIGSIZE:                   return "DDERR_TOOBIGSIZE";
         case DDERR_TOOBIGWIDTH:                  return "DDERR_TOOBIGWIDTH";
         case DDERR_UNSUPPORTED:                  return "DDERR_UNSUPPORTED";
         case DDERR_UNSUPPORTEDFORMAT:            return "DDERR_UNSUPPORTEDFORMAT";
         case DDERR_UNSUPPORTEDMASK:              return "DDERR_UNSUPPORTEDMASK";
         case DDERR_VERTICALBLANKINPROGRESS:      return "DDERR_VERTICALBLANKINPROGRESS";
         case DDERR_WASSTILLDRAWING:              return "DDERR_WASSTILLDRAWING";
         case DDERR_XALIGN:                       return "DDERR_XALIGN";
         case DDERR_INVALIDDIRECTDRAWGUID:        return "DDERR_INVALIDDIRECTDRAWGUID";
         case DDERR_DIRECTDRAWALREADYCREATED:     return "DDERR_DIRECTDRAWALREADYCREATED";
         case DDERR_NODIRECTDRAWHW:               return "DDERR_NODIRECTDRAWHW";
         case DDERR_PRIMARYSURFACEALREADYEXISTS:  return "DDERR_PRIMARYSURFACEALREADYEXISTS";
         case DDERR_NOEMULATION:                  return "DDERR_NOEMULATION";
         case DDERR_REGIONTOOSMALL:               return "DDERR_REGIONTOOSMALL";
         case DDERR_CLIPPERISUSINGHWND:           return "DDERR_CLIPPERISUSINGHWND";
         case DDERR_NOCLIPPERATTACHED:            return "DDERR_NOCLIPPERATTACHED";
         case DDERR_NOHWND:                       return "DDERR_NOHWND";
         case DDERR_HWNDSUBCLASSED:               return "DDERR_HWNDSUBCLASSED";
         case DDERR_HWNDALREADYSET:               return "DDERR_HWNDALREADYSET";
         case DDERR_NOPALETTEATTACHED:            return "DDERR_NOPALETTEATTACHED";
         case DDERR_NOPALETTEHW:                  return "DDERR_NOPALETTEHW";
         case DDERR_BLTFASTCANTCLIP:              return "DDERR_BLTFASTCANTCLIP";
         case DDERR_NOBLTHW:                      return "DDERR_NOBLTHW";
         case DDERR_NODDROPSHW:                   return "DDERR_NODDROPSHW";
         case DDERR_OVERLAYNOTVISIBLE:            return "DDERR_OVERLAYNOTVISIBLE";
         case DDERR_NOOVERLAYDEST:                return "DDERR_NOOVERLAYDEST";
         case DDERR_INVALIDPOSITION:              return "DDERR_INVALIDPOSITION";
         case DDERR_NOTAOVERLAYSURFACE:           return "DDERR_NOTAOVERLAYSURFACE";
         case DDERR_EXCLUSIVEMODEALREADYSET:      return "DDERR_EXCLUSIVEMODEALREADYSET";
         case DDERR_NOTFLIPPABLE:                 return "DDERR_NOTFLIPPABLE";
         case DDERR_CANTDUPLICATE:                return "DDERR_CANTDUPLICATE";
         case DDERR_NOTLOCKED:                    return "DDERR_NOTLOCKED";
         case DDERR_CANTCREATEDC:                 return "DDERR_CANTCREATEDC";
         case DDERR_NODC:                         return "DDERR_NODC";
         case DDERR_WRONGMODE:                    return "DDERR_WRONGMODE";
         case DDERR_IMPLICITLYCREATED:            return "DDERR_IMPLICITLYCREATED";
         case DDERR_NOTPALETTIZED:                return "DDERR_NOTPALETTIZED";
         case DDERR_UNSUPPORTEDMODE:              return "DDERR_UNSUPPORTEDMODE";
         case DDERR_NOMIPMAPHW:                   return "DDERR_NOMIPMAPHW";
         case DDERR_INVALIDSURFACETYPE:           return "DDERR_INVALIDSURFACETYPE";
         case DDERR_DCALREADYCREATED:             return "DDERR_DCALREADYCREATED";
         case DDERR_CANTPAGELOCK:                 return "DDERR_CANTPAGELOCK";
         case DDERR_CANTPAGEUNLOCK:               return "DDERR_CANTPAGEUNLOCK";
         case DDERR_NOTPAGELOCKED:                return "DDERR_NOTPAGELOCKED";
         case DDERR_NOTINITIALIZED:               return "DDERR_NOTINITIALIZED";
         }
         return "Unknown Error";
}

Next, a function that we can use in our code to check for errors: if an HRESULT is a failure code, it prints a debugging message and returns true; otherwise it returns false.

bool DDFailedCheck(HRESULT hr, char *szMessage)
{
         if (FAILED(hr))
         {
                 char buf[1024];
                 sprintf( buf, "%s (%s)\n", szMessage, DDErrorString(hr) );
                 OutputDebugString( buf );
                 return true;
         }
         return false;
}

Some lazy coders think that they can get away without doing much error checking. With DirectX, this is a very bad idea. You will have errors.
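
If the check-and-return pattern gets tedious, one possible convenience is a small macro that wraps DDFailedCheck and bails out of the calling function on failure. The macro below is only a suggestion (not part of the original sample) and assumes, like the rest of the sample, that the calling function returns bool:

// Possible convenience macro: bail out of a bool-returning function on failure
#define DDCHECK(hr, szMsg)                        \
         do {                                     \
                  if (DDFailedCheck((hr), (szMsg))) \
                           return false;          \
         } while (0)
 
// Example use inside one of the bool-returning init functions:
//         DDCHECK( g_pDD->SetDisplayMode(640, 480, 8), "SetDisplayMode" );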

Initializing the DirectDraw system

After having created a Windows window (using MFC or plain Win32), we initialize the DirectDraw system by creating an "IDirectDraw" object.

The DirectDrawCreate or DirectDrawCreateEx function can be used to create a DirectDraw object. You only create a single DirectDraw object for your application.

bool DDInit( HWND hWnd )
{
         HRESULT hr;
 
         g_hWnd = hWnd;
 
         // Initialize DirectDraw
         hr = DirectDrawCreate( NULL, &g_pDD, NULL );
         if (DDFailedCheck(hr, "DirectDrawCreate failed" ))
                 return false;
 
         return true;
}

Note that DirectDrawCreate will create an "old" DirectDraw object that does not support the functions that "new" DirectDraw interfaces (such as IDirectDraw7) do. Use DirectDrawCreateEx to create a DirectDraw interface that does. For our simple sample the above is sufficient.
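
If you do want one of the newer interfaces, the call looks roughly like this. This is only a sketch: g_pDD7 is a hypothetical LPDIRECTDRAW7 global that is not part of the sample, and you need the DirectX 7 headers and dxguid.lib for IID_IDirectDraw7.

// Sketch: creating an IDirectDraw7 interface with DirectDrawCreateEx.
// g_pDD7 is a hypothetical global, not used elsewhere in this sample.
LPDIRECTDRAW7 g_pDD7 = NULL;
 
bool DDInit7( HWND hWnd )
{
         HRESULT hr;
 
         g_hWnd = hWnd;
 
         hr = DirectDrawCreateEx( NULL, (void**)&g_pDD7, IID_IDirectDraw7, NULL );
         if (DDFailedCheck(hr, "DirectDrawCreateEx failed" ))
                  return false;
 
         return true;
}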

Setting the screen mode

I place the remaining DirectDraw initialization (setting the display mode, creating surfaces and clippers) in a single function called DDCreateSurfaces.

The SetCooperativeLevel function is used to tell the system whether we want to use full-screen mode or windowed mode. In full-screen mode, we have to get exclusive access to the DirectDraw device and then set the display mode. For windowed mode, we set the cooperative level to normal.

bool DDCreateSurfaces( bool bFullScreen)
{
         HRESULT hr; // Holds return values for DirectX function calls
 
         g_bFullScreen = bFullScreen;
 
         // If we want to be in full-screen mode
         if (g_bFullScreen)
         {
                 // Set the "cooperative level" so we can use full-screen mode
                 hr = g_pDD->SetCooperativeLevel(g_hWnd, DDSCL_EXCLUSIVE|DDSCL_FULLSCREEN|DDSCL_NOWINDOWCHANGES);
                 if (DDFailedCheck(hr, "SetCooperativeLevel"))
                          return false;
 
                 // Set 640x480x256 full-screen mode
                 hr = g_pDD->SetDisplayMode(640, 480, 8);
                 if (DDFailedCheck(hr, "SetDisplayMode" ))
                          return false;
         }
         else
         {
                 // Set DDSCL_NORMAL to use windowed mode
                 hr = g_pDD->SetCooperativeLevel(g_hWnd, DDSCL_NORMAL);
                 if (DDFailedCheck(hr, "SetCooperativeLevel windowed" ))
                          return false;
         }
 
         ...

Creating surfaces

OK ... now that we've got that bit of initialization out of the way, we need to create a flipping structure. No, I'm not cursing the structure .. "flipping" as in screen page-flipping :).

Anyway, we need to create one main surface that everyone will see, and a "back" surface. All drawing is done to the back surface. When we are finished drawing we need to make what we've drawn visible. In full-screen mode, we just need to call a routine called Flip, which will turn the current back surface into the primary surface and vice versa. In windowed mode, we don't actually flip the surfaces - we copy the contents of the back buffer onto the primary buffer, which is what's inside the window. In other words, we "blit" the back surface onto the primary surface.

Anyway, here is the bit of code to create the surfaces. Right now the code is ignoring full-screen mode and only catering for windowed mode, but that'll change. Also, if there are errors in this code, consider them "exercises" ... :).

         ...
 
         DDSURFACEDESC ddsd; // A structure to describe the surfaces we want
         // Clear all members of the structure to 0
         memset(&ddsd, 0, sizeof(ddsd));
         // The first parameter of the structure must contain the size of the structure
         ddsd.dwSize = sizeof(ddsd);
 
         if (g_bFullScreen)
         {
                 // Screw the full-screen mode (for now) (FIXME)
         }
         else
         {
 
                 //-- Create the primary surface
 
                 // The dwFlags parameter tells DirectDraw which DDSURFACEDESC
                 // fields contain valid values
                 ddsd.dwFlags = DDSD_CAPS;
                 ddsd.ddsCaps.dwCaps = DDSCAPS_PRIMARYSURFACE;
 
                 hr = g_pDD->CreateSurface(&ddsd, &g_pDDS, NULL);
                 if (DDFailedCheck(hr, "Create primary surface"))
                          return false;
 
                 //-- Create the back buffer
 
                 ddsd.dwFlags = DDSD_WIDTH | DDSD_HEIGHT | DDSD_CAPS;
                 // Make our off-screen surface 320x240
                 ddsd.dwWidth = 320;
                 ddsd.dwHeight = 240;
                 // Create an offscreen surface
                 ddsd.ddsCaps.dwCaps = DDSCAPS_OFFSCREENPLAIN;
 
                 hr = g_pDD->CreateSurface(&ddsd, &g_pDDSBack, NULL);
                 if (DDFailedCheck(hr, "Create back surface"))
                          return false;
 
         }
 
         ...

Creating the Clipper

Now that we've created the surfaces, we need to create a clipper (if we're running in windowed mode) and attach it to the primary surface. This prevents DirectDraw from drawing outside the window's client area.

         ...
 
         //-- Create a clipper for the primary surface in windowed mode
         if (!g_bFullScreen)
         {
 
                 // Create the clipper using the DirectDraw object
                 hr = g_pDD->CreateClipper(0, &g_pClipper, NULL);
                 if (DDFailedCheck(hr, "Create clipper"))
                          return false;
 
                 // Assign your window's HWND to the clipper
                 hr = g_pClipper->SetHWnd(0, g_hWnd);
                 if (DDFailedCheck(hr, "Assign hWnd to clipper"))
                          return false;
 
                 // Attach the clipper to the primary surface
                 hr = g_pDDS->SetClipper(g_pClipper);
                 if (DDFailedCheck(hr, "Set clipper"))
                          return false;
         }
 
         ...

Putting it all together

Now that we have all these initialization routines, we need to actually call them, so the question is, where to call them?

In an MFC application, a logical place to do this is in the application's InitInstance routine:

BOOL CYourAppNameHereApp::InitInstance()
{
         ... All the other MFC initialization junk here ..
 
         // Initialize DirectDraw
         if (!DDInit( AfxGetMainWnd()->GetSafeHwnd() ))
         {
                 AfxMessageBox( "Failed to initialize DirectDraw" );
                 return FALSE;
         }
 
         // Create DirectDraw surfaces
         if (!DDCreateSurfaces( false ))
         {
                 AfxMessageBox( "Failed to create surfaces" );
                 return FALSE;
         }
 
         return TRUE;
}

In a plain Win32 application, you can do this in your WinMain function just before you enter the main message loop, but after you've created your window:

int APIENTRY WinMain(HINSTANCE hInstance,
                     HINSTANCE hPrevInstance,
                     LPSTR     lpCmdLine,
                     int       nCmdShow)
{
         MSG  Msg;
 
         g_hInstance = hInstance;
 
         if (!hPrevInstance) {
                 if (!Register( g_hInstance ))
                          return FALSE;
         }
 
         // Create the main window
         g_hwndMain = Create( nCmdShow, 320, 240 );
         if (!g_hwndMain)
                 return FALSE;
 
         // Initialize DirectDraw
         if (!DDInit( g_hwndMain ))
         {
                 MessageBox( g_hwndMain, "Failed to initialize DirectDraw", "Error", MB_OK );
                 return 0;
         }
 
         // Create DirectDraw surfaces
         if (!DDCreateSurfaces( false ))
         {
                 MessageBox( g_hwndMain, "Failed to create surfaces", "Error", MB_OK );
                 return 0;
         }
 
         while (GetMessage(&Msg, NULL, 0, 0))
         {
                 TranslateMessage(&Msg);
                 DispatchMessage(&Msg);
         }
 
         return Msg.wParam;
}

Restoring lost surfaces

As if all this initialization wasn't enough, we also have to make sure our DirectDraw surfaces are not getting "lost". The memory associated with DirectDraw surfaces can be released under certain circumstances, because it has to share resources with the Windows GDI. So each time we render, we first have to check if our surfaces have been lost and Restore them if they have. This is accomplished with the IsLost function.

void CheckSurfaces()
{
         // Check the primary surface
         if (g_pDDS)
         {
                 if (g_pDDS->IsLost() == DDERR_SURFACELOST)
                          g_pDDS->Restore();
         }
         // Check the back buffer
         if (g_pDDSBack)
         {
                 if (g_pDDSBack->IsLost() == DDERR_SURFACELOST)
                          g_pDDSBack->Restore();
         }
}

The rendering loop

Now that we've got most of the general initialization out of the way, we need to set up a rendering loop. This is basically the main loop of the game, the so-called HeartBeat function. So we're going to call it just that.

The HeartBeat function gets called during your application's idle-time processing, which is typically whenever the window has no more messages to process.

MFC: We can override the application's OnIdle function and call our HeartBeat function from there. Use ClassWizard or the toolbar wizard to create a handler for "idle-time processing" for your main application class.

BOOL CYourMFCAppNameHereApp::OnIdle(LONG lCount)
{
         CWinApp::OnIdle(lCount); // Call the parent default OnIdle handler
 
         // Our game's heartbeat function
         HeartBeat();
 
         // Request more idle-time, so that we can render the next loop!
         return TRUE;
}

Win32: We can call the heartbeat function from inside the message loop, by using the function PeekMessage in our WinMain function to determine if we have any messages waiting:

         g_bRunning = true;
         while (g_bRunning)
         {
                 while (PeekMessage(&Msg, g_hwndMain, 0, 0, PM_NOREMOVE))
                 {
                          BOOL bGetResult = GetMessage(&Msg, NULL, 0, 0);
                          TranslateMessage(&Msg);
                          DispatchMessage(&Msg);
                          if (bGetResult==0)
                                   g_bRunning = false;
                 }
                 if (g_bRunning)
                 {
                          CheckSurfaces();
                          HeartBeat();
                 }
         }

There are alternative ways to decide when to call the HeartBeat function; for example, you could use a timer. The method you use depends on the type of game you are making. If you are making a first-person 3D shooter, you probably want as high a frame rate as possible, so you might use the idle-time method. If you are making a 2D scrolling game, this might not be optimal, as you may want to control the frame rate.
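
As a rough sketch of the timer-style approach, you could keep the same Win32 loop but only call HeartBeat when enough time has elapsed, using GetTickCount. The 50-millisecond interval below (roughly 20 frames per second) is an arbitrary choice for illustration:

         // Sketch: throttle the heartbeat instead of running it every idle pass
         DWORD dwLastTick = GetTickCount();
         g_bRunning = true;
         while (g_bRunning)
         {
                  while (PeekMessage(&Msg, g_hwndMain, 0, 0, PM_NOREMOVE))
                  {
                           BOOL bGetResult = GetMessage(&Msg, NULL, 0, 0);
                           TranslateMessage(&Msg);
                           DispatchMessage(&Msg);
                           if (bGetResult==0)
                                    g_bRunning = false;
                  }
                  // Only render if at least 50 ms have passed since the last frame
                  if (g_bRunning && GetTickCount() - dwLastTick >= 50)
                  {
                           dwLastTick = GetTickCount();
                           CheckSurfaces();
                           HeartBeat();
                  }
         }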

The HeartBeat function

Now let's look at the heartbeat function. The function checks for lost surfaces, then clears the back buffer with black, then draws a color square to the back buffer, and then flips the back buffer to the front.

void HeartBeat()
{
         // Check for lost surfaces
         CheckSurfaces();
 
         // Clear the back buffer
         DDClear( g_pDDSBack, 0, 0, 320, 240 );
 
         static int iFoo = 0;
         // Draw a weird looking color square
         for ( int r=0; r<64; r++ )
         {
                  for ( int g=0; g<64; g++ )
                  {
                          DDPutPixel( g_pDDSBack, g, r, (r*2+iFoo)%256, (g+iFoo)%256, (63-g)*4 );
                 }
         }
         iFoo++;
 
         // Blit the back buffer to the front buffer
         DDFlip();
}

The DDPutPixel function used here was explained earlier in the tutorial.

Flipping surfaces

Now let's look at the function that performs the surface flipping.

void DDFlip()
{
         HRESULT hr;
 
         // if we're windowed do the blit, else just Flip
         if (!g_bFullScreen)
         {
                 RECT    rcSrc;  // source blit rectangle
                 RECT    rcDest; // destination blit rectangle
                 POINT   p;
 
                 // find out where on the primary surface our window lives
                 p.x = 0; p.y = 0;
                 ::ClientToScreen(g_hWnd, &p);
                 ::GetClientRect(g_hWnd, &rcDest);
                 OffsetRect(&rcDest, p.x, p.y);
                 SetRect(&rcSrc, 0, 0, 320, 240);
                 hr = g_pDDS->Blt(&rcDest, g_pDDSBack, &rcSrc, DDBLT_WAIT, NULL);
         }
         else
         {
                 hr = g_pDDS->Flip(NULL, DDFLIP_WAIT);
         }
}

A primary surface in windowed mode represents the entire Windows screen, so we first have to find out where on the screen our window is, and then translate by that offset in order to blit into the window.

Note the Blt parameter DDBLT_WAIT. By default, if a surface is "busy" when you call Blt (for example if the GDI is accessing it) then DirectDraw will return an error, without performing the blit. Passing the DDBLT_WAIT option will instruct DirectDraw to wait until the surface becomes available and then perform the blit.
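
If you didn't pass DDBLT_WAIT, you would have to handle the "busy" case yourself, for example by retrying while Blt reports DDERR_WASSTILLDRAWING. This is just a sketch of the idea; the sample itself simply uses DDBLT_WAIT:

                 // Sketch: retry the blit manually instead of using DDBLT_WAIT
                 do
                 {
                          hr = g_pDDS->Blt(&rcDest, g_pDDSBack, &rcSrc, 0, NULL);
                 } while (hr == DDERR_WASSTILLDRAWING);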

Cleaning up

When we're done with DirectX objects, we have to "release" them, which is done by calling Release on them, for example:

void DDDone()
{
         if (g_pDD != NULL)
         {
                 g_pDD->Release();
                 g_pDD = NULL;
         }
}
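
In a fuller cleanup you would also release the clipper and the surfaces, typically in roughly the reverse order of creation, before releasing the DirectDraw object itself. A minimal sketch using the sample's globals (the function name DDDoneAll is just for illustration):

void DDDoneAll()
{
         // Release in roughly the reverse order of creation
         if (g_pClipper) { g_pClipper->Release(); g_pClipper = NULL; }
         if (g_pDDSBack) { g_pDDSBack->Release(); g_pDDSBack = NULL; }
         if (g_pDDS)     { g_pDDS->Release();     g_pDDS = NULL; }
         if (g_pDD)      { g_pDD->Release();      g_pDD = NULL; }
}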

Sample TODO

There are a few things the sample doesn't do yet. For one thing, full-screen mode doesn't work properly. The sample should also demonstrate how to handle switching between windowed and full-screen modes; a rough sketch of that switching logic follows.
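
The general shape of the switching code would be: release the clipper and the surfaces, restore the display mode if we were in full-screen mode, and then recreate everything in the other mode. A rough, untested sketch (DDToggleFullScreen is a hypothetical helper, not part of the sample):

bool DDToggleFullScreen()
{
         // Release the mode-dependent objects first
         if (g_pClipper) { g_pClipper->Release(); g_pClipper = NULL; }
         if (g_pDDSBack) { g_pDDSBack->Release(); g_pDDSBack = NULL; }
         if (g_pDDS)     { g_pDDS->Release();     g_pDDS = NULL; }
 
         // If we were full-screen, restore the original display mode
         if (g_bFullScreen)
                  g_pDD->RestoreDisplayMode();
 
         // Recreate the surfaces (and clipper) in the opposite mode
         return DDCreateSurfaces( !g_bFullScreen );
}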

CH 4: A simple Direct3D Retained mode sample

Direct3D: An Overview

This section will eventually cover some of the basics, such as coordinate systems (world versus object coordinates) and the differences between immediate mode and retained mode. For now, I'll assume you're at least a little familiar with 3D programming.

Devices

Direct3D interfaces with the surface it is rendering to (e.g. screen memory, system memory) using an IDirect3DRMDevice object. More than one type of rendering device can exist, and a specific rendering device must be chosen for a scene. For example, there is normally a device for RGB rendering and a device for Mono rendering (these names refer to the lighting model used for rendering: Mono means that only white lights can exist in the scene, while RGB supports colored lights and is thus slower). Additional devices may be installed that make use of 3D hardware acceleration. It is possible to iterate through the installed D3D devices by enumerating them (EnumDevices). It is also possible to have two different devices rendering to the same surface.

Viewports

The IDirect3DRMViewport object is used to keep track of how our 3D scene is rendered onto the device. It is possible to have multiple viewports per device, and it is also possible to have a viewport rendering to more than one device. The viewport object keeps track of the camera, front and back clipping fields, field of view etc.

Frames

A frame in Direct3D is basically used to store an object's position and orientation, relative to a given frame of reference, which is where the term frame comes from. Frames are positioned relative to other frames, or to the world coordinates. Frames are used to store the positions of objects in the scene, as well as other things like lights. To add an object to the scene we have to attach the object to a frame. The object is called a visual in Direct3D, since it represents what the user sees. A visual has no meaningful position or orientation information itself, but when attached to a frame, it is transformed when rendered according to the transformation information in the frame. Multiple frames may use the same visual. This can save a lot of time and memory in a situation like a forest or a small fleet of spacecraft, where you have a bunch of objects that look exactly the same but all exist in different positions and orientations.

Here is a crummy ASCII diagram of a single visual attached to two frames which are at different positions:

   _____
  /    /| <- Cube (visual)
 /    / |<==========================>[Frame1: (21, 3, 4)]
+----+  |
|    | /<===========================>[Frame2: (-12, 10, -6)]
|    |/
+----+

If both of these frames were attached to the scene frame, then our scene would have 2 cubes in it; one at (21, 3, 4) and the other at (-12, 10, -6).
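
In code, sharing one visual between two frames looks roughly like this. The sketch assumes the pD3DRM and pScene objects created later in this chapter's sample, plus a hypothetical pCubeMesh (an IDirect3DRMMeshBuilder that has already been loaded); error checking is omitted:

    // Sketch: attach the same visual to two frames at different positions
    LPDIRECT3DRMFRAME pFrame1 = NULL;
    LPDIRECT3DRMFRAME pFrame2 = NULL;
 
    pD3DRM->CreateFrame(pScene, &pFrame1);
    pD3DRM->CreateFrame(pScene, &pFrame2);
 
    pFrame1->SetPosition(pScene, D3DVAL(21.0), D3DVAL(3.0), D3DVAL(4.0));
    pFrame2->SetPosition(pScene, D3DVAL(-12.0), D3DVAL(10.0), D3DVAL(-6.0));
 
    // The same geometry is drawn once for each frame it is attached to
    pFrame1->AddVisual((LPDIRECT3DRMVISUAL)pCubeMesh);
    pFrame2->AddVisual((LPDIRECT3DRMVISUAL)pCubeMesh);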

The Direct3D RM Sample

First, here's a screenshot of the small, simple sample application we're putting together here.

[Screenshot 2]

Setting up global variables

Before we start we'll need a few global variables.

 
LPDIRECTDRAW pDD;                // A DirectDraw object
LPDIRECT3DRM pD3DRM;             // A Direct3D RM object
LPDIRECTDRAWSURFACE pDDSPrimary; // DirectDraw primary surface
LPDIRECTDRAWSURFACE pDDSBack;    // DirectDraw back surface
LPDIRECTDRAWPALETTE pDDPal;      // Palette for primary surface
LPDIRECTDRAWCLIPPER pClipper;    // Clipper for windowed mode
LPDIRECT3DRMDEVICE pD3DRMDevice; // A device
LPDIRECT3DRMVIEWPORT pViewport;  // A viewport
LPDIRECT3DRMFRAME pCamera;       // A camera
LPDIRECT3DRMFRAME pScene;        // The scene
LPDIRECT3DRMFRAME pCube;         // The one and only object in
                                 // our scene
BOOL bFullScreen;                // Are we in full-screen mode?
BOOL bAnimating;                 // Has our animating begun?
HWND ddWnd;                      // HWND of the DDraw window

Note that we need both a DirectDraw object and a Direct3D RM object to create a Direct3D application, because Direct3D works in conjunction with DirectDraw. As before, we need a primary and a back surface for our double-buffering, and a clipper to handle window-clipping in windowed mode. The palette object is not discussed in this tutorial yet. We have objects for the device and viewport, and we have frame objects to keep track of the scene and the scene's camera. We also have a frame for the one object we'll have in this scene.

Here is a routine that simply resets all of these globals to their initial (NULL/FALSE) values:

 
void InitDirectXGlobals()
{
    pDD = NULL;
    pD3DRM = NULL;
    pDDSPrimary = NULL;
    pDDSBack = NULL;
    pDDPal = NULL;
    pClipper = NULL;
    pD3DRMDevice = NULL;
    pViewport = NULL;
    pCamera = NULL;
    pScene = NULL;
    pCube = NULL;
 
    bFullScreen = FALSE;
    bAnimating = FALSE;
}

From 'Initializing the DirectDraw system' to 'Creating the clipper'

These steps all proceed exactly as in the DirectDraw sample, with the exception of the CreatePrimarySurface function, where the back surface has to be created with the DDSCAPS_3DDEVICE capability, since it will be used for 3D rendering:

 
UINT CreatePrimarySurface()
{
    .
    .
    .
    // Create an offscreen surface, specifying 3d device
    ddsd.ddsCaps.dwCaps = DDSCAPS_OFFSCREENPLAIN | DDSCAPS_3DDEVICE;
    .
    .
    .
}

Creating the Direct3D Retained Mode object

Now we need to create an IDirect3DRM object. This is achieved, quite simply, by calling the Direct3DRMCreate function.

 
UINT CreateDirect3DRM()
{
    HRESULT hr;
    // Create the IDirect3DRM object.
    hr = Direct3DRMCreate(&pD3DRM);
    if (FAILED(hr)) {
        TRACE("Error creating Direct3d RM object\n");
        return 1;
    }
    return 0;
}

Creating the device for rendering

We create the device object from the back surface, since this surface is the one we will render to.

 
UINT CreateDevice()
{
    HRESULT hr;
    hr = pD3DRM->CreateDeviceFromSurface(
        NULL, pDD, pDDSBack, &pD3DRMDevice);
    if (FAILED(hr)) {
        TRACE("Error %d creating d3drm device\n", int(LOWORD(hr)));
        return 1;
    }
    // success
    return 0;
}

Creating the viewport

We do a bit more than just create the viewport here. We create the scene object and the camera object, as well as set the ambient light for the scene, and create a directional light.

 
UINT CreateViewport()
{
    HRESULT hr;
 
    // First create the scene frame
    hr = pD3DRM->CreateFrame(NULL, &pScene);
    if (FAILED(hr)) {
        TRACE("Error creating the scene frame\n");
        return 1;
    }
 
    // Next, create the camera as a child of the scene
    hr = pD3DRM->CreateFrame(pScene, &pCamera);
    if (FAILED(hr)) {
        TRACE("Error creating the scene frame\n");
        return 2;
    }
    // Set the camera to lie somewhere on the negative z-axis, and
    // point towards the origin
    pCamera->SetPosition(
        pScene, D3DVAL(0.0), D3DVAL(0.0), D3DVAL(-300.0));
    pCamera->SetOrientation(
        pScene,
        D3DVAL(0.0), D3DVAL(0.0), D3DVAL(1.0),
        D3DVAL(0.0), D3DVAL(1.0), D3DVAL(0.0));
 
    // create lights
    LPDIRECT3DRMLIGHT pLightAmbient = NULL;
    LPDIRECT3DRMLIGHT pLightDirectional = NULL;
    LPDIRECT3DRMFRAME pLights = NULL;
 
    // Create two lights and a frame to attach them to
    // I haven't quite figured out the CreateLight's second
    // parameter yet.
    pD3DRM->CreateFrame(pScene, &pLights);
    pD3DRM->CreateLight(D3DRMLIGHT_AMBIENT, D3DRMCreateColorRGB(
        D3DVALUE(0.3), D3DVALUE(0.3), D3DVALUE(0.3)),
        &pLightAmbient);
    pD3DRM->CreateLight(D3DRMLIGHT_DIRECTIONAL, D3DRMCreateColorRGB(
        D3DVALUE(0.8), D3DVALUE(0.8), D3DVALUE(0.8)),
        &pLightDirectional);
 
    // Orient the directional light
    pLights->SetOrientation(pScene,
        D3DVALUE(30.0), D3DVALUE(-20.0), D3DVALUE(50.0),
        D3DVALUE(0.0), D3DVALUE(1.0), D3DVALUE(0.0));
 
    // Add ambient light to the scene, and the directional light
    // to the pLights frame
    pScene->AddLight(pLightAmbient);
    pLights->AddLight(pLightDirectional);
 
    // Create the viewport on the device
    hr = pD3DRM->CreateViewport(pD3DRMDevice,
        pCamera, 10, 10, 300, 220, &pViewport);
    if (FAILED(hr)) {
        TRACE("Error creating viewport\n");
        return 3;
    }
    // set the back clipping field
    hr = pViewport->SetBack(D3DVAL(5000.0));
 
    // Release our references to the temporary lights; the frames keep
    // their own references to them after AddLight
    pLightAmbient->Release();
    pLightDirectional->Release();
 
    // success
    return 0;
}

Putting it all together

Here is the tail-end of the app's InitInstance function:

 
    InitDirectXGlobals();
    TRACE("Calling InitDDraw\n");
    InitDDraw();
    SetMode();
//    TRACE("Calling LoadJascPalette\n");
//    LoadJascPalette("inspect.pal", 10, 240);
    TRACE("Calling CreatePrimarySurface\n");
    CreatePrimarySurface();
    TRACE("Calling CreateClipper\n");
    CreateClipper();
//    TRACE("Calling AttachPalette\n");
//    AttachPalette(pDDPal);
    TRACE("Calling CreateDirect3DRM\n");
    CreateDirect3DRM();
    TRACE("Calling CreateDevice\n");
    CreateDevice();
    TRACE("Calling CreateViewport\n");
    CreateViewport();
    TRACE("Calling CreateDefaultScene\n");
    CreateDefaultScene();
 
    bAnimating = TRUE;
 
    return TRUE;
}

Restoring lost surfaces

Same as the DirectDraw sample:

 
BOOL CheckSurfaces()
{
    // Check the primary surface
    if (pDDSPrimary) {
        if (pDDSPrimary->IsLost() == DDERR_SURFACELOST) {
            pDDSPrimary->Restore();
            return FALSE;
        }
    }
    return TRUE;
}

The Rendering loop

Same as the DirectDraw sample:

 
BOOL CD3dRmAppApp::OnIdle(LONG lCount)
{
    CWinApp::OnIdle(lCount);
    if (bAnimating) {
        HeartBeat();
        Sleep(50);
    }
    return TRUE;
}

The HeartBeat function

 
BOOL CD3dRmAppApp::HeartBeat()
{
    HRESULT hr;
//    if (!CheckSurfaces) bForceUpdate = TRUE;
//    if (bForceUpdate) pViewport->ForceUpdate(10,10,300,220);
    hr = pD3DRM->Tick(D3DVALUE(1.0));
    if (FAILED(hr)) {
        TRACE("Tick error!\n");
        return FALSE;
    }
 
    // Call our routine for flipping the surfaces
    FlipSurfaces();
 
    // No major errors
    return TRUE;
}
