Saturday, February 18, 2012

Blaise Pascal Magazine Rerun #8: Debugging Multithreaded Applications

This article was originally written for the Blaise Pascal Magazine and was published in Issue #14.

In the first four installments I’ve tried to give an overview of multithreaded programming. Nothing more than the basics, though, as multithreading is a complicated area. If you go the multithreading way, prepare to spend lots of time behind the computer looking for weird, hard-to-repeat and even harder-to-find bugs. Your life will be (slightly) better if you know how to use the Delphi debugging tools effectively, so let’s take a look at the features implemented specifically with multithreaded programming in mind. Sadly, most of them are only available in newer Delphi versions (from 2009 onwards).

Debugger enhancements

Let’s start with a very simple demo. One form and one button with the OnClick event assigned.

procedure TForm11.btnDeadlockClick(Sender: TObject);
var
  lock: TCriticalSection;
  thread: TThread;
begin
  lock := TCriticalSection.Create;
  lock.Acquire; // the main thread takes the lock and never releases it
  thread := TThread.CreateAnonymousThread(
    procedure begin
      TThread.NameThreadForDebugging('AnonymousThread');
      lock.Acquire; // blocks forever: the lock is owned by the main thread
    end
  );
  thread.FreeOnTerminate := false;
  thread.Start;
  thread.WaitFor; // never returns, as the background thread cannot finish
  thread.Terminate;
  thread.Free;
  Caption := 'OK';
end;

The code first creates and acquires a critical section. Then it creates a background thread, starts the thread and waits for it to terminate. At the end, the form Caption is changed to “OK”.

What about the background thread? It is implemented using “anonymous threads” – a new approach in Delphi XE, similar to AsyncCalls and (parts of) OmniThreadLibrary. CreateAnonymousThread allows you to implement the threaded code in an anonymous method. This thread first changes its name to “AnonymousThread” (we’ll come back to that in a moment) and then tries to acquire the lock.

Let’s put aside the threading and locking, look at the code again and ask ourselves – which lock? There’s no lock declared in the anonymous method, so this must clearly be the lock variable from the btnDeadlockClick method. If we were to implement this background thread with the “classical” approach (a TThread-based object), we would have to pass this lock to the threaded code somehow, possibly as a parameter to the constructor. As we’re using an anonymous method, the compiler does all that work for us. It “captures” the lock variable and allows the anonymous method to use it.
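
For comparison, here is a minimal sketch of what the “classical” approach could look like. The class name TLockingThread and its constructor are made up for this illustration, and the main thread does not hold the lock here, so the program runs to completion.

```pascal
program ClassicThreadSketch;

{$APPTYPE CONSOLE}

uses
  // Assumption: Delphi XE2 or newer. On XE and older drop the "System." prefixes.
  System.SysUtils, System.Classes, System.SyncObjs;

type
  // Hypothetical class; the lock is handed over explicitly via the constructor.
  TLockingThread = class(TThread)
  strict private
    FLock: TCriticalSection;
  protected
    procedure Execute; override;
  public
    constructor Create(lock: TCriticalSection);
  end;

constructor TLockingThread.Create(lock: TCriticalSection);
begin
  inherited Create(true); // create suspended; the caller calls Start
  FLock := lock;          // what the compiler's "capture" does for us otherwise
end;

procedure TLockingThread.Execute;
begin
  NameThreadForDebugging('LockingThread');
  FLock.Acquire;
  try
    // work with the shared resource would go here
  finally
    FLock.Release;
  end;
end;

var
  lock: TCriticalSection;
  thread: TLockingThread;

begin
  lock := TCriticalSection.Create;
  try
    thread := TLockingThread.Create(lock);
    try
      thread.Start;
      thread.WaitFor;
    finally
      thread.Free;
    end;
  finally
    lock.Free;
  end;
  Writeln('OK');
end.
```

That is a whole class declaration just to move one reference into the thread, which is exactly the boilerplate the anonymous thread saves.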

OK, back to the code. Did you spot the problem already? The lock is acquired in the main thread before the background thread starts execution. Because of that, the background thread cannot acquire it and will block on the lock.Acquire call. And because the background thread never terminates, the thread.WaitFor call never returns. When you run the program and click the button, the background thread will stay blocked in lock.Acquire and the main thread in thread.WaitFor. A classical deadlock.
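
One possible way out, sketched here as a console program rather than a form so that it is self-contained, is to release the lock before waiting on the thread. The procedure name RunWithoutDeadlock is made up for this illustration; the rest mirrors the demo above.

```pascal
program NoDeadlockSketch;

{$APPTYPE CONSOLE}

uses
  // Assumption: Delphi XE2 or newer. On XE drop the "System." prefixes.
  System.SysUtils, System.Classes, System.SyncObjs;

procedure RunWithoutDeadlock; // hypothetical name
var
  lock: TCriticalSection;
  thread: TThread;
begin
  lock := TCriticalSection.Create;
  try
    lock.Acquire;
    try
      thread := TThread.CreateAnonymousThread(
        procedure
        begin
          TThread.NameThreadForDebugging('AnonymousThread');
          lock.Acquire; // waits only until the main thread releases the lock
          try
            // protected work would go here
          finally
            lock.Release;
          end;
        end);
      thread.FreeOnTerminate := false;
      thread.Start;
      // main-thread work that needs the lock would go here
    finally
      lock.Release; // release BEFORE waiting on the thread
    end;
    try
      thread.WaitFor; // can return now, as the thread is able to finish
    finally
      thread.Free;
    end;
  finally
    lock.Free;
  end;
end;

begin
  RunWithoutDeadlock;
  Writeln('OK');
end.
```

The general rule behind this sketch: never wait on another thread while holding a lock that this thread may need.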

If you press the “pause” debugger button (two blue vertical lines) now, Delphi will stop the program execution and display information on all running threads. (At least if you have the appropriate debugging windows visible. Be sure to enable the Call Stack and Threads windows in View, Debug Windows.)

In the Threads window you’ll see three threads (at least on Delphi XE): the main program thread, the background anonymous thread, and a third thread that was created temporarily by the debugger and should be ignored. Since Delphi 2009 (and when running on Vista or Windows 7) this window also displays a “wait chain”. In our example we can read directly that the background thread is blocked on a critical section owned by thread 5500. (BTW, this number will not be the same if you repeat the experiment. Thread IDs are assigned by the operating system and will differ from run to run.) We also know that this is the main thread, as it is the first thread listed in that window. Interestingly, the debugger doesn’t know that the main thread is also blocked waiting on the background thread. This is one of the weird features of wait chain detection (which is, by the way, implemented by Windows itself, not by Delphi): sometimes deadlocks are detected, sometimes not.

Before we proceed with the debugging, let’s return briefly to the “AnonymousThread” name. While the other threads have only numeric “names”, the background thread is properly named. That’s hardly surprising, as we’ve seen that the code calls NameThreadForDebugging, which assigns a name to the thread ID (the number visible in the Threads window if the thread doesn’t have a name). NameThreadForDebugging is fairly new, but you can use the same functionality in older Delphi versions too. For example, the OmniThreadLibrary contains the function SetThreadName, which does exactly the same.

procedure SetThreadName(const name: string);
type
  TThreadNameInfo = record
    FType    : LongWord;  // must be 0x1000
    FName    : PAnsiChar; // pointer to name (in user address space)
    FThreadID: LongWord;  // thread ID (-1 indicates caller thread)
    FFlags   : LongWord;  // reserved for future use, must be zero
  end; { TThreadNameInfo }
var
  ansiName      : AnsiString;
  threadNameInfo: TThreadNameInfo;
begin
  if DebugHook <> 0 then begin
    ansiName := AnsiString(name);
    threadNameInfo.FType := $1000;
    threadNameInfo.FName := PAnsiChar(ansiName);
    threadNameInfo.FThreadID := $FFFFFFFF;
    threadNameInfo.FFlags := 0;
    try
      RaiseException($406D1388, 0, SizeOf(threadNameInfo) div SizeOf(LongWord),
        @threadNameInfo);
    except {ignore} end;
  end;
end;

This function raises a special exception, which is caught by the debugger. The debugger then updates its internal tables and throws away the exception so that the program can continue its execution. As we’ll see later, you can also change the thread name “manually” in the debugger, at least in more recent Delphi versions.

Let’s return to the debugger and see what more we can learn. Double-click the line starting with ‘5500’ in the Threads window and the debugger will show the currently executing instruction and the call stack for that thread. At the top of the call stack we see a bunch of not-very-interesting WaitFor entries, which indicate only that the thread is waiting for something. The first interesting item is Classes.TThread.WaitFor. Click it and the debugger will display the WaitFor method and mark the currently executing statement.

We can also see that the WaitFor was called by Controls.TControl.Click, a short method in the Controls unit that calls the OnClick event handler. We would also expect the btnDeadlockClick to be visible in the call stack but somehow it is not. No idea why.

Just a short side note – if you want to see the source for internal Delphi units such as Classes in this example, make sure that you build your app with the “Use debug .dcus” setting enabled.

Since Delphi 2010 the context menu in the Threads window contains some very useful options. You can suspend (block) threads with Freeze and Freeze All Other Threads and resume (unblock) them with Thaw and Thaw All Threads. This feature can be immensely helpful while debugging, but it can also introduce deadlocks (if you’re trying to run a thread that is waiting on a synchronisation object owned by a suspended thread). Use these options with caution!

Another interesting item here is Name Thread. Use it to temporarily (for the duration of the debugging session) assign a name to a thread. This name will be visible in the Threads window.

The last debugger feature I want to emphasize is thread-specific breakpoints. Those are breakpoints that break the execution only when triggered from a specific thread. They were introduced in Delphi 2010 and are incredibly useful when more than one thread is executing the same code.

Testing

When you’re writing multithreaded applications, a proper approach to testing will (and please note that I’m not using “may” or “can”!) mean the difference between working and crashing code.

Always write automated stress tests for your multithreaded code. Write a testing app that runs some (changeable) number of threads, which execute your code for a prolonged time, and then checks the results, the status of internal data structures, etc. – whatever your multithreaded code depends upon. Run those tests whenever you change the code. Run them for a long time – overnight is good.
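
A very small sketch of such a test app follows. The “code under test” is just a shared counter protected by a critical section, and the names RunStressTest and CIterations are made up for this illustration; a real test would call your own code and check your own invariants.

```pascal
program StressTestSketch;

{$APPTYPE CONSOLE}

uses
  // Assumption: Delphi XE2 or newer. On XE drop the "System." prefixes.
  System.SysUtils, System.Classes, System.SyncObjs;

const
  CIterations = 100000; // work per thread; increase for longer runs

// Runs numThreads threads against a shared counter and reports
// whether the final value matches the expected total.
function RunStressTest(numThreads: integer): boolean;
var
  counter: integer;
  lock: TCriticalSection;
  threads: array of TThread;
  i: integer;
begin
  counter := 0;
  lock := TCriticalSection.Create;
  try
    SetLength(threads, numThreads);
    for i := Low(threads) to High(threads) do begin
      threads[i] := TThread.CreateAnonymousThread(
        procedure
        var
          j: integer;
        begin
          for j := 1 to CIterations do begin
            lock.Acquire;
            try
              Inc(counter); // stand-in for the code under test
            finally
              lock.Release;
            end;
          end;
        end);
      threads[i].FreeOnTerminate := false;
    end;
    for i := Low(threads) to High(threads) do
      threads[i].Start;
    for i := Low(threads) to High(threads) do begin
      threads[i].WaitFor;
      threads[i].Free;
    end;
    // Check the result only after all threads have finished.
    Result := (counter = numThreads * CIterations);
  finally
    lock.Free;
  end;
end;

var
  numThreads: integer;

begin
  // Go from one thread to (probably) many more threads than cores.
  numThreads := 1;
  while numThreads <= 64 do begin
    Writeln(Format('%d thread(s): passed = %s',
      [numThreads, BoolToStr(RunStressTest(numThreads), true)]));
    numThreads := numThreads * 2;
  end;
end.
```

Removing the Acquire/Release pair is a quick way to check that the test can actually fail; a stress test that never fails on broken code is not testing much.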

Always test multithreaded code with both small and large numbers of threads. Always test your apps with the minimum number of required threads (even one, if it makes sense) on only one core and then increase the number of threads and cores until you’re running many more threads than you have cores. I’ve found that most problems occur when threads are blocked at “interesting” points in the execution, and the simplest way to simulate this is to overload the system by running more threads than there are cores.

When you find a problem in the application that the automated test didn’t find, make sure that you first understand how to repeat the problem. Include it in the automated test next and only then start to fix it.

In other words – unit testing is your friend. Use it!

Application design

Most bugs in multithreaded programs spring from overly complicated designs. A complicated architecture means complicated, hard-to-find (and even harder-to-fix) problems. Keep it simple!

Instead of inventing your own multithreaded solutions, use as many well-tested tools as possible. More users = more found bugs. Of course, you should make sure that your tools are regularly upgraded and that you’re not using some obsolete code that everybody has run away from.

Keep the interaction points between threads simple, small and well defined. That will reduce the possibility of conflicts and will simplify the creation of automated tests.

Share as little data as possible. Global state (shared data) requires locking and is therefore bad by definition. Message queues will reduce the possibility of deadlocks. Still, don’t expect message-based solutions to be magically correct – they can still deadlock.
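
As an illustration of the message-queue idea, here is a minimal sketch in which the main thread sends integers to a worker through a queue instead of sharing a locked variable. It assumes that TThreadedQueue<T> from the Generics.Collections unit is available (Delphi XE or newer); the CStop sentinel value and the RunDemo name are made up for this example.

```pascal
program MessageQueueSketch;

{$APPTYPE CONSOLE}

uses
  // Assumption: Delphi XE2 or newer. On XE drop the "System." prefixes.
  System.SysUtils, System.Classes, System.Generics.Collections;

const
  CStop = -1; // hypothetical sentinel telling the worker to finish

procedure RunDemo;
var
  queue: TThreadedQueue<integer>;
  worker: TThread;
  sum: integer;
  i: integer;
begin
  sum := 0;
  // Queue depth 10; push and pop use the default (infinite) timeouts.
  queue := TThreadedQueue<integer>.Create(10);
  try
    worker := TThread.CreateAnonymousThread(
      procedure
      var
        value: integer;
      begin
        repeat
          value := queue.PopItem; // blocks until a message arrives
          if value <> CStop then
            Inc(sum, value);
        until value = CStop;
      end);
    worker.FreeOnTerminate := false;
    worker.Start;
    for i := 1 to 100 do
      queue.PushItem(i); // blocks while the queue is full
    queue.PushItem(CStop);
    worker.WaitFor;
    worker.Free;
    // sum is read only after the worker has finished
    Writeln(sum); // 1 + 2 + ... + 100 = 5050
  finally
    queue.Free;
  end;
end;

begin
  RunDemo;
end.
```

The queue is the only interaction point between the two threads. A deadlock is still possible with this design, for example if the stop message is never sent while the main thread sits in WaitFor.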

And besides everything else – have fun! Multithreaded programming is immensely hard but is also extremely satisfying.

1 comment:

  1. "Always write automated stress tests for your multithreaded code." Heh. Otherwise your customers will be providing the automated tests, and that will make them grumpy.

    I still treasure the email(s) from a past job after some muppet spent two weeks modifying some of my multithreaded code and commenting out test cases that failed. He couldn't get anything to work. The team leader took over and started again. He emailed me to say "I keep making trivial changes and then the tests fail. It is very frustrating". After a few days of that he had the changes made and the tests passing. At which point it worked in production too. Apparently multithreaded code is hard :)

    I would love to see multithreaded extensions to DUnit, but my attempt to write some proved that it's harder than it looks. I ended up with a setup that just collected all the registered tests and ran them in a variable-sized thread pool.
