Monday, February 13, 2012

Blaise Pascal Magazine Rerun #4: Introduction to Multithreading

This article was originally written for the Blaise Pascal Magazine and was published in Issue #9.

For the last fifty years, we programmers had it easy. We could write slow, messy, suboptimal code and when a customer complained we would just say: "What? You're still using that old hardware? Throw that junk away, get a new computer and program will fly!" With some luck new hardware would solve the problem and if not we could pretend to fix the problem until new generation of computers came out :) . In other words - Moore's law worked in our favor.

This situation changed radically in the last year. New processors are not significantly faster than the old ones and unless something will drastically change in CPU design and production, that will stay so. Instead of packing more speed, manufacturers are now putting multiple processor units (or cores as they are usually called) inside one CPU. In a way that gives our customers faster computers, but only if they are using multiple programs at once. Our traditionally written programs that can use only one processor unit at any moment won't profit from multiple cores. Even worse, we cannot say "buy a new computer" anymore. Maybe it will have a new, better processor with more cores than the previous one but our program, our precious program that can utilize only one core, will not run any faster.

As we can all see, This Is No Good. We have to do something to make our programs faster on multi-core processors. The only way to do that is to make the program do more than one thing at the same time and the simplest and most effective way to do it is to use multithreading or using the ability of the operating system to execute multiple threads simultaneously. [A note to experienced readers: There's more to threads, threading and multithreading than I will tell. If you want to get a full story, check the Wikipedia, en.wikipedia.org/wiki/Thread_(computer_science). I'll intentionally limit myself to the Windows and Delphi.]

A word of warning - this will be a long journey. Today I'll only scrap the surface and give you an overview of the topic. In next installments we'll start working on multithreaded programs using not only Delphi's native way but also using 3rd party extensions, components and libraries. We'll describe and use messaging, locking, shared data, lock-free structures, and more. Stay tuned!

A process and a thread work together

As a programmer you probably know, at least instinctively, what is a process. In operating system terminology, a process is a rough equivalent of an application - when the user starts an application, operating system creates and starts new process. Process contains (or better, owns) application code, but also all resources that this code uses - memory, file handles, device handles, sockets, windows etc.

When the program is executing, the system must also keep track of the current execution address, state of the CPU registers and state of the program's stack. This information, however, is not part of the process, but belongs to a thread. Even a simplest program uses one thread, which describes the program's execution. In other words, process encapsulates program's static data while thread encapsulates the dynamic part. During the program's lifetime, the thread describes its line of execution - if we know the state of the thread at every moment, we can fully reconstruct the execution in all details.

All operating systems support one thread per process (obviously) but some go further and support multiple threads in one process. Actually, most modern operating systems support multithreading (as this approach is called), the difference is just in details (and for those, see the Wikipedia topic I mentioned earlier). With multithreading, operating system manages multiple execution paths through the same code and those paths may execute at the same time (and then again, they may not - but more on that later).

An important fact is that processes are heavy. It takes a long time (at least at the operating system level where everything is measured in microseconds) to create and load a new process. In contrast to that, threads are light. New thread can be created almost immediately - all the operating system has to do is to allocate some memory for the stack and set up some control structures used by the kernel.

Another important point about processes is that they are isolated. Operating system does its best to separate one process from another so that buggy (or malicious) code in one process cannot crash another process (or read private data from it). If you're old enough to remember Windows 3 where this was not the case you can surely appreciate the stability this isolation is bringing to the user. In contrast to that, multiple threads inside a process share all process resources - memory, file handles and so on. Because of that, threading is inherently fragile - it is very simple to bring down one thread with a bug in another.

Multitasking and multithreading

In the beginning, operating systems were single-tasking. In other words, only one task (i.e. process) could be executing at the same time and only when it completed the job (when the task terminated), new task can be scheduled (started). That was good for hardware because operating system could be simple and unobtrusive and programs could execute at full speed. Programmers and users, on the other hand, hated that. To run a program (or even to compile a program) you had to put it into the queue and wait for a long time. After which the program would start and immediately crash, in most cases, and you'll have to start looking for the bug - on the paper as there was no interactive debugging. Ah, the good times ;-) .

As soon as the hardware was fast enough, multitasking was invented. Most computers still had only one processor (or better, they have one or more processor motherboards which acted together as one single-core CPU would today) but through the operating system magic it looked like this processor is executing multiple programs at the same time. Each program was give a small amount of time to do its job after which it was paused and another program took its place. After some indeterminate time (depending on the system load, number of higher priority tasks etc) the program could execute again and operating system would run it from the position in which it was paused, again only for the small amount of time. In technical terms, processor registers were loaded from some operating system storage immediately before the program was given its time to run and were stored back to this storage when program was paused. (And we all know that processor registers also include the location of current instruction in code, or Instruction Pointer, yes?)

Two very different approaches to multitasking are in use. In cooperative multitasking, the process itself tells the operating system when it is ready to be paused. This simplifies the operating system but gives a badly written program an opportunity to bring down whole computer. Remember Windows 3? That was cooperative multitasking at its worst.

Better approach is pre-emptive multitasking where each process is given its allotted time (typically about 55 milliseconds on a PC) and is then pre-empted; that is, hardware timer fires and takes control from the process and gives it back to the operating system which can then schedule next process. This approach is used in Windows 95, NT and all their successors.

That way, multitasking system can appear to execute multiple processes at once event if it has only one processor core. Things go even better if there are multiple cores inside the computer as multiple processes can really execute at the same time then.

The same goes for threads. Single-tasking systems were limited to one thread per process by default. Some multitasking were single-threaded (i.e. they could only execute one thread per process) but all modern Windows are multithreaded - they can execute multiple threads inside one process. Everything I said about multitasking applies to threads too. Actually, it is the threads that are scheduled, not processes.

Just for the curiosity - even on single-tasking operating system (for example, on DOS) it was possible to write a program that pretended to do multiple things at once. The programmer had to break down processing into very small parts and then mix those parts. Program would execute small part of job A, then part of job B, then job C and would return to the next part of job A. Complicated, error-prone, but definitely possible.

When to use multithreading

I've already hinted that multithreading is not simple. Still, multithreading must be good or operating systems would not support it. When should you use multithreading, then?

In the beginning I've given one part to that answer - when the program is slow and we want to speed it up. If that's the case we must somehow split the slow part into pieces that can be executed at the same time (which may be very hard to do) and then put each such piece into one thread. If we are very clever and if the problem allows that, we can even do that dynamically and create as many threads are there are processing units.

Another good reason to implement more than one thread in a program is to make it more responsive. In general, we want to move lengthy tasks away from the thread that is serving the graphical interface (GUI) into threads that are not interacting with the user (i.e. background threads). A good candidate for such background processing are long database queries, lengthy imports and exports, long CPU-intensive calculations, file processing and more.

Sometimes, multithreading will actually simplify the code. For example, if you are working with an interface that has simple synchronous API (start the operation and wait for its result) and complicated asynchronous API (start the operation and you'll somehow be notified when it is completed) as are file handling APIs, sockets etc, it is often simpler to put a code that uses synchronous API into a separate thread than to use asynchronous API in the main program. If you are using some 3rd party library that only offers you a synchronous API you'll have no choice but to put it into a separate thread.

A good multithreading example are servers that can serve multiple clients. Server usually takes a request from the client and then, after some potentially lengthy processing, returns a result. If the server is single-threaded, the code must be quite convoluted to support multiple simultaneous clients. It is much simpler to start multiple threads, each to serve one client.

Problems and solutions

Remember when I said that multithreading is not simple? Well, I don't know how to tell you gently, but I lied. Multithreading is hard. Very hard.

For example, splitting task into multiple threads can make the execution slower instead of faster. There's not many problems that can be nicely parallelized and in most cases we must pass some data from one thread to another. If there's too much communication between threads it can use more CPU than the actual, data processing code.

Then there's a problem of data sharing. When threads share data, we must be very careful to keep this data in a consistent state. For example, if two threads are updating shared data, it may end in a mixed state where half the data was written by the first thread and another half by the second.

This problem, race condition as it's called, is usually solved by some kind of synchronization. We use some kind of locking (critical sections, mutexes, spinlocks, semaphores) to make sure that only one thread at a time can update the data. However, that brings us another problem or two. Firstly, synchronization makes the code slower. If two threads try to enter such locked code, only one will succeed and another will be temporarily suspended and our clever, multithreaded program will again use only one CPU core.

Secondly, synchronization can cause deadlocks. This is a state where two (or more) threads forever wait on each other. For example, thread A is waiting on a resource locked by thread B and thread B is waiting on a resource locked by thread A. Not good. Deadlocks can be very tricky; easy to introduce into the code and hard to find.

There's a way around synchronization problems too. You can avoid data sharing and use messaging systems to pass data around or you can use well-tested lock-free structures for data sharing. That doesn't solve the problem of livelocks though. In livelock state, two (or more) threads are waiting on some resource that will never be freed because the other thread is using it, but they do that dynamically - they're not waiting for some synchronization object to become released. The code is executing and threads are alive, they can just not enter a state where all conditions will be satisfied at once.

The most famous (theoretical) example of resource protection is Dijkstra's Dining philosophers problem, well described at en.wikipedia.org/wiki/Dining_philosophers. Until next time it is your task to read about it and at least pretend to understand ;) so that we can continue our journey.

5 comments:

  1. Thanks very much for posting this article.

    ReplyDelete
  2. Thanks for the article, Primoz! A nice intro for the OTL book, I suppose ;) Not new, but still very touching :)

    ReplyDelete
  3. Not a bad intro, agree, but it would have to be freshened up.

    ReplyDelete
  4. Anonymous08:17

    Good intro, but without an explanation of *how* to implement multithreading properly, it's not overly insightful/useful. Hopefully, you can tie-in your OTL to a series of follow-ups.

    ReplyDelete
  5. There were six articles published in this series and I'll bring them all to this blog in the next days.

    ReplyDelete