Sunday, December 04, 2011

Busy-Wait Initialization

In response to my recent article on various kinds of initialization, GJ proposed another algorithm, which I will call here busy-wait initialization. This is the first time I’ve encountered this approach and I have no idea whether it is wide-known and has an “official” name.
The algorithm itself is very simple:
class function BusyWait<T>.Initialize(var storage: T; factory: TFactory): T;
  if storage = nil then begin
    if InterlockedCompareExchangePointer(
         PPointer(@storage)^, PPointer(@factory)^, nil) = nil 
      storage := factory()
  while PPointer(@storage)^ = PPointer(@factory)^ do
[This is not exactly the algorithm GJ proposed but my modified version, changed to work in a generic class.]

It uses a sentinel which prevents multiple simultaneous threads from creating the shared resource at the same time. In this respect it is similar to the pessimistic approach. It, however, doesn’t need an external lock and implements the locking through the interlocked instruction, just like the optimistic initialization does.
Let’s step through the code to see how it works:
  • Firstly, a sentinel is stored in the global storage, but only if global storage itself is still uninitialized (nil).
  • If the global storage is not yet initialized (Interlocked … returned nil), new shared resource is created and stored in the global storage.
  • At the end, a busy loop waits until the storage does not contain a sentinel value anymore.
An address of the shared resource-creating factory is used as a sentinel as we can be sure that this value will never be returned from the factory itself.
Let’s walk through few scenarios.

Initialization in a single thread

  • Interlocked… succeeds and returns nil. Storage now contains @factory.
  • Factory method is called and result is stored in the storage. It does not contain @factory anymore.
  • While loop tests the condition, which is not true, and exits.

Calling Initialize after the resource was initialized

  • Let’s suppose that the outer if was executed before the storage was initialized and then the thread is paused and another thread initializes the resource.
  • Interlocked… fails and returns non-nil value, therefore factory method is not called.
  • While loop tests the condition, which is not true, and exits.

Two threads calling Initialize at the same time

  • First thread calls Interlocked… which succeeds and returns nil. Storage now contains @factory.
  • Second thread calls Interlocked… which fails.
  • Second thread enters the while loop. Loop condition is true and second thread starts looping.
  • First thread calls the factory, which initializes the resource.
  • Both first and second thread now skip or exit out of the while loop as the storage is no longer equal to @factory.

Advantages and disadvantages

Advantages are obvious:
  • Resource is created only once.
  • No external (operating system) locking is used.
Disadvantages, however, may or not may be very important – depends on your application.
  • Busy wait is used to wait on the resource.
  • Execution is very dependent on the x86/x64 memory model and would probably fail on other architectures. For example, it assumes that read/writes from/to a shared DWORD/QWORD memory location are atomic.

Initialization mechanisms in practice

It looks like it’s the time to write a good stress/performance test of all three initialization approaches … I don’t, however, have a good idea how to write this test (yet).


  1. Adrian18:05

    I don't like too much the idea of an infinite loop waiting for other thread to finish. What if the other thread never does it? That is:

    Two threads calling Initialize at the same time

    First thread calls Interlocked… which succeeds and returns nil. Storage now contains @factory.
    Second thread calls Interlocked… which fails.
    Second thread enters the while loop. Loop condition is true and second thread starts looping.
    First thread calls the factory, factory raises an Exception.

    First thread fails and it is handled gracefully, maybe with a dialog box to the user.
    Second thread keeps looping and using 100% of the cpu, until you kill the app.

    Am I getting something wrong? Is there a guarantee that the other thread will always set the value so the first thread can continue? Also, we are reading a shared value without a lock or a fence. Is it guaranteed that the cpu won't keep reading the cached value, and the loop never exit even if the first other changed it?

  2. gabr, thanks for implementation.
    Your code still isn't OK. You must put first end before while sentence to ensure that wail sentence is always executed.
    Maybe some ideas for future :)
    Instead value @factory you can use address @1, because read/write at this address will raise error if something was wrong initiated.

    In case that you have more object to Initialize you can simply return Instead Sleep(0) in while loop and start to Initialize second object. At the end of Initialize section check all involved pointers, if all aren't have @1 value (or @factory) than initialisation is finished. On this way all cores will be involved in initialisation.

  3. @adrian, the same thing should heppend when we wont to set Lock.
    You have five possibilites:
    1)Wait in infinite loop with calllin Sllep(0) or not
    2)Call SwitchToThread in infinite loop
    3)Set event
    4)Start to Initializate next object and at the end of initiation section wait that all objects are become Initializated.

  4. @GJ, you are correct, 'while' must always execute. Fixed.

    Actually, I started with address @1 and declared const CSentinel: pointer = pointer(1); in the Initialize method and that caused Runtime error 216 during program initialization. Weird, don't know why, but for now I've given up and reimplemented Initialize in a different way.

  5. @Adrian, all of my initialization patterns fail if factory raises uncaught exception. This is something that will have to be fixed.

  6. @gabr, Sorry still not ok...
    if storage <> nil then begin
    must be
    if storage = nil then begin

  7. @GJ, but of course! That part was never tested as I typed it directly into the blog post (blush) :(

  8. Maybe idea about testing and debugging of Initialization.
    1) Intel recommends that shared value is single cache line aligned. Check "8.4.6 Placement of Shared Synchronization Variable."
    2) So make a global container of 128 bytes aligned shared variables and allocate shared variables from that container. For container use list of array of 128 byte aligned shared variables. One array can have for instance 1023 shared variables. For faster allocations of NotInUse shared variable use stack. If you need more than 1023 shared variables allocate another array in to list.
    3) The rest space of 128 byte shared variable you can use for ReferenceCounter, ObjecType, ThreadID creator, CreationStarTicks, CreationEndTicks...
    4) In Initialize procedure set ThreadID and CreationStarTicks (just copy a rdtsc value) before you calling factory() and set CreationEndTicks on exit of it.
    5) In main thread (application) you can have simple monitor about created objects, so here you can calculate minimum, maximum and typical creation/initialization time.

  9. Hi gabr, didn't knew where to contact you - I can't reach any site under your old domain - I'm looking for the OmniThreadLibrary forum!


  10. @Leus, there's something wrong with the nameserver for the domain. I hope they'll fix it soon.

  11. If this is used in scenarios with guaranteed explicit initialization of the resource, this will probably work fine, but if used as a global with just-in-time allocation of the resource, there could be a problem :

    As an additional disadvantage, this method actually makes 'storage' temporarily contain the sentinel.

    So there's a chance that some other thread may read that value directly from 'storage' (perhaps using just an assigned-check, completely bypassing BusyWait.Initialize) and use it as if it was a normal resource...

    Obviously, this will end in disaster.

    If this method would be wrapped into a record with a default property and implicit overloads, then maybe it can be made to work without too much overhead.

    But all this only applies when there's no guaranteed initialization - as long as that can be arranged, this seems to be a better middle-ground than both pessimistic and optimistic locking. Thanks for sharing!