Friday, December 02, 2011

On Optimistic and Pessimistic Initialization

When you want to initialize a shared resource in a multithreaded application, you have two basic approaches to choose from – optimistic and pessimistic initialization. The difference between them is visible from the pseudocode below.

var
  Shared: T;

procedure OptimisticInitializer;
var
  temp: T;
begin
  if not assigned(Shared) then begin
    temp := T.Create;
    if not AtomicallyTestAndStore(Shared, nil, temp) then
      temp.Free;
  end;
end;

procedure PessimisticInitializer;
begin
  if not assigned(Shared) then begin
    Lock;
    try
      if not assigned(Shared) then
        Shared := T.Create;
    finally Unlock; end;
  end;
end;

Both approaches have one point in common – they only initialize the shared resource if it is not already initialized. After that, execution follows two completely different paths.

The optimistic initializer always creates a new resource. It then tries to store that resource in the shared variable in an atomic manner and, if the store fails (because another thread got there first), it throws the temporary resource away. [Read Atomic Interface Initialization for detailed info.] This means that at some point in time multiple copies of the resource may exist, but all except one will be promptly destroyed.

Advantage: No locking (only microlocking on the CPU level).
Disadvantage: Duplicate resources may be created at some point.
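
The AtomicallyTestAndStore from the pseudocode maps directly onto a compare-and-swap. A minimal sketch for the class case – the names are illustrative and I'm assuming InterlockedCompareExchangePointer from the Windows unit (present in recent Delphi versions; on older RTLs substitute your favourite CAS wrapper):

var
  Shared: TObject;  // nil until somebody initializes it

procedure OptimisticInit;
var
  temp: TObject;
begin
  if not assigned(Shared) then begin
    temp := TObject.Create;
    // Publish temp only if Shared is still nil; the call returns the previous value.
    if InterlockedCompareExchangePointer(pointer(Shared), pointer(temp), nil) <> nil then
      temp.Free;  // another thread won the race - discard our copy
  end;
end;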

The pessimistic initializer first acquires a lock (using some form of a critical section), then tests again whether the resource is still uninitialized (another thread may have created it just before Lock was called), creates it if necessary, and finally releases the lock.

Advantage: Only a single resource is created.
Disadvantage: Locking, additional critical section.
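
Fleshed out with a TCriticalSection, the pessimistic variant could look like this (again only a sketch – the lock must be created before any thread can call the initializer, typically in a unit initialization section):

uses
  SyncObjs;

var
  Shared: TObject;
  InitLock: TCriticalSection;  // create in the unit's initialization section

procedure PessimisticInit;
begin
  if not assigned(Shared) then begin
    InitLock.Acquire;
    try
      // Re-check: another thread may have initialized Shared while we waited for the lock.
      if not assigned(Shared) then
        Shared := TObject.Create;
    finally
      InitLock.Release;
    end;
  end;
end;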

It is unclear which approach is better. Although locking slows the application down more than microlocking does, creating duplicate resources may slow it down even more. On the other hand, the pessimistic initializer requires an additional lock, but that won't make much difference unless you create millions of shared objects. In most cases the shared resource stays initialized most of the time, the initialization code is called rarely (possibly just once per shared object) and the complexity of the initializer will not affect program performance in any meaningful way, so the choice of initializer is mainly a matter of personal taste.

OmniThreadLibrary supports both the optimistic and the pessimistic approach, the former through the Atomic<T> class and the latter through the Locked<T> class.

var
  optimistic: T;
  pessimistic: Locked<T>;

Atomic<T>.Initialize(optimistic, CreateT);
pessimistic.Initialize(CreateT);

[CreateT is just a function that creates a new instance of type T.]
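
In other words, for some concrete class TMyResource (both names below are purely illustrative), it could be as simple as:

var
  Shared: TMyResource;

function CreateT: TMyResource;
begin
  Result := TMyResource.Create;
end;

// lazily initialize Shared from any thread:
Atomic<TMyResource>.Initialize(Shared, CreateT);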

The optimistic initializer expects T to be a class or an interface. The pessimistic initializer is not limited in any way – you can use it with any type supported by the compiler. You could, for example, pessimistically initialize an integer.

var
  a1: Locked<integer>;

a1.Initialize(function: integer begin Result := 42; end);

An interesting read on initializers in .NET is Coordination Data Structures – LazyInit<T>, written by Pedram Rezaei.

21 comments:

  1. Adrian 14:03

    Hi,
    I wonder whether the pessimistic approach has any issues with "double-checked locking" (for those unaware: http://en.wikipedia.org/wiki/Double-checked_locking )

    I mean, couldn't it happen that at a moment when Shared is "half initialized" by one thread, another thread checks Assigned(Shared), finds it true, and uses the object before it is completely initialized? I agree it would be very difficult for that to happen, but that actually makes the problem worse, as it would be a very hard bug to find.

    Afaik this won't happen on x86 CPUs because they have a strong memory model, but it could happen on future processors as Delphi grows more cross-platform and even Windows starts to support ARM.

    I'd love to see some more insight into the "double-checked locking" pattern in Delphi, since I have seen little. But my reason for preferring optimistic locking is that I see double-checked locking as dangerous, rather than any performance issue.

  2. @Adrian, yes, the pessimistic approach uses double-checked locking and, no, it doesn't have any problems.

    Let's assume three threads:
    - Threads 1 and 2 call Initialize at the same time.
    - Thread 1 gets the lock and proceeds with the initialization while thread 2 gets blocked in the Acquire call.
    - Thread 1 creates the new object (calls factory()) and receives a fully initialized object, which is only then stored into the shared variable.
    - Thread 3 calls Initialize at the same time.
    - As FInitialized is not set yet, it doesn't matter that the shared storage is already initialized - thread 3 will still get blocked in the Acquire.
    - Thread 1 sets FInitialized. Now any other thread calling Initialize will simply skip the initialization.
    - Threads 2 and 3 get unblocked one after another (in some unknown order), they both check FInitialized again and skip the initialization.
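
    For reference, roughly how such a double-checked Initialize looks (a stripped-down sketch, not the verbatim OTL source; TFactory stands for a parameterless anonymous function returning T):

    procedure Locked<T>.Initialize(factory: TFactory);
    begin
      if not FInitialized then begin
        Acquire;
        try
          // re-check under the lock - another thread may have initialized FValue already
          if not FInitialized then begin
            FValue := factory();
            FInitialized := true;
          end;
        finally Release; end;
      end;
    end;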

  3. This comment has been removed by the author.

  4. @Adrian, yes, it might be necessary to add a memory barrier after FValue := factory() on some CPU architectures. I think I'll change the implementation and insert a full memory barrier after that assignment.
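
    Roughly like this (FullMemoryBarrier is purely illustrative here - e.g. an MFENCE or any LOCK-prefixed instruction - and not something OTL exposes today):

    FValue := factory();
    FullMemoryBarrier;     // make the fully constructed object visible to other cores ...
    FInitialized := true;  // ... before they can observe the "initialized" flag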

  5. Method 3: You know you're going to need a global shared resource at some point, so you create it at program startup, before you spawn off the multiple threads. No need to worry about locking strategies then! :D

  6. @Mason, yep, you're correct. But this is sometimes hard to do, especially if you write helpers/class libraries like OTL and want to keep them simple to use.

  7. Double-checked locking as written here is fine on x86 and x64 with the Delphi compiler. It will break on other machines – not that Delphi compilers for those other machines exist yet.

  8. David, thanks. I must say that I find the whole concept of instruction reordering and memory fences very hard to grasp. I know what's going on, but I can't wrap my head around it well enough to check whether my code is valid or not. That's why I always write extensive stress tests.

  9. David, if you can explain *why* this problem cannot occur on x86/x64, I (and, I presume, many of my readers) would be very grateful. If the answer is too complicated to put in a comment, I'd gladly turn it into a guest post for my blog.

  10. Adrian 12:37

    Well, I'll say that I also find the whole concept hard to grasp; that's why I made my original comment (as I know there are people here who understand the topic much better than I do).

    About why memory fences are not required on x86: it is basically because the processor doesn't do as many reordering optimizations as others do. Normally those optimizations guarantee that a single-threaded program sees correct output, but they can break multicore code that runs without locks, since the CPU may be reordering instructions it knows won't affect a "normal" single-threaded app.

    I started writing an example here of why, in my limited understanding, you could have a problem with double-checked locking, but I erased it, because I could end up writing something factually wrong, and then it would live forever on the internet, confusing other people who come here for information :)

    So I will rather put some links here to people who probably understand it a lot better:
    http://bartoszmilewski.wordpress.com/2008/11/05/who-ordered-memory-fences-on-an-x86/

    http://igoro.com/archive/volatile-keyword-in-c-memory-model-explained/

    About Delphi running only on x86/x64, that is only partially true. XE2 already runs on iOS, even if using the FPC compiler, and according to Embarcadero, next year it won't even need FPC, so you'll have a case of "pure" Delphi on non-x86. More cases will certainly appear with Win8 and ARM, so I think it is a good idea to get this right today, even if it is more of a theoretical thing than a real issue right now.

    Adding a memory fence or a lock, even if expensive, isn't really much of an issue, since it will only happen the first time you access and create the object. All other times (in normal program execution) you just won't enter the first condition:
    if not assigned(Shared) then begin
    ...

    and no locks or anything else will be involved, in either the pessimistic or the optimistic way.

  11. > More cases will certainly appear with Win8 and ARM, so I think it is a good idea to get this right today, even if it is more of a theoretical thing than a real issue right now.

    According to the K.I.S.S. principle, it would be better not to complicate code with unnecessary constructs made to prevent some "theoretical risks". There are a lot of real, existing problems to solve; I'd rather concentrate on them... No offence, Adrian ;)

  12. @Adrian, two great articles, thanks!

  13. Adrian 17:46

    Anton,
    No offense taken, I'm actually here to discuss something that I find very interesting, even if theoretical.

    Now, while I normally follow a KISS methodology (the older I get, the more I try to keep things simple), I don't think this is a case where it applies. Everything you do has a cost of making it and a cost of maintaining it on one side; on the other side, you have the cost of not doing it. What YAGNI and similar buzzwords tell you is that normally the cost on the "doing it" side is higher than the cost on the other side. But "normally" doesn't mean always.

    In this particular case we are speaking about a single instruction that will be executed once, the first time you access the resource. That's the cost on one side. On the other side, you have an almost impossible-to-debug bug that will never show up in your tests but will show up on your customers' machines. (In gabr's case, or in my own, we develop libraries that are used by thousands of people, and there is a good chance of the bug showing up on another machine but not on yours.)

    To me this is a clear case where you have to deal with it. And it is not the only one. For example, I have never assumed that SizeOf(pointer) is 32 bits. Up until some months ago you might have said that was unnecessary, but when XE2 was released my components didn't need a single change to work in 64 bits. I also cared about Unicode long before D2009, and that also paid off.

    A last note: If we really wanted to KISS here, we would do:
    Lock;
    DoWhateverWeNeedToDoWithSharedResource;
    End Lock;

    All the problems come from trying to be smart and avoid the lock, which makes a lot of sense since we are only going to initialize once and we are going to use the resource many times. But we are not KISSing here. So we might as well go the extra mile and ensure the code is OK always, not just in the x86 case.

    Well, this is just my POV. I guess we can agree to disagree.

  14. Adrian, your arguments are very persuasive and motivate every developer towards perfection. I agree with you :)
    But still, we should always clearly state the circumstances under which some code will behave correctly. I can have correct code that assumes pointers are always 32 bit, which will execute well on 32-bit processors. Or I can have code that assumes SizeOf(Char) = 1, which also behaves well when compiled with Delphi 2007. When String suddenly became UnicodeString in Delphi 2009, all our code that was "always OK" suddenly became "not OK". So we can never say "the code is OK always, not just in the x86 case", because we don't know exactly what we mean by "not x86": what processors, what compilers, what compiler options, etc.
    That's just what I wanted to say :)

  15. Adrian 21:03

    anton,
    I understand your point, and I actually agree with most of it :) But I still find this interesting to discuss, even if it were completely theoretical (which it isn't, since you can create iOS apps with Delphi).

    gabr,

    Thanks to you for all the wonderful work ;) I've been using some of your stuff for a long time. If you look at the source of GpTimeZones.pas you will find I made some small suggestions back in 2000 (v1.1a). Well, now you made me feel old. Seriously, thanks a lot.

    btw, here is another paper I found interesting while researching the topic:
    http://www.aristeia.com/Papers/DDJ_Jul_Aug_2004_revised.pdf

    Worth reading, imho. I imagine the Delphi compiler doesn't do the optimizations mentioned in part 4 (setting the pointer first and then initializing it), so as long as the compiler doesn't do that and the CPU is an x86, it should be OK, I guess. I would still use the optimistic way, or initialization at startup, unless I really needed otherwise.

  16. gabr,

    My proposal is to combine the optimistic and pessimistic methods. Simple and effective! Something like:

    procedure WaitOnInitiation(Shared: pointer);
    begin
      while Shared^ = @WaitOnInitiation do;
    end;

    procedure OptimisticInitializer;
    begin
      if CAS(nil, WaitOnInitiation, Shared) then
        Shared := T.Create;
      WaitOnInitiation(Shared);
    end;

  17. Or even simpler...

    procedure Initializer;
    begin
      if CAS(nil, Initializer, Shared) then
        Shared := T.Create;
      while Shared^ = @Initializer do sleep(0);
    end;

  18. @GJ, this doesn't work if you call Initializer after Shared has already been created.

  19. This comment has been removed by the author.

  20. No, it works! Check it again!
    Once Shared is set, the pointer value is no longer equal to @Initializer. When we call from a second thread, that thread waits in the while loop until Shared is no longer equal to @Initializer.

  21. Oh, yes, sorry, I was misreading the while loop. Interesting approach – it initializes only once and uses no external locks, but it busy-waits instead.
