Sunday, November 08, 2020

Readers-writer lock - Part 1: Why?

One of the pleasant surprises in Delphi 10.4.1 was the addition of a new readers-writer lock implementation TLightweightMREW. While it was probably not noticed by most of the users, I was quite happy to see it implemented. 

So now you are asking yourself - what is this readers-writer lock and why am I so happy to see it in Delphi? Well, I'm glad that you're asking! Let me explain ...

In multithreaded programming (as most of my horror stories start), we frequently run into a problem of resource sharing. Two threads want to modify a shared resource at the same time and that can cause many problems, from information being overwritten to corrupted data and program crashes.

To fix this, we add resource protection. Usually that is just a critical section (typically through a TCriticalSection wrapper), or Delphi's TMonitor. Sometimes, however, protecting resources with a simple critical section causes an unnecessary performance drop, and that's when a readers-writer lock (may) come into play.

Each thread can either read from a shared resource or write into it. As long as all threads are just reading, the situation is simple. We don't have to protect the shared resource at all. If all threads are writing, the situation is also simple. At any time, at most one thread may modify the resource, so we need to protect the access with a locking mechanism (critical section etc). 

Interesting things start happening when we have a combination of reading threads (readers) and writing threads (writers). While readers by themselves don't need protection, we must never allow a reader and a writer to work simultaneously on a shared resource. This can cause problems when readers read data that is only partially modified (updated) or data that is no longer here. (Simple example: Thread 1 is enumerating a TList. Thread 2 removes all data from this TList.)

To recapitulate, we have the following rules:

  • Multiple readers may access the shared resource simultaneously,
  • Writers must get exclusive access to the shared resource,
  • A writer can only get access when there are no active readers,
  • A reader can only get access when there are no active writers.

A locking mechanism that supports such locking is called a readers-writer lock (or single-writer lock, multi-reader lock, MRSW (multiple readers single writer) lock, MREW (multiple readers exclusive writer) lock ...). Yes, this locking mechanism has many names. I will call it a MREW lock for brevity (and because that's the abbreviation used in the Delphi RTL).

MREW locks are useful whenever the threads mostly read from the shared data, while updates are infrequent. In such a situation it is not practical to protect access with a critical section, as that would prevent reader threads from working in parallel. It is better to use a MREW lock.

The Delphi RTL actually got its first MREW implementation a long time ago. A terribly named TMultiReadExclusiveWriteSynchronizer was introduced more than 20 years ago in Delphi 4 (while much nicer type alias TMREWSync added in a later edition). While completely functional, this implementation has two problems.

Firstly, it is terribly slow. So slow that it is actually almost never OK to use it. If a protected area of code is short, then it is almost always better to use a simple critical section for synchronization.

Secondly, TMREWSync implements a readers-writer lock on Windows only! On all other platforms it simply wraps a standard critical section.

These issues are a consequence of the old implementation. In 1998, Windows did not implement a MREW lock. So the Delphi people had to build a MREW lockfrom existing synchronization primitives (specifically, events). Events are an excellent synchronization mechanism, but they are much slower than critical sections. It seems that something in this implementation also made it hard to port to other operating systems (but I have no idea why).

Only much later Windows got a decent MREW implementation (slim reader/writer).  There was, however, never any official RTL wrapper  until now.

This is a good place to stop so I'll press Pause here. In the next post I'll tell more about the new TLightweightMREW and how it compares to TMREWSync. And in the third, last part we'll see how different synchronization mechanisms perform in practice.

20 comments:

  1. I wonder how you did circumvent the biggest issue of those lockers, which is that those locks cannot be acquired recursively, whereas a Critical Section has no trouble being locked in cascade - if it is unlocked the same number of times.
    This is what makes using those slim locks difficult to use in general-use libraries, where you could never be sure that a lock won't be done within another lock...

    ReplyDelete
    Replies
    1. Actually, both slim reader/writer and pthread_rwlock allow the _read_ lock to be acquired recursively. They don't allow _write_ lock to be acquired recursively, though, but you can fix that with an internal lock count.

      Delete
    2. The writelockcount is indeed a good idea. I guess InterlockedIncrement() is necessary to ensure proper thread safety.

      Delete
    3. IME slim locks shine when "naked", once you start trying to "pimp" them for recursion support or escalation, you quickly lose the performance advantage: they are just not that much faster than critical sections f.i. (a bit old but illustrates the point https://www.delphitools.info/2013/12/09/slim-multi-read-single-write-locks/)

      However they work very well when what you want to protect is a smallish bit of code, with no function calls and no risk of recursion, for instance as an alternative to "lockless" algorithms, as they are then much simpler to get right.

      Delete
  2. The original TMultiReadExclusiveWriteSynchronizer supported both recursive locks and lock promotion.
    The ability to promote a read lock to write locks of course meant that people kept running into dead locks. Not because of a bug in TMultiReadExclusiveWriteSynchronizer but because that's what you get if there's more than one thread doing read to write promotion. Unfortunately Danny Thorpe thought that this was a problem because in Delphi 6 he reimplemented TMultiReadExclusiveWriteSynchronizer without the ability to do seamless lock promotion - which was the only thing that could have justified the cost of using the lock.

    @Arnaud: you don't need to use interlocked for the write counter if it's a threadvar. You can do it without threadvar but then you end up with having to do so much additional stuff that the lock is no longer slim.

    ReplyDelete
    Replies
    1. @Anders A threadvar doesn't make any sense because it would be global, and would count all the write locks, not the single/expected one.
      InterlockedIncrement() is faster than a threadvar anyway.

      Delete
    2. My mistake; I meant thread private var. With a thread private var you just need to store the write lock level per lock.
      With regard to recursive write using InterlockedIncrement I can't see how you would do that without additional locking to avoid races between writers.

      Delete
    3. Threadvars are quite limited in number, so a better approach may be to use a TThreadLocalCounter (System.SysUtils). TMultiReadExclusiveWriteSynchronizer demonstrates how it would work.

      IIRC you need to use CompareExchange in a loop to implement recursive write functionality. I'll drop a full example in a future post.

      Delete
    4. @Anders: what do you call a "thread private var"? A counter field defined in a TThread descendant? But what is the point of it? There should be such a counter var per lock, not per thread. I am confused.

      Delete
    5. > There should be such a counter var per lock, not per thread

      Why?
      This counter is to track RECURSION, that is, a single thread continuously acquiring more and more locks. It does not make sense to be global - that is, Thread A acquires 3 locks of the same resource, then thread B releases 2 of those 3 locks.

      Basically the only global data are flags, if there are active readers or writers or not. For implementation efficiency it seems natural for those flags to be implemented as thread counters. But it is not necessary. They can be TThreadList of threads, for example (just would be slow).

      TL;DR - recursion counters are thread-private vars naturally and thus do not need InterlockedIncrements. The only moment they have global effects is when they go one to zero or zero to one. Then - like in the Delphi's numerous BeginUpdate/EndUpdate - those should be propagated into global lock states (atomic threads counters).

      Recursion counters themselves though should never be global and accessible by other threads. It is just dangerous. Recursion counters are natural fit for threadvar or TThreadLocalCounter.

      Delete
    6. @Anders need for a read-become-write lock propagation sounds a code smell to me. As i grok the concept, MREW readers are those who need consistent data, and all. Like read-only transactions in SQL. They do not intend on limiting others rights to change the data, only to delay it. I even can envision a backlog, a queue of writers, waiting all the readers to go away and let them do their jobs.

      If you need to freeze the data for an hour, do some heavy number crunching, and then throw in the results - you just do not need light-weight locks. Use writer's queue, or maybe barriers.

      If you need a light-weight locks, then your updates are expected nigh instant starting with obtaining required volatile input.

      Delete
    7. @Arioch, I don't think that recursion count has to be threadlock because by the nature of the write lock only one thread can "own" the counter at any moment. I believe this will work (didn't test it yet!):

      procedure TReentrantMREW.BeginWrite;
      begin
      if FLockOwner = TThread.Current.ThreadID then
      Inc(FLockCount)
      else begin
      BeginWrite;
      FLockOwner := TThread.Current.ThreadID;
      FLockCount := 1;
      end;
      end;

      procedure TReentrantMREW.EndWrite;
      begin
      if FLockOwner <> TThread.Current.ThreadID then
      raise Exception.Create('Not an owner');

      Dec(FLockCount);
      if FLockCount = 0 then begin
      FLockOwner := 0;
      EndWrite;
      end;
      end;

      One problem I just noticed: FLockOwner modification must be done with interlocked instructions. Then it should work.

      Delete
    8. > by the nature of the write lock only one thread

      I believe it is orthogonal to readers-vs-writers. Readers should need re-entrancy (as recurions counting was named somewhere... at MS perhaps) even more than writers, for their work is naturally more continuous than writers "bursts". Just think about composition of records/objects into larger ones, each of them guards their consistent reading, and the overarking object-container does so too - here is your recursion.

      More so, in case of some crash (assertion failure, anything) it would be nice if the lock object could revoke its lock from the global storage. Which is again easier to do if recursions are counted locally not globally.

      And i also think recoursion dataframes (perhaps plain old objects or advanced records rather than classes) should be per-thread. For the sake of automated finalization, for example.

      procedure TThreadLocalReentrantMREW.BeginWrite;
      begin
      if FLocalWritingLockCount {cardinal} = 0 then
      DoAquireWriteLock();
      Inc(FLocalWritingLockCount);
      end;

      DoAquireWriteLock(); - is a stock method of any non-recursive MREW lock.
      With all the checks the resource is free to grab, CPU spins and other complex tasks

      FLocalWritingLockCount would have to be made threadvar or analogue.
      Actually, both thread-local AND instance-local. 2D matrix for every lock. This gets tricky.

      It was said threadvars exhaust TLS per-unit. So it would not be a lot of a burden. Granted the problem of efficiently allocating dataframes, when every thread would have different set of active MREW locks... and each dataframe should be unique per-thread and per-lock-object.

      Maybe there can just be a single conceptual "map" of dataframes. Each thread has a unique threadvar pointer for the same-sized buffer (dynarray for example). Each instance of TMREWLock would have the same pointer offset (array index) for their dataframes, but with different threadvar base pointer. Creating a lock object would require finding an unused slot or expanding the dataframes in all the threads. This can be tricky. But don't heap managers already solving something similar?

      Such a map then would be rather sparse, many threads would only actually use some of the "slots" (lock class instances). So what? Can an application have millions of MREWLock objects and still be able to scale anyway?

      Delete
    9. > More so, in case of some crash (assertion failure, anything) it would be nice if the lock object could revoke its lock from the global storage

      Here i meant some per-thread object (whatever it would be language-wise). If a thread crashes oir gets killed holding a potentially recurrent lock - it would be nice if such a lock would be released automagically.

      Delete
    10. @Arioch readers already have reentrancy on Windows and Posix so no problem there. So we only have to protect writers.

      Also, the underlying system MREW support doesn't handle crashed threads. There's no sense in adding such support into the derived implementation. It would just complicate everything too much (and slow operations down).

      Delete
    11. MSDN is ambiguous...

      It first singles out exclusive locks w.r.t. recurrency, but then it says "An SRW lock is the size of a pointer. The advantage is that it is fast to update the lock state. The disadvantage is that very little state information can be stored, so SRW locks cannot be acquired recursively." without any distinction between R and W.

      Perhaps they just use integer counter with -1 for exclusive write lock and +N for many read locks, and then thread B can be releasing locks of thread A, which seems dangerous...

      Anyway, if to judge by MSDN, then recuerency of readers seems undocumented volatile implementation detail, rather than contract.

      Delete
    12. > Also, the underlying system MREW support doesn't handle crashed threads.

      It seems to be threads-ignorant. Whoever holds the lock, whoever releases it, Windows does not care.

      https://docs.microsoft.com/en-us/windows/win32/api/synchapi/nf-synchapi-releasesrwlockexclusive
      https://docs.microsoft.com/en-us/windows/win32/api/synchapi/nf-synchapi-releasesrwlockshared

      Both functions say the same:

      Return value
      None

      Remarks
      The SRW lock must be released by the same thread that acquired it. You can use Application Verifier to help verify that your program uses SRW locks correctly (enable Locks checker from Basic group).

      No any error possible, according to MSDN.
      Not even double-release for writers.

      Not unless a special hardened implementation gets hooked in.

      -----------

      So, basically it mean Windows does not care about crashed threads and would not prevent applications from cleaning up. If they can. And if they can do it fast enough.

      And if speed is what above all then perhaps SRW wrapper should not be slow Delphi class, but fast plain old TurboPascal object or for LLVM an advanced record, mORMot style.

      Talking about thread crashes though, so you made the wrapper like above and you detect that a thread goes to release write-lock it does not own. What then?

      Delete
    13. >>>> FLockOwner modification must be done with interlocked instructions.

      If by inner calls to BeginWrite/EndWrite you actualyl meant calls to underlying blocking OS functions, then no, there seems no point to make FLockOwner (misnamed, better FWriteLockOwner) MT-hardened. OS calls, the way you inserted them, should be natural barrier. You only modify Owner var while you are holding the exclusive lock anyway.

      Delete
    14. > It would just complicate everything too much (and slow operations down).

      If speed is paramount, then record-based wrapper should consist of INLINE functions.
      If we can sacrifice few brach-predicted indirections, then MREW implementation better be pluggable like GetMem is. Then developer can have both slow but sanity-checking implementation and blazing blindly trusting implementation, and even select one even at early runtime... (special config handle to catch shroedinbugs on client site)

      Delete
  3. I often use them (basically one based on the firebird sources) and I never ever needed recursive properties for locks

    ReplyDelete