Wednesday, February 10, 2010

Bypassing the ABA problem

On Saturday me and my wife visited a classical music concert. Although I like this kind of music, the particular combination of instruments (harp and violin) didn’t really interest me that much, especially when they played modernist Slovenian composers. [Debussy, on the other hand, was superb.]

Anyway, I got submerged into music and half of my brain switched of and then I got all sorts of weird programming ideas. The first was how to solve the ABA problem in the initial dynamic queue implementation. [Total failure, that idea, it didn’t work at all.] The second, however, proved to be very useful as it solved the memory release problem (provided that ABA gets fixed, of course).

The new memory release scheme brought with it a new strength of will. If I had solved that one, then maybe, just maybe, I can also solve the ABA problem, I thought to myself and returned to the code. And then it dawned on me …

The problem with the initial implementation was that the head/tail pointer and corresponding tag were access asynchronously. In the multithreading environment, that is always a problem. Somehow I had to modify them at the same time, but that didn’t look feasible as the tag was living in the dynamically allocated block and the head/tail pointer was stored in the object itself. I couldn’t put the head/tail into the block, but I could put a tag near to the head/tail pointer! [A copy of the tag, actually, as I still needed the tags to be stored in the data block.] Then I could use 8-byte compare-and-swap to change both the pointer and the tag at the same time!

There was one problem, though. In the initial implementation, the tail was allowed to catch the head. If that happened with the new scheme, both head and tail pointers would be the same (and pointing to a tagFree slot) but the first enqueue operation would only modify the head tag, although the tail tag would in reality also change! It seemed like I was just pushing the ABA problem from place to place :(

Still, there is a simple (at least for some values of that word) solution to such problems – introduce the sentinel. This is a special element signifying that some pointer (tail, in my case) has reached the end of list. A good idea, but could it be made to work?

I fired up my trusty spreadsheed (very good stuff for simulations) and in few hours I had the basic plan laid out.

image

[Yes, that’s the picture from the yesterday’s teaser.]

It was a really simple work to convert this to the code. After fixing few bugs, I had the new queue running, faster then ever before!

I’ll put together a long article describing all the tricks inside the new dynamic queue, but that will take some time, sorry. In the meantime, you can checkout the current OtlContainers and read the pseudo-code documentation.

TOmniQueue
===============

tags:
  tagFree
  tagAllocating
  tagAllocated
  tagRemoving
  tagEndOfList
  tagExtending
  tagBlockPointer
  tagDestroying
  tagHeader
  tagSentinel

header contains:
  head
    slot = 4 bytes
    tag  = 4 bytes
  tail
    slot = 4 bytes
    tag  = 4 bytes
all are 4-aligned

slot contains:
  TOmniValue = 13 bytes
  tag        = 1 byte
  offset     = 2 bytes
TOmniValues are 4-aligned

block is initialized to:
[tagHeader, num slots - 1, 0] [tagSentinel, 0, 1] [tagFree 0, 2] .. [tagFree, 0, num slots - 2] [tagEndOfList, 0, num slots - 1]

Enqueue:
  repeat
      tail = header.tail.slot
      old_tag = header.tail.tag
      if header.tail.CAS(tail, tagFree, tail, tagAllocating) then
          tail.tag = tagAllocating
          break
      else if header.tail.CAS(tail, tagEndOfList, tail, tagExtending) then
          tail.tag = tagExtending
          break
      else
          yield
  forever
  if old_tag = tagFree then
      store <value, tagAllocated> into slot
      header.tail.CAS(tail, tagAllocating, NextSlot(tail), NextSlot(tail).tag)
  else
      allocate block // from cache, if possible
      next = second data slot in the new block
      set next to <tagAllocated, value>
      set last slot in the original block to <new block address, tagBlockPointer>
      header.tail.CAS(tail, tagExtending, next, next.tag)
      // queue is now unlocked
      preallocate block

Dequeue:
  repeat
      if header.head.tag = tagFree then
          return false
      head = header.head.slot
      old_tag = header.head.tag
      caughtTheTail := NextSlot(header.head.slot) = header.tail.slot;
      if head.head.CAS(head, tagAllocated, head, tagRemoving) then
          head.tag = tagRemovings
          break
      else if header.head.Tag = tagSentinel then
          if caughtTheTail then
              return false
          else if header.head.CAS(head, tagSentinel, head, tagRemoving) then
              head.tag = tagRemoving
              break
      else if header.head.CAS(head, tagBlockPointer, head, tagDestrogin) then
          head.tag = tagDestroying
          break
      else
          yield
  forever
  firstSlot = head - head.Offset // point to first slot
  if old_tag in [tagSentinel, tagAllocated] then
      next = NextSlot(head)
      if tag = tagAllocated then
          fetch stored value
      if caughtTheTail then
          header.head.CAS(head, tagRemoving, head, tagSentinel)
          firstSlot = nil // do not decrement the header counter
      else
          header.head.CAS(head, tagRemoving, next, next.tag)
  else
      next = head.value // points to the next block's sentinel
      header.head.CAS(head, tagDestroying, next, tagSentinel)
      old_tag = tagSentinel // force retry
  // queue is now unlocked
  if assigned(firstSlot) and (InterlockedDecrement(firstSlot.value) = 0) then
      release block
  if old_tag = tagSentinel
      retry from beginning

2 comments:

  1. just a side note to improve your English:
    "my wife and I"
    not
    "me and my wife"

    ReplyDelete
  2. Definitely. Plus tons of other errors, I presume. My slavic origin must show somewhere :)

    ReplyDelete