On Saturday me and my wife visited a classical music concert. Although I like this kind of music, the particular combination of instruments (harp and violin) didn’t really interest me that much, especially when they played modernist Slovenian composers. [Debussy, on the other hand, was superb.]
Anyway, I got submerged into music and half of my brain switched of and then I got all sorts of weird programming ideas. The first was how to solve the ABA problem in the initial dynamic queue implementation. [Total failure, that idea, it didn’t work at all.] The second, however, proved to be very useful as it solved the memory release problem (provided that ABA gets fixed, of course).
The new memory release scheme brought with it a new strength of will. If I had solved that one, then maybe, just maybe, I can also solve the ABA problem, I thought to myself and returned to the code. And then it dawned on me …
The problem with the initial implementation was that the head/tail pointer and corresponding tag were access asynchronously. In the multithreading environment, that is always a problem. Somehow I had to modify them at the same time, but that didn’t look feasible as the tag was living in the dynamically allocated block and the head/tail pointer was stored in the object itself. I couldn’t put the head/tail into the block, but I could put a tag near to the head/tail pointer! [A copy of the tag, actually, as I still needed the tags to be stored in the data block.] Then I could use 8-byte compare-and-swap to change both the pointer and the tag at the same time!
There was one problem, though. In the initial implementation, the tail was allowed to catch the head. If that happened with the new scheme, both head and tail pointers would be the same (and pointing to a tagFree slot) but the first enqueue operation would only modify the head tag, although the tail tag would in reality also change! It seemed like I was just pushing the ABA problem from place to place :(
Still, there is a simple (at least for some values of that word) solution to such problems – introduce the sentinel. This is a special element signifying that some pointer (tail, in my case) has reached the end of list. A good idea, but could it be made to work?
I fired up my trusty spreadsheed (very good stuff for simulations) and in few hours I had the basic plan laid out.
[Yes, that’s the picture from the yesterday’s teaser.]
It was a really simple work to convert this to the code. After fixing few bugs, I had the new queue running, faster then ever before!
I’ll put together a long article describing all the tricks inside the new dynamic queue, but that will take some time, sorry. In the meantime, you can checkout the current OtlContainers and read the pseudo-code documentation.
TOmniQueue
===============tags:
tagFree
tagAllocating
tagAllocated
tagRemoving
tagEndOfList
tagExtending
tagBlockPointer
tagDestroying
tagHeader
tagSentinelheader contains:
head
slot = 4 bytes
tag = 4 bytes
tail
slot = 4 bytes
tag = 4 bytes
all are 4-alignedslot contains:
TOmniValue = 13 bytes
tag = 1 byte
offset = 2 bytes
TOmniValues are 4-alignedblock is initialized to:
[tagHeader, num slots - 1, 0] [tagSentinel, 0, 1] [tagFree 0, 2] .. [tagFree, 0, num slots - 2] [tagEndOfList, 0, num slots - 1]Enqueue:
repeat
tail = header.tail.slot
old_tag = header.tail.tag
if header.tail.CAS(tail, tagFree, tail, tagAllocating) then
tail.tag = tagAllocating
break
else if header.tail.CAS(tail, tagEndOfList, tail, tagExtending) then
tail.tag = tagExtending
break
else
yield
forever
if old_tag = tagFree then
store <value, tagAllocated> into slot
header.tail.CAS(tail, tagAllocating, NextSlot(tail), NextSlot(tail).tag)
else
allocate block // from cache, if possible
next = second data slot in the new block
set next to <tagAllocated, value>
set last slot in the original block to <new block address, tagBlockPointer>
header.tail.CAS(tail, tagExtending, next, next.tag)
// queue is now unlocked
preallocate blockDequeue:
repeat
if header.head.tag = tagFree then
return false
head = header.head.slot
old_tag = header.head.tag
caughtTheTail := NextSlot(header.head.slot) = header.tail.slot;
if head.head.CAS(head, tagAllocated, head, tagRemoving) then
head.tag = tagRemovings
break
else if header.head.Tag = tagSentinel then
if caughtTheTail then
return false
else if header.head.CAS(head, tagSentinel, head, tagRemoving) then
head.tag = tagRemoving
break
else if header.head.CAS(head, tagBlockPointer, head, tagDestrogin) then
head.tag = tagDestroying
break
else
yield
forever
firstSlot = head - head.Offset // point to first slot
if old_tag in [tagSentinel, tagAllocated] then
next = NextSlot(head)
if tag = tagAllocated then
fetch stored value
if caughtTheTail then
header.head.CAS(head, tagRemoving, head, tagSentinel)
firstSlot = nil // do not decrement the header counter
else
header.head.CAS(head, tagRemoving, next, next.tag)
else
next = head.value // points to the next block's sentinel
header.head.CAS(head, tagDestroying, next, tagSentinel)
old_tag = tagSentinel // force retry
// queue is now unlocked
if assigned(firstSlot) and (InterlockedDecrement(firstSlot.value) = 0) then
release block
if old_tag = tagSentinel
retry from beginning
just a side note to improve your English:
ReplyDelete"my wife and I"
not
"me and my wife"
Definitely. Plus tons of other errors, I presume. My slavic origin must show somewhere :)
ReplyDelete