I know how to implement a no-wait release in my dynamic queue if somehow the ABA problem gets solved. (I also had some ideas on how to solve the ABA problem if the tail pointer never catches the head one. But that’s still very much in the design phase.)
Each block gets a header element (with a tagHeader tag). Each slot in the block uses the previously unused bytes (stuffing) to store its position (index) inside the block.
[Header|0|1023] [Free|1|0] … [Free|1022|0] [EndOfList|1023|0]
A number of all not-yet-released slots is stored in the header’s value field.
The second part of the Dequeue code is changed:
if tag = tagAllocated then
set tail to new block's slot 1
// new code that executes in all code paths:
interlocked decrement number of not-yet-released slots in the header
if the decrement resulted in value 0, release the block
So what is going on here?
- Additional index in the previously unused bytes is used to quickly jump to the header slot.
- Decrement-and-test is executed last in the Dequeue code and we know that the code won’t reference the current block anymore. Therefore it is safe to release the block at this point.
- Ever change (tagAllocated –> tagReleasing, tagBlockPointer –> tagDestroying) decrements this counter. Only when all tags are set to tagReleasing/tagDestroying the block is destroyed.
- We know that at this point no other thread may be referencing the block (because it has already decremented the counter – otherwise the counter wouldn’t be 0 yet).
I tested this approach by putting initial tag switching (tagFree –> tagAllocating etc) into a critical section, thusly bypassing the ABA problem. It works but the critical section really killed the performance when number of threads got high (N = 4, M = 4 case worked well, N = 8, M = 8 did not).