Thursday, July 24, 2008

A lock-free queue, finally!

WiPIt was a long and windy road (1, 2, 3) but finally it brought us to the really interesting part - lock-free queue. Admit it, stack is fine but it is not really suitable for sending data from point A to point B. When forwarding data inside the application we typically want it to arrive in the same order as it was transmitted and that's what queues are for.

Still, there was a good reason for that lengthy introduction. The lock-free stack we all know and love by now ;) is basis for the lock-free queue. Even more, most of the stack code is reused and queue's Enqueue/Dequeue methods are incredibly similar to stack's Push/Pop.

Let's take a look at the method that inserts new element at the end of the queue. The Enqueue code is not just similar to Push. They are identical. [Compare: Push]

function TOmniBaseQueue.Enqueue(const value): boolean;
var
linkedData: POmniLinkedData;
begin
linkedData := PopLink(obcRecycleChain);
Result := assigned(linkedData);
if not Result then
Exit;
Move(value, linkedData.Data, ElementSize);
PushLink(linkedData, obcPublicChain);
end; { TOmniQueue.Enqueue }

OK, so enqueueing an item is identical to pushing it. How do we remove items from the head of the queue, then? Dequeue must return not the top element of the stack but the bottom one - that's the element that was enqueued (pushed) first.

We could traverse the stack and somehow remove the last element but removing elements from a lock-free structure is very problematic, as you've already seen. [Where was the problem in TOmniBaseStack? In PopLink. There you go!] Removing elements from the queue is even harder (if not impossible) so let's try another approach.

function TOmniBaseQueue.Dequeue(var value): boolean;
var
linkedData: POmniLinkedData;
begin
if obqDequeuedMessages.Head = nil then
obqDequeuedMessages.Head := InvertOrder(UnlinkAll(obcPublicChain));
linkedData := PopLink(obqDequeuedMessages);
Result := assigned(linkedData);
if not Result then
Exit;
Move(linkedData.Data, value, ElementSize);
PushLink(linkedData, obcRecycleChain);
end; { TOmniQueue.Dequeue }

If you compare it with the Pop method, you'll see that they are very similar. Dequeue does some magic in first two lines, uses a different chain header in the PopLink, but from that point onward they are identical.

function TOmniBaseStack.Pop(var value): boolean;
var
linkedData: POmniLinkedData;
begin
linkedData := PopLink(obcPublicChain);
Result := assigned(linkedData);
if not Result then
Exit;
Move(linkedData.Data, value, ElementSize);
PushLink(linkedData, obcRecycleChain);
end; { TOmniBaseStack.Pop }

The big question is, what happens in InvertOrder(UnlinkAll(obcPublicChain)) to make the old Pop code work as a queue? Well, let's draw some pretty pictures. I just luuuv pretty pictures.

Let's say we have a queue with two elements. Element 1 was enqueued first, followed by element 2. Our queue behaves like a stack when enqueueing so it's no wonder that I could just copy old picture from the stack introduction article to present this state.

queue with two elements

We have two empty nodes in the recycle chain and two allocated nodes in the public chain. Public chain header points to element 2, which points to element 1, which points to nil.

Dequeue is called next and it executes UnlinkAll. This method merely removes all nodes from the chain that was passed as an argument. In our case, all nodes are removed from the public chain. This is done atomically. [How? I'll show you later.]

after UnlinkAll

Now we have public chain header that points to an empty chain (nil, in other words) and unnamed pointer that points to the top element in the stack. This unnamed pointer is returned as the UnlinkAll result.

InvertOrder is called next. It walks over the chain that was passed as an argument and reverses each and every pointer. This is done in a simple, non-atomical way, as the chain was already removed from the shared storage so concurrency is not a problem.

after InvertOrder

InvertOrder returns a pointer to the (new) chain head. In our case, this pointer points to node 1, which points to node 2, which points to nil. Result of the InvertOrder call is stored into the class field obqDequeuedMessages (also known as dequeue chain header).

The rest of the Dequeue method simply treats obqDeqeuedMessages as a stack chain and pops top element from it. As the chain was reversed, the top element is the one we need - the one that was enqueued first. The atomic version of PopLink is used for simplicity, but a simpler code could be used instead as there could be no inter-thread conflicts.

after Dequeue

On the return from the Dequeue, dequeued chain points to node 2, recycle chain points to node 1 (its data was copied from the node to the Dequeue's value parameters and node was recycled) and public chain is nil.

Let's assume that a writer enqueues data 3 into the queue. As we've already seen, this is a normal Push operation and we already know how it works.

Enqueue after Dequeue

Dequeued chain still points to node 2, public chain points to former node 1 (which now holds the value 3) and recycle chain points to the first empty node.

Reader now executes Dequeue again. As the dequeued chain is not nil, UnlinkAll and InvertAll are not called. A node is simply popped from the dequeued chain.

after second Dequeue

Dequeued chain is now nil. Next call to Dequeue will call UnlinkAll to remove public chain, InvertOrder to invert it, result will be stored in the dequeued chain and dequeue operation will continue in an already known fashion.

There is one important consequence of this approach. The lock-free queue (at least our implementation) can only have one reader. The problem lies in this non-atomic fragment:

  if obqDequeuedMessages.Head = nil then
obqDequeuedMessages.Head := InvertOrder(UnlinkAll(obcPublicChain));

If there were two readers, following could happen:

  • writer pushes one element into the queue
  • reader 1 starts the Dequeue, sees that Head is nil and is stopped just before it calls UnlinkAll
  • reader 2 starts the Dequeue, sees that Head is nil and proceeds with InvertOrder(UnlinkAll())
  • reader 2 updates the Head field of the dequeued chain
  • reader 1 continues its execution and passes empty public chain into UnlinkAll, which passes nil to InvertOrder, which returns nil and stores it into the Head parameter
  • the element that was dequeued by the reader 2 is lost

Multiple writers are supported, though, just like in the lock-free stack.

Let's take a quick look at the assembler code. UnlinkAll  is quite simple. First it clears the ecx register (sets up the nil pointer), loads chain.Head into eax and atomically swaps chain.Head with ecx (i.e., with nil). Old chain.Head is returned in the Result (eax), which is exactly what we want.

class function TOmniBaseContainer.UnlinkAll(var chain: TOmniHeadAndSpin): POmniLinkedData;
asm
xor ecx, ecx
mov edx, eax
mov eax, [edx]
@Spin:
lock cmpxchg [edx], ecx //Cut Chain.Head
jnz @Spin
end; { TOmniQueue.UnlinkAll }

InvertOrder is slightly longer but much simpler as there are no lock operations involved. Code merely walks over the pointer chain and inverts it.

class function TOmniBaseContainer.InvertOrder(chainHead: POmniLinkedData): POmniLinkedData;
asm
test eax, eax
jz @Exit
xor ecx, ecx
@Walk:
xchg [eax], ecx //Turn links
and ecx, ecx
jz @Exit
xchg [ecx], eax
and eax, eax
jnz @Walk
mov eax, ecx
@Exit:
end; { TOmniBaseStack.InvertOrder }

That's all folks. You now know how both lock-free OTL structures work and what are their limitations. Now I only have one more article on OtlContainers to write, one that will explain higher-level TOmniStack and TOmniQueue.

No comments:

Post a Comment