Lock-free queue algorithm, repeated reads for consistency

2023-01-18 19:36

I'm studying the lock-free enqueue and dequeue algorithms of Michael and Scott. The problem is that I can't explain or understand a couple of lines, and the paper doesn't explain them either, apart from the comments in the code itself.

Enqueue:

  enqueue(Q: pointer to queue_t, value: data type)
   E1:   node = new_node()        // Allocate a new node from the free list
   E2:   node->value = value      // Copy enqueued value into node
   E3:   node->next.ptr = NULL    // Set next pointer of node to NULL
   E4:   loop                     // Keep trying until Enqueue is done
   E5:      tail = Q->Tail        // Read Tail.ptr and Tail.count together
   E6:      next = tail.ptr->next // Read next ptr and count fields together
   E7:      if tail == Q->Tail    // Are tail and next consistent?
               // Was Tail pointing to the last node?
   E8:         if next.ptr == NULL
                  // Try to link node at the end of the linked list
   E9:            if CAS(&tail.ptr->next, next, <node, next.count+1>)
  E10:               break          // Enqueue is done.  Exit loop
  E11:            endif
  E12:         else               // Tail was not pointing to the last node
                  // Try to swing Tail to the next node
  E13:            CAS(&Q->Tail, tail, <next.ptr, tail.count+1>)
  E14:         endif
  E15:      endif
  E16:   endloop
         // Enqueue is done.  Try to swing Tail to the inserted node
  E17:   CAS(&Q->Tail, tail, <node, tail.count+1>)

Why is E7 needed? Does correctness depend on it? Or is it merely an optimization? This if can fail if another thread successfully executed E17, or D10 below, (and changed Q->Tail) while the first thread has executed E5 but not yet E7. But what if E17 is executed right after the first thread executes E7?

edit: Does this last sentence prove that E7 cannot be more than an optimization? My intuition is that it does, since I give a scenario where the if statement would "apparently" make the wrong decision, yet the algorithm is still supposed to work correctly. But then we could replace the if's condition with a random condition without affecting correctness. Is there any hole in this argument?

Dequeue:

dequeue(Q: pointer to queue_t, pvalue: pointer to data type): boolean
   D1:   loop                          // Keep trying until Dequeue is done
   D2:      head = Q->Head             // Read Head
   D3:      tail = Q->Tail             // Read Tail
   D4:      next = head.ptr->next      // Read Head.ptr->next
   D5:      if head == Q->Head         // Are head, tail, and next consistent?
   D6:         if head.ptr == tail.ptr // Is queue empty or Tail falling behind?
   D7:            if next.ptr == NULL  // Is queue empty?
   D8:               return FALSE      // Queue is empty, couldn't dequeue
   D9:            endif
                  // Tail is falling behind.  Try to advance it
  D10:            CAS(&Q->Tail, tail, <next.ptr, tail.count+1>)
  D11:         else                    // No need to deal with Tail
                  // Read value before CAS
                  // Otherwise, another dequeue might free the next node
  D12:            *pvalue = next.ptr->value
                  // Try to swing Head to the next node
  D13:            if CAS(&Q->Head, head, <next.ptr, head.count+1>)
  D14:               break             // Dequeue is done.  Exit loop
  D15:            endif
  D16:         endif
  D17:      endif
  D18:   endloop
  D19:   free(head.ptr)                // It is safe now to free the old node
  D20:   return TRUE                   // Queue was not empty, dequeue succeeded

Again, why is D5 needed? Correctness or optimization? I'm not sure what "consistency" these tests give, since it seems the values can become inconsistent right after the if succeeds.

This looks like a standard technique. Can someone explain the motivation behind it? To me, the intention seems to be to avoid an (expensive) CAS in those few cases where it can be noticed that the CAS would definitely fail. But the cost is an extra read every time, and that read is not supposed to be much cheaper itself. In Java, for example, Q->Tail would have to be volatile, so that we know we are reading the real thing and not a copy cached in a register, which translates into prepending the read with a fence of some sort. So I'm not sure what's really going on here. Thanks.

edit: This has been ported to Java, more precisely in ConcurrentLinkedQueue, where E7 is line 194 and D5 is line 212.

I was stuck on this same question, and sceptical that this could be an optimization, so I asked Maged Michael, one of the authors of this paper. This is his response:

E7 and D5 are needed for correctness.

The following case shows why E7 is needed:

Thread P reads the value <A,num1> from Q->Tail in line E5

Other threads change the queue such that the node A is removed and maybe later reused in a different queue (or a different structure with similar node structure) or allocated by a thread to insert it in this same queue. In any case A is not in this queue and its next field has the value <NULL, num2>.

In line E6, P reads the value <NULL, num2> from A->next into next.

(Skipping line E7)

In line E8, P finds next.ptr == NULL

In line E9, P executes a successful CAS on A->next as it finds A->next == <NULL, num2> and sets it to <node,num2+1>.

Now, the new node is incorrectly inserted after A which doesn't belong to this queue. This might also corrupt another unrelated structure.

With line E7, P would have discovered that Q->Tail has changed and would have started over.
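
The interleaving above can be replayed deterministically. The following is a minimal single-threaded sketch, not the paper's code: the names Ptr, Node, Queue, walk and run_scenario are made up for illustration, the "other threads" are plain assignments executed between P's steps, and the CAS in E9 is simulated by an ordinary compare followed by a write.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Ptr:
    """Counted pointer <ptr, count>, compared as a whole like a CAS operand."""
    ptr: Optional["Node"]
    count: int


class Node:
    def __init__(self, name):
        self.name = name
        self.next = Ptr(None, 0)


class Queue:
    def __init__(self, dummy):
        self.head = Ptr(dummy, 0)
        self.tail = Ptr(dummy, 0)


def walk(q):
    """Names of all nodes reachable from the head (dummy node included)."""
    names, node = [], q.head.ptr
    while node is not None:
        names.append(node.name)
        node = node.next.ptr
    return names


def run_scenario(check_e7):
    a, b, x = Node("A"), Node("B"), Node("X")
    q, other = Queue(a), Queue(x)      # `other` is an unrelated queue

    tail = q.tail                                  # E5: P reads <A, 0>

    # Other threads enqueue B and dequeue once, which unlinks and frees A.
    a.next = Ptr(b, a.next.count + 1)
    q.tail = Ptr(b, q.tail.count + 1)
    q.head = Ptr(b, q.head.count + 1)
    # A is reused: E3 resets next.ptr (the count is kept), then A joins `other`.
    a.next = Ptr(None, a.next.count)
    x.next = Ptr(a, x.next.count + 1)
    other.tail = Ptr(a, other.tail.count + 1)

    nxt = tail.ptr.next                            # E6: P reads <NULL, 1>
    if check_e7 and tail != q.tail:                # E7
        return walk(q), walk(other)                # P would start over here
    node = Node("P")
    if nxt.ptr is None and tail.ptr.next == nxt:   # E8, E9 (simulated CAS)
        tail.ptr.next = Ptr(node, nxt.count + 1)
    return walk(q), walk(other)


print(run_scenario(check_e7=False))   # P's node ends up in the other queue
print(run_scenario(check_e7=True))    # P backs off, nothing is linked
```

Without the E7 check, P's node is appended behind A in the unrelated queue and never becomes reachable from Q. With the check, P sees that Q->Tail is no longer <A, 0> and retries.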

Similarly for D5.
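
The answer does not spell out the D5 case. One possible interleaving, constructed here for illustration and not taken from the answer, is a dequeue that wrongly reports an empty queue. The sketch below is single-threaded, all names (Ptr, Node, Queue, dequeue_once) are hypothetical, and the other threads' steps are plain assignments placed between P's reads.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Ptr:
    """Counted pointer <ptr, count>."""
    ptr: Optional["Node"]
    count: int


class Node:
    def __init__(self, value=None):
        self.value = value
        self.next = Ptr(None, 0)


class Queue:
    def __init__(self, dummy):
        self.head = Ptr(dummy, 0)
        self.tail = Ptr(dummy, 0)


def dequeue_once(check_d5):
    """One pass of P through D2-D8 with other threads' steps interleaved."""
    a, b, c = Node(), Node("b"), Node("c")
    q = Queue(a)
    a.next = Ptr(b, 1)            # B is linked (E9) but Tail still lags at A

    head = q.head                 # D2: P reads <A, 0>
    tail = q.tail                 # D3: P reads <A, 0>

    # Other threads: swing Tail to B, enqueue C, dequeue B (which frees A).
    q.tail = Ptr(b, q.tail.count + 1)
    b.next = Ptr(c, b.next.count + 1)
    q.tail = Ptr(c, q.tail.count + 1)
    q.head = Ptr(b, q.head.count + 1)
    # A is taken from the free list again; E3 clears next.ptr, count is kept.
    a.next = Ptr(None, a.next.count)

    nxt = head.ptr.next           # D4: P reads <NULL, 1> from the reused node
    still_queued = q.head.ptr.next.ptr.value     # item after the current dummy
    if check_d5 and head != q.head:              # D5
        return "retry", still_queued
    if head.ptr is tail.ptr and nxt.ptr is None:  # D6, D7
        return "empty", still_queued             # D8
    return "other", still_queued


print(dequeue_once(check_d5=False))
print(dequeue_once(check_d5=True))
```

In this replay the queue holds at least one item during all of P's operation, so an "empty" result without D5 cannot be explained by any ordering of the operations. With D5, P notices that Q->Head changed and retries.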

Basically, if our read from tail.ptr->next is going to make us believe that the next pointer is null (and thus that we may write to the node), we must double check that this null refers to the end of the current queue. If the node is still in the queue after we read the null, we may assume that it really was the end of queue, and the compare-and-swap will (given the counter) catch the case where anything happened to this node after the test in E7 (removing the node from the queue will necessarily involve mutating its next pointer).
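
The role of the counter can be shown with a small sketch, again single-threaded and with hypothetical names (Ptr, Node, cas_next, late_reuse). P reads A->next and passes E7, and only then is A unlinked, freed and re-initialised. The simulated CAS either compares the whole <ptr, count> pair, as in the paper, or only the pointer.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Ptr:
    ptr: Optional[str]   # a node name stands in for an address
    count: int


class Node:
    def __init__(self):
        self.next = Ptr(None, 0)


def cas_next(node, expected, new, compare_count):
    """Single-threaded stand-in for CAS on node.next."""
    current = node.next
    same = current == expected if compare_count else current.ptr == expected.ptr
    if same:
        node.next = new
    return same


def late_reuse(compare_count):
    a = Node()
    nxt = a.next                          # E6: P reads <NULL, 0>; E7 then passes
    # After E7: B is linked behind A, A is dequeued, freed and re-initialised.
    a.next = Ptr("B", a.next.count + 1)   # another thread's E9
    a.next = Ptr(None, a.next.count)      # E3 on the reused node, count kept
    # P's E9: try to link its own node behind A.
    return cas_next(a, nxt, Ptr("P", nxt.count + 1), compare_count)


print(late_reuse(compare_count=True))    # CAS on <ptr, count>
print(late_reuse(compare_count=False))   # CAS on ptr only
```

With the counter, P's CAS expects <NULL, 0> but finds <NULL, 1> and fails, so P retries. Comparing the pointer alone would let the CAS succeed on a node that has since left the queue.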

Why is E7 needed?

It's mostly an optimization.

Consider two threads trying to enqueue at the same time. Both get past E5, but before thread 1 gets to E7, thread 2 successfully enqueues. When thread 1 gets to E7, it will observe that tail == Q->Tail is false and retry. This avoids a costly CAS. Of course it's not foolproof, because thread 1 can pass E7 before thread 2 enqueues, then fail the CAS and have to retry anyway.

Why is D5 needed?

Similar to E7.

Again, both functions would work without E7 and D5. Presumably some benchmarking showed that under moderate contention the double check increases throughput (this is more of an observation than a fact).

Edit:

I went and read the paper on this queue a bit more. The check is also there for the correctness of the lock-free algorithm, and less for the state of the data structure.

The lock-free algorithm is non-blocking because if there are non-delayed processes attempting to perform operations on the queue, an operation is guaranteed to complete within finite time. An enqueue operation loops only if the condition in line E7 fails, the condition in line E8 fails, or the compare and swap in line E9 fails. A dequeue operation loops only if the condition in line D5 fails, the condition in line D6 holds (and the queue is not empty), or the compare and swap in line D13 fails. We show that the algorithm is non-blocking by showing that a process loops beyond a finite number of times only if another process completes an operation on the queue.

http://www.cs.rochester.edu/u/scott/papers/1996_PODC_queues.pdf
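
For experimenting with the loop conditions quoted above, here is a rough Python port of the pseudocode. It is a sketch under loud assumptions: Python has no hardware CAS, so the cas helper emulates one with a global lock, and the names MSQueue, Ptr, Node, cas and stress are made up for this example. Python nodes are garbage-collected and never reused while still referenced, so this port cannot reproduce the node-reuse hazard. E7 and D5 are kept only to mirror the line labels.

```python
import threading
from dataclasses import dataclass
from typing import Optional

_cas_lock = threading.Lock()   # assumption: emulates the atomicity of CAS


@dataclass(frozen=True)
class Ptr:
    ptr: Optional["Node"]
    count: int


class Node:
    def __init__(self, value=None):
        self.value = value
        self.next = Ptr(None, 0)


def cas(obj, field, expected, new):
    with _cas_lock:
        if getattr(obj, field) == expected:
            setattr(obj, field, new)
            return True
        return False


class MSQueue:
    def __init__(self):
        dummy = Node()
        self.Head = Ptr(dummy, 0)
        self.Tail = Ptr(dummy, 0)

    def enqueue(self, value):
        node = Node(value)                                   # E1-E3
        while True:                                          # E4
            tail = self.Tail                                 # E5
            nxt = tail.ptr.next                              # E6
            if tail == self.Tail:                            # E7
                if nxt.ptr is None:                          # E8
                    if cas(tail.ptr, "next", nxt, Ptr(node, nxt.count + 1)):  # E9
                        break
                else:                                        # E13: help lagging Tail
                    cas(self, "Tail", tail, Ptr(nxt.ptr, tail.count + 1))
        cas(self, "Tail", tail, Ptr(node, tail.count + 1))   # E17

    def dequeue(self):
        while True:                                          # D1
            head = self.Head                                 # D2
            tail = self.Tail                                 # D3
            nxt = head.ptr.next                              # D4
            if head == self.Head:                            # D5
                if head.ptr is tail.ptr:                     # D6
                    if nxt.ptr is None:                      # D7
                        return False, None                   # D8
                    cas(self, "Tail", tail, Ptr(nxt.ptr, tail.count + 1))  # D10
                else:
                    value = nxt.ptr.value                    # D12
                    if cas(self, "Head", head, Ptr(nxt.ptr, head.count + 1)):  # D13
                        return True, value


def stress(n_threads=4, per_thread=200):
    """Concurrent producers, then concurrent consumers; returns sorted items."""
    q, out = MSQueue(), []

    def produce(base):
        for i in range(per_thread):
            q.enqueue(base + i)

    def consume():
        while True:
            ok, value = q.dequeue()
            if not ok:
                return
            out.append(value)

    producers = [threading.Thread(target=produce, args=(t * per_thread,))
                 for t in range(n_threads)]
    consumers = [threading.Thread(target=consume) for _ in range(n_threads)]
    for group in (producers, consumers):
        for t in group:
            t.start()
        for t in group:
            t.join()
    return sorted(out)
```

Calling stress() should return every enqueued item exactly once. Each pass through either loop that does not exit corresponds to one of the three conditions the paper lists: the consistency check failed, Tail was lagging, or the CAS lost a race.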
