Synchronizing access to a doubly-linked list

I'm trying to implement a (special kind of) doubly-linked list in C, in a pthreads environment but using only C-wrapped synchronization instructions like atomic CAS, etc. rather than pthread primitives. (The elements of the list are fixed-size chunks of memory and almost surely cannot fit pthread_mutex_t etc. inside them.) I don't actually need full arbitrary doubly-linked list methods, only:

  • insertion at the end of the list
  • deletion from the beginning of the list
  • deletion at arbitrary points in the list based on a pointer to the member to be removed, which was obtained from a source other than by traversing the list.

So perhaps a better way to describe this data structure would be a queue/fifo with the possibility of removing items mid-queue.

Is there a standard approach to synchronizing this? I'm getting stuck on possible deadlock issues, some of which are probably inherent to the algorithms involved and others of which might stem from the fact that I'm trying to work in a confined space with other constraints on what I can do.

Edit: In particular, I'm stuck on what to do if adjacent objects are to be removed simultaneously. Presumably when removing an object, you need to obtain locks on both the previous and next objects in the list and update their next/prev pointers to point to one another. But if either neighbor is already locked, this would result in a deadlock. I've tried to work out a way that any/all of the removals taking place could walk the locked part of the list and determine the maximal sublist that's currently in the process of removal, then lock the nodes adjacent to that sublist so that the whole sublist gets removed as a whole, but my head is starting to hurt.. :-P

Conclusion(?): To follow up, I do have some code I want to get working, but I'm also interested in the theoretical problem. Everyone's answers have been quite helpful, and combined with details of the constraints outside what I expressed here (you really don't want to know where the pointer-to-element-to-be-removed came from and the synchronization involved there!) I've decided to abandon the local-lock code for now and focus on:

  • using a larger number of smaller lists which each have individual locks.
  • minimizing the number of instructions over which locks are held and poking at memory (in a safe way) prior to acquiring a lock to reduce the possibility of page faults and cache misses while a lock is held.
  • measuring the contention under artificially-high load and evaluating whether this approach is satisfactory.

Thanks again to everybody who gave answers. If my experiment doesn't go well I might come back to the approaches outlined (especially Vlad's) and try again.
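For what it's worth, here is a minimal sketch of the first bullet in the plan above (the shard count, shard_for(), and the GCC __sync builtins are my own illustrative choices, not something stated in the question):

#include <stdint.h>

#define NUM_SHARDS 16

struct node { struct node *next, *prev; /* payload ... */ };

struct shard {
    volatile int lock;               /* 0 = free, 1 = held */
    struct node *head, *tail;
};

static struct shard shards[NUM_SHARDS];

static void shard_lock(struct shard *s)   { while (__sync_lock_test_and_set(&s->lock, 1)) { /* spin */ } }
static void shard_unlock(struct shard *s) { __sync_lock_release(&s->lock); }

/* Each element belongs to exactly one sublist, picked e.g. from its address,
 * so two threads only contend when they touch the same sublist. */
static struct shard *shard_for(struct node *n) {
    return &shards[((uintptr_t)n >> 6) % NUM_SHARDS];
}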


Why not just apply a coarse-grained lock? Just lock the whole queue.

A more elaborate (though not necessarily more efficient; it depends on your usage pattern) solution would be to use a reader-writer lock, for reading and writing respectively.
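As an illustration of the coarse-grained approach, a minimal sketch is below; the single lock guards the whole queue, so it does not have to live inside the fixed-size elements, and all names here are illustrative:

#include <pthread.h>
#include <stddef.h>

struct node { struct node *next, *prev; /* payload ... */ };

static pthread_mutex_t queue_lock = PTHREAD_MUTEX_INITIALIZER;
static struct node *head, *tail;

/* Remove an arbitrary node; every operation takes the same single lock. */
void remove_node(struct node *n) {
    pthread_mutex_lock(&queue_lock);
    if (n->prev) n->prev->next = n->next; else head = n->next;
    if (n->next) n->next->prev = n->prev; else tail = n->prev;
    n->next = n->prev = NULL;
    pthread_mutex_unlock(&queue_lock);
}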


Using lock-free operations seems to me not a very good idea for your case. Imagine that some thread is traversing your queue, and at the same moment the "current" item is deleted. No matter how many additional links your traversal algorithm holds, all of those items may be deleted, so your code would have no chance to finish the traversal.


Another issue with compare-and-swap on pointers is that you never know whether a pointer really still refers to the same old structure, or whether the old structure has been freed and some new structure has been allocated at the same address (the classic ABA problem). This may or may not be an issue for your algorithms.


For the case of "local" locking (i.e., the possibility to lock each list item separately), an idea would be to make the locks ordered. Ordering the locks ensures the impossibility of a deadlock. So your operations look like this (rough C sketches follow each recipe):

Delete by the pointer p to the previous item:

  1. lock p; check (using, perhaps, a special flag in the item) that the item is still in the list
  2. lock p->next; check that it's not null and still in the list; this way you ensure that p->next->next won't be removed in the meantime
  3. lock p->next->next
  4. set a flag in p->next indicating that it's not in the list
  5. (p->next->next->prev, p->next->prev) = (p, null); (p->next, p->next->next) = (p->next->next, null)
  6. release the locks
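A rough C rendering of those steps (a sketch only: the per-item lock, the lock_node()/unlock_node() helpers, and the in_list flag are my assumptions about how the "special flag" and per-item locking could be realized):

#include <stddef.h>

struct node { struct node *next, *prev; volatile int lock; int in_list; /* payload ... */ };

static void lock_node(struct node *n)   { while (__sync_lock_test_and_set(&n->lock, 1)) { /* spin */ } }
static void unlock_node(struct node *n) { __sync_lock_release(&n->lock); }

/* Delete the item after p, following steps 1-6 above. Returns 0 if the
 * operation cannot proceed (p no longer in the list, or nothing follows it). */
int delete_after(struct node *p) {
    lock_node(p);                                      /* step 1 */
    if (!p->in_list || !p->next) { unlock_node(p); return 0; }
    struct node *c = p->next;
    lock_node(c);                                      /* step 2 */
    if (!c->in_list) { unlock_node(c); unlock_node(p); return 0; }
    struct node *n = c->next;
    if (n) lock_node(n);                               /* step 3 */
    c->in_list = 0;                                    /* step 4 */
    if (n) n->prev = p;                                /* step 5 */
    p->next = n;
    c->next = c->prev = NULL;
    if (n) unlock_node(n);                             /* step 6 */
    unlock_node(c);
    unlock_node(p);
    return 1;
}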

Insert into the beginning:

  1. lock head
  2. set the flag in the new item indicating that it's in the list
  3. lock the new item
  4. lock head->next
  5. (head->next->prev, new->prev) = (new, head); (new->next, head->next) = (head->next, new)
  6. release the locks

This seems to be correct; I haven't tried this idea, however.

Essentially, this makes the doubly-linked list work as if it were a singly-linked list.
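And the insertion recipe, under the same assumptions, plus the assumption that head is a sentinel node that is always present (which is how I read steps 1 and 4):

/* struct node, lock_node() and unlock_node() as in the previous sketch;
 * `head` is assumed to be a sentinel node that always exists. */
void insert_at_head(struct node *head, struct node *item) {
    lock_node(head);                 /* step 1 */
    item->in_list = 1;               /* step 2 */
    lock_node(item);                 /* step 3 */
    struct node *first = head->next;
    if (first) lock_node(first);     /* step 4 */
    if (first) first->prev = item;   /* step 5 */
    item->prev = head;
    item->next = first;
    head->next = item;
    if (first) unlock_node(first);   /* step 6 */
    unlock_node(item);
    unlock_node(head);
}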


If you don't have a pointer to the previous list element (which is of course usually the case, as it's virtually impossible to keep such a pointer in a consistent state), you can do the following:

Delete by the pointer c to the item to be deleted:

  1. lock c, check if it is still a part of the list (this has to be a flag in the list item), if not, operation fails
  2. obtain pointer p = c->prev
  3. unlock c (now, c may be moved or deleted by other thread, p may be moved or deleted from the list as well) [in order to avoid the deallocation of c, you need to have something like shared pointer or at least a kind of refcounting for list items here]
  4. lock p
  5. check if p is a part of the list (it could be deleted after step 3); if not, unlock p and restart from the beginning
  6. check if p->next equals c, if not, unlock p and restart from the beginning [here we can maybe optimize out the restart, not sure ATM]
  7. lock p->next; here you can be sure that p->next==c and is not deleted, because the deletion of c would have required locking of p
  8. lock p->next->next; now all the locks are taken, so we can proceed
  9. set the flag that c is not a part of the list
  10. perform the customary (p->next, c->next, c->prev, c->next->prev) = (c->next, null, null, p)
  11. release all the locks

Note that just having a pointer to some list item cannot ensure that the item is not deallocated, so you'll need to have a kind of refcounting, so that the item is not destroyed at the very moment you try to lock it.


Note that in the last algorithm the number of retries is bounded. Indeed, new items cannot appear to the left of c (insertion is at the rightmost position). If our step 5 fails and we need a retry, this can only be caused by p having been removed from the list in the meantime. Such a removal can occur no more than N-1 times, where N is the initial position of c in the list. Of course, this worst case is rather unlikely to happen.
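A sketch of that retry loop follows. It reuses the assumed node layout and lock helpers from the earlier sketches; the refcounting that keeps c and p alive across the unlock in step 3 is mentioned but not shown, and c->prev is assumed never to be null because the left end of the list is a sentinel whose in_list flag stays set:

/* struct node, lock_node(), unlock_node() and the in_list flag as in the
 * earlier sketches. Returns 0 if c was already removed by someone else. */
int delete_node(struct node *c) {
    for (;;) {
        lock_node(c);                                  /* step 1 */
        if (!c->in_list) { unlock_node(c); return 0; }
        struct node *p = c->prev;                      /* step 2 */
        unlock_node(c);                                /* step 3 */

        lock_node(p);                                  /* step 4 */
        if (!p->in_list || p->next != c) {             /* steps 5 and 6 */
            unlock_node(p);
            continue;                                  /* restart from the top */
        }
        lock_node(c);                                  /* step 7: p->next == c is now stable */
        struct node *n = c->next;
        if (n) lock_node(n);                           /* step 8 */

        c->in_list = 0;                                /* step 9  */
        p->next = n;                                   /* step 10 */
        if (n) n->prev = p;
        c->next = c->prev = NULL;

        if (n) unlock_node(n);                         /* step 11 */
        unlock_node(c);
        unlock_node(p);
        return 1;
    }
}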


Please don't take this answer harshly, but don't do this.

You will almost certainly wind up with bugs, and very hard bugs to find at that. Use the pthreads lock primitives. They are your friends, and have been written by people who deeply understand the memory model provided by your processor of choice. If you try to do the same thing with CAS and atomic increment and the like, you will almost certainly make some subtle mistake that you won't find until it's far too late.

Here's a little code example to help illustrate the point. What's wrong with this lock?

volatile int lockTaken = 0;

void EnterSpinLock() {
  while (!__sync_bool_compare_and_swap(&lockTaken, 0, 1)) { /* wait */ }
}

void LeaveSpinLock() {
  lockTaken = 0;
}

The answer is: there's no memory barrier when releasing the lock, meaning that some of the write operations executed within the lock may not have happened before the next thread gets into the lock. Yikes! (There are probably many more bugs too, for example, the function doesn't do the platform-appropriate yield inside the spin loop and so is hugely wasteful of CPU cycles. &c.)
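For contrast, here is a sketch of the same spin lock with the barriers in place, using the GCC __sync builtins (the test-and-set acts as an acquire barrier, and the release builtin publishes the critical section's writes before the lock appears free; it still spins rudely without yielding):

volatile int lockTaken = 0;

void EnterSpinLock() {
  /* acquire barrier: reads/writes inside the lock cannot move above this */
  while (__sync_lock_test_and_set(&lockTaken, 1)) { /* wait */ }
}

void LeaveSpinLock() {
  /* release barrier: writes made while holding the lock become visible
     before the next thread sees lockTaken == 0 */
  __sync_lock_release(&lockTaken);
}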

If you implement your doubly-linked list as a circular list with a sentinel node, then you only need to perform two pointer assignments in order to remove an item from the list, and four to add an item. I'm sure you can afford to hold a well-written exclusive lock over those pointer assignments.
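To make the two-assignments/four-assignments point concrete, here is a sketch with a sentinel whose next/prev point to itself when the list is empty; both functions are meant to be called while holding the list's exclusive lock:

struct node { struct node *next, *prev; /* payload ... */ };

/* Two pointer assignments to unlink, with no head/tail special cases: */
void unlink_node(struct node *n) {
  n->prev->next = n->next;
  n->next->prev = n->prev;
}

/* Four pointer assignments to insert at the tail, just before the sentinel: */
void push_tail(struct node *sentinel, struct node *n) {
  n->prev = sentinel->prev;
  n->next = sentinel;
  sentinel->prev->next = n;
  sentinel->prev = n;
}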

Note that I am assuming you are not one of the few people who deeply understand memory models, simply because there are very few of them in the world. If you are one of those people, the fact that even you can't figure it out ought to be an indication of how tricky it is. :)

I am also assuming that you're asking this question because you have some code you'd actually like to get working. If this is simply an academic exercise in order to learn more about threading (perhaps as a step on your way to becoming a deep low-level concurrency expert) then by all means, ignore me, and do your research on the details of the memory model of the platform you're targeting. :)


You can avoid deadlock if you maintain a strict hierarchy of locks: if you're locking multiple nodes, always lock the ones closer to the head of the list first. So, to delete an element, first lock the node's predecessor, then lock the node, then lock the node's successor, unlink the node, and then release the locks in reverse order.

This way, if multiple threads try to delete adjacent nodes simultaneously (say, nodes B and C in the chain A-B-C-D), then whichever thread first gets the lock on node B will be the one that unlinks first. Thread 1 will lock A, then B, then C, and thread 2 will lock B, then C, then D. Both compete for B first, and whichever thread wins will also acquire C next, so there is no way for thread 1 to hold a lock while waiting for a lock held by thread 2 while thread 2 is waiting on a lock held by thread 1 (i.e. deadlock).


You cannot get away without a lock for the whole list. Here's why:

Insert into an Empty List

Threads A and B want to insert an object.

Thread A examines the list, finds it empty

A context switch occurs.

Thread B examines the list, finds it empty and updates the head and tail to point to its object.

A context switch occurs

Thread A updates the head and tail to point to its object. Thread B's object has been lost.

Delete an item from the middle of the list

Thread A wants to delete node X. For this it first has to lock X's predecessor, X itself and X's successor since all of these nodes will be affected by the operation. To lock X's predecessor you must do something like

spin_lock(&(X->prev->lockFlag));

Although I've used function call syntax, if spin_lock is a function, you are dead in the water because that involves at least three operations before you actually have the lock:

  • place the address of the lock flag on the stack (or in a register)
  • call the function
  • do the atomic test and set

There are two places there where thread A can be swapped out and another thread can get in and remove X's predecessor without thread A knowing that X's predecessor has changed. So you have to implement the spin lock itself atomically: i.e., you have to add an offset to X to get X->prev, dereference it to get *(X->prev), add an offset to that to get lockFlag, and then do the atomic operation, all in one atomic unit. Otherwise there is always an opportunity for something to sneak in after you have committed to locking a particular node but before you have actually locked it.
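For reference, a per-node spin lock of the kind that call assumes might look like the sketch below; the comments mark the window the paragraph above is describing, between reading X->prev and actually acquiring its flag:

struct node { struct node *next, *prev; volatile int lockFlag; /* payload ... */ };

void spin_lock(volatile int *flag) {
  while (__sync_lock_test_and_set(flag, 1)) { /* wait */ }
}

void lock_predecessor(struct node *X) {
  struct node *p = X->prev;    /* read the predecessor pointer...           */
                               /* ...another thread can unlink or replace   */
                               /* X's predecessor right here...             */
  spin_lock(&p->lockFlag);     /* ...so by the time this succeeds, p may no */
                               /* longer be X's predecessor                 */
}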


I note that the only reason you need a doubly-linked list here is the requirement to delete nodes from the middle of the list using a pointer that was obtained without walking the list. A simple FIFO can obviously be implemented with a singly-linked list (with both head and tail pointers).

You could avoid the deletion-from-the-middle case by introducing another layer of indirection - if the list nodes simply contain a next pointer and a payload pointer, with the actual data pointed to elsewhere (you say memory allocation is not possible at the point of insertion, so you'll just need to allocate the list node structure at the same point that you allocate the payload itself).

In the delete-from-the-middle case, you simply set the payload pointer to NULL and leave the orphaned node in the list. If the FIFO pop operation encounters such an empty node, it just frees it and tries again. This deferral lets you use a singly-linked list, and a lockless singly-linked list implementation is significantly easier to get right.
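A sketch of the node layout and pop-side cleanup this suggests (pop_front() stands in for whichever singly-linked, possibly lock-free, dequeue you use; all names are illustrative):

#include <stdlib.h>

struct qnode {
    struct qnode *next;
    void *payload;               /* NULL means "logically deleted" */
};

struct qnode *pop_front(void);   /* assumed: the underlying singly-linked pop */

/* Delete-from-the-middle: just orphan the node; it stays linked. */
void lazy_delete(struct qnode *n) {
    n->payload = NULL;           /* in real code, make this an atomic store */
}

/* FIFO pop: free and skip any orphaned nodes it encounters. */
void *fifo_pop(void) {
    for (;;) {
        struct qnode *n = pop_front();
        if (!n) return NULL;             /* queue empty */
        void *payload = n->payload;
        free(n);
        if (payload) return payload;     /* otherwise it was orphaned: try again */
    }
}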

Of course, there is still an essential race here around the removal of a node in the middle of the queue - nothing appears to stop that node coming to the front of the queue and being removed by another thread before the thread that has decided it wants to remove it actually gets a chance to do so. This race appears to be outside the scope of the details provided in your question.


Two ideas.

First, to avoid the deadlock problem I would do some sort of spinlock:

  • lock the item that is to be deleted
  • try to lock one of the neighbors; if you have cheap random bits available, choose the side randomly
  • if this doesn't succeed abandon your first lock and loop
  • try to lock the other one
  • if this succeeds delete your item
  • else abandon both locks and loop

Since splicing an element out of a list is not a lengthy operation, this shouldn't cost you much performance overhead. And even if you really are in a rush to delete all elements at the same time, it should still give you some good parallelism (a sketch of the loop follows).
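Here is that loop as a sketch; try_lock_node() is an assumed non-blocking try-lock returning nonzero on success, and the node layout with its blocking lock_node()/unlock_node() helpers is the same assumption as in the sketches further up:

#include <stdlib.h>

struct node { struct node *next, *prev; /* lock state, payload ... */ };

int  try_lock_node(struct node *n);   /* assumed: non-blocking, nonzero on success */
void lock_node(struct node *n);       /* assumed: blocking per-node lock           */
void unlock_node(struct node *n);

void delete_with_backoff(struct node *n) {
    for (;;) {
        lock_node(n);                                 /* lock the item to delete  */
        struct node *a = n->prev, *b = n->next;
        if (rand() & 1) { struct node *t = a; a = b; b = t; }  /* random side first */

        if (a && !try_lock_node(a)) {                 /* first neighbour busy?    */
            unlock_node(n);                           /* abandon the first lock   */
            continue;                                 /* ...and loop              */
        }
        if (b && !try_lock_node(b)) {                 /* second neighbour busy?   */
            if (a) unlock_node(a);                    /* abandon both locks       */
            unlock_node(n);
            continue;                                 /* ...and loop              */
        }

        if (n->prev) n->prev->next = n->next;         /* splice n out             */
        if (n->next) n->next->prev = n->prev;
        if (a) unlock_node(a);
        if (b) unlock_node(b);
        unlock_node(n);
        return;
    }
}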

The second would be to do lazy deletion. Mark the elements that are to be deleted and only remove them physically when they reach the end of the list. Since you are only interested in the head and the tail, the threads actually consuming the list items can do this. The advantage is that by the time they are deleted they are at the end, so the deadlock problem disappears. The disadvantage is that this makes the final deletion a sequential operation.
