Why does ConcurrentWhatever.Count take a snapshot?
In the C#/.NET Little Wonders article on ConcurrentStack / ConcurrentQueue, the following is mentioned regarding the .Count operation on ConcurrentStack:

Count takes a snapshot of the stack and then counts the items. This means it is an O(n) operation; if you just want to check for an empty stack, call IsEmpty instead, which is O(1).
I'm not greatly experienced in multi-threaded programming yet, but I understand why you can't just loop through the items in the collection and count them: the collection may be changed by other threads at the same time. However, if you have to lock the ConcurrentStack long enough to make a snapshot, wouldn't it be easier just to count the items while you have it locked, return that count, and release the lock, instead of paying the overhead and time cost of generating an entire snapshot before releasing the lock?
I may be missing something basic about how this works, and I would appreciate any thoughts or insight you may have.
I don't know if it's implemented like that, but if you implement it as a singly linked list with immutable nodes, then a snapshot is almost free and requires no locking: you just grab the current top element and then follow the linked list to the end.
You could store each node's position on the stack in the node itself, but that would trade memory for the performance of Count. And Count probably isn't called often enough to warrant that extra memory per node.
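The immutable-node idea above can be sketched like this (a hypothetical single-threaded illustration with invented names, not the actual BCL source; thread safety is omitted here to show only why the snapshot is cheap and the count is O(n)):

```csharp
using System;

class ImmutableNodeStack<T>
{
    // Nodes are immutable: Value and Next never change after construction,
    // so any thread holding a node reference sees a frozen view of the
    // stack from that node downward.
    sealed class Node
    {
        public readonly T Value;
        public readonly Node Next;
        public Node(T value, Node next) { Value = value; Next = next; }
    }

    Node head; // current top of the stack

    public void Push(T value) => head = new Node(value, head);

    public int Count
    {
        get
        {
            // The "snapshot" is O(1): just capture the head reference.
            Node snapshot = head;

            // Counting the snapshot is O(n): walk the chain to the end.
            int count = 0;
            for (Node n = snapshot; n != null; n = n.Next)
                count++;
            return count;
        }
    }
}
```

Because later pushes create new nodes in front of the captured head rather than modifying existing nodes, the walk is safe even if other threads keep pushing.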
Most of the concurrent collections work without locks at all. Such a linked-list-backed stack could, for example, be built using Interlocked.CompareExchange on the field pointing to the current top node. And if you make some operations lockless, you usually need to make all of them lockless, since the lockless operations won't respect the lock.
I don't know for sure, but here's a guess.
The stack could be implemented as a singly-linked list consisting of immutable cells, where each cell points to the cell underneath it on the stack. Then a "snapshot" would simply be a copy of the reference to the top cell; because the cells are immutable, this pulls along the rest of the current stack as well.
Thus, the snapshot would be an O(1) operation, but counting the actual items is still O(n).
You're assuming that in order to count on a snapshot, it has to take out a big lock.
I believe the concurrent collections are designed with cheap - possibly optimistic - snapshotting in mind. For example, if you iterate over a ConcurrentStack via GetEnumerator(), that uses a snapshot too. I very much doubt that it's always an O(n) operation to create that snapshot.
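The snapshot behavior is easy to observe with the real ConcurrentStack&lt;T&gt;: per the documentation, both enumeration and ToArray() represent a moment-in-time snapshot, so items pushed afterwards don't appear in it.

```csharp
using System;
using System.Collections.Concurrent;

class Program
{
    public static void Main()
    {
        var stack = new ConcurrentStack<int>();
        stack.Push(1);
        stack.Push(2);

        // ToArray (like GetEnumerator) captures a snapshot of the stack;
        // updates made after this point are not reflected in it.
        int[] snapshot = stack.ToArray();

        stack.Push(3);

        Console.WriteLine(snapshot.Length); // 2: snapshot predates the third push
        Console.WriteLine(stack.Count);     // 3: Count sees the live stack
    }
}
```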
I suppose the biggest reason is that asking a concurrent queue (where two or more threads are constantly manipulating the contents) for a count of items is a flawed concept to begin with. I suspect they only provided the Count property to comply with existing interfaces. So if the only reason for an API to exist is to meet an interface requirement, then who cares how it performs, as you should not be using it in the first place.
When working with objects that are modified by multiple threads, it's never OK to ask the object about some state/value that another thread can change, since by the time your question is answered, the answer may no longer be correct. This is akin to the fundamental problem with WinForms' "InvokeRequired" property. Some APIs are just a bad idea in a multi-threaded environment, and I think this Count falls into that category.
Ultimately it is kinda weird that the implementors didn't use an explicit interface member for the ICollection.Count property rather than making it public; then at least you'd know that you shouldn't use it. MSDN does say to use IsEmpty instead; however, that's easy to overlook.
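Following that MSDN advice in code means preferring IsEmpty for emptiness checks and preferring the Try* methods over check-then-act sequences, since any separate check can be stale by the time you act on it:

```csharp
using System;
using System.Collections.Concurrent;

class Program
{
    public static void Main()
    {
        var stack = new ConcurrentStack<int>();

        // Prefer IsEmpty (O(1)) over Count == 0 (O(n): snapshot + walk).
        if (stack.IsEmpty)
            Console.WriteLine("stack is empty");

        stack.Push(42);

        // Don't check IsEmpty and then pop: another thread could empty the
        // stack in between. TryPop checks and pops atomically instead.
        if (stack.TryPop(out int item))
            Console.WriteLine(item); // 42
    }
}
```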
ConcurrentStack and ConcurrentQueue are thread-safe collections, so no locking is involved. In ConcurrentStack, for example, the count is calculated by reading the head node and then traversing the list, accumulating the count.