Intel Threading Building Blocks concurrent queue: using pop() over pop_if_present()
What is the difference between using the blocking call pop() and polling with while(pop_if_present(...))? Which should be preferred over the other, and why?
I am looking for a deeper understanding of the tradeoff between polling yourself, as in the case of while(pop_if_present(...)), and letting the system do it for you. This is quite a general theme. For example, with boost::asio I could call myIO.run(), which blocks, or do the following:
while(1)
{
    myIO.poll();
}
One possible explanation is that the thread that invokes while(pop_if_present(...)) will remain busy, so this is bad. But someone or something has to poll for the async event. Why and how can this be cheaper when it is delegated to the OS or the library? Is it because the OS or the library is smart about polling, for example by doing an exponential backoff?
Intel's TBB library is open source, so I took a look...
It looks like pop_if_present() essentially checks if the queue is empty and returns immediately if it is. If not, it attempts to get the element at the front of the queue (which might fail, since another thread may have come along and taken it). If it misses, it performs an "atomic_backoff" pause before checking again. The atomic_backoff will simply spin the first few times it's called (doubling its spin loop count each time), but after a certain number of pauses it'll just yield to the OS scheduler instead of spinning, on the assumption that since it's been waiting a while, it might as well do it nicely.
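To make the spin-then-yield idea concrete, here is a minimal sketch in standard C++. It is not TBB's actual atomic_backoff source; the class name SpinThenYieldBackoff, the threshold kSpinLimit = 16 and the helper wait_until are all made up for illustration.

```cpp
#include <atomic>
#include <thread>

// Hypothetical helper, loosely modeled on the behaviour described above.
// It is NOT the real tbb atomic_backoff implementation.
class SpinThenYieldBackoff {
    static constexpr int kSpinLimit = 16;  // assumed threshold, chosen arbitrarily
    int count_ = 1;                        // spin iterations for the next pause

public:
    // Returns true if this call busy-spun, false if it yielded to the scheduler.
    bool pause() {
        if (count_ <= kSpinLimit) {
            for (int i = 0; i < count_; ++i) {
                // Compiler barrier so the empty loop is not optimized away.
                std::atomic_signal_fence(std::memory_order_seq_cst);
            }
            count_ *= 2;  // double the spin length for the next call
            return true;
        }
        std::this_thread::yield();  // waited long enough: give up the time slice
        return false;
    }

    void reset() { count_ = 1; }
};

// Polls `ready` until it becomes true, backing off between checks.
inline void wait_until(const std::atomic<bool>& ready) {
    SpinThenYieldBackoff backoff;
    while (!ready.load(std::memory_order_acquire)) {
        backoff.pause();
    }
}
```

With these assumed numbers the first five calls to pause() spin (1, 2, 4, 8 and 16 iterations) and every later call yields, which is roughly the shape of the behaviour I saw in the source.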
The plain pop() function, if there isn't anything in the queue, performs atomic_backoff waits until there is something in the queue for it to take.
Note that there are at least 2 interesting things (to me anyway) about this:
The pop() function performs spin waits (up to a point) for something to show up in the queue; it's not going to yield to the OS unless it has to wait for more than a short moment. So as you might expect, there's not much reason to spin yourself calling pop_if_present() unless you have something else you're going to do between calls to pop_if_present().
When pop() does yield to the OS, it does so by simply giving up its time slice. It doesn't block the thread on a synchronization object that can be signaled when an item is placed on the queue - it seems to go into a sleep/poll cycle to check the queue for something to pop. This surprised me a little.
Take this analysis with a grain of salt... The source I used for this analysis might be a bit old (it's actually from concurrent_queue_v2.h and .cpp) because the more recent concurrent_queue has a different API - there's no pop() or pop_if_present(), just a try_pop() function in the latest concurrent_queue class interface. The old interface has been moved (and possibly changed somewhat) to the concurrent_bounded_queue class. It appears that the newer concurrent_queues can be configured when the library is built to use OS synchronization objects instead of busy waits and polling.
With while(pop_if_present(...)) you are doing a brute-force busy wait (also called spinning) on the queue. When the queue is empty you waste cycles by keeping the CPU busy until either an item is pushed into the queue by another thread running on a different CPU, or the OS decides to give your CPU to some other, possibly unrelated, thread or process.
You can see how this could be bad if you have only one CPU: the producer thread cannot run, and so cannot push an item and stop the consumer's spinning, until at least the end of the consumer's time quantum plus the overhead of a context switch. Clearly a mistake.
With multiple CPUs this might be better if the OS schedules (or you force) the producer thread to run on a different CPU. This is the basic idea of a spin-lock, a synchronization primitive built directly on special processor instructions such as compare-and-swap or load-linked/store-conditional. Spin-locks are commonly used inside the operating system to communicate between interrupt handlers and the rest of the kernel, and to build higher-level constructs such as semaphores.
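For illustration, a spin-lock built on compare-and-swap might look roughly like the sketch below. It uses only standard C++ atomics; SpinLock and count_with_spinlock are hypothetical names for this example and are not taken from TBB or from any kernel.

```cpp
#include <atomic>
#include <thread>
#include <vector>

// Minimal spin-lock sketch built on compare-and-swap (illustrative only).
class SpinLock {
    std::atomic<bool> locked_{false};

public:
    void lock() {
        bool expected = false;
        // Busy-wait until we atomically flip locked_ from false to true.
        while (!locked_.compare_exchange_weak(expected, true,
                                              std::memory_order_acquire,
                                              std::memory_order_relaxed)) {
            expected = false;  // a failed CAS overwrites expected; reset it
        }
    }

    void unlock() { locked_.store(false, std::memory_order_release); }
};

// Hypothetical demo: several threads increment one counter under the lock.
inline long count_with_spinlock(int threads, int iterations) {
    SpinLock lock;
    long counter = 0;
    std::vector<std::thread> workers;
    for (int t = 0; t < threads; ++t) {
        workers.emplace_back([&lock, &counter, iterations] {
            for (int i = 0; i < iterations; ++i) {
                lock.lock();
                ++counter;  // protected by the spin-lock
                lock.unlock();
            }
        });
    }
    for (auto& w : workers) w.join();
    return counter;
}
```

A waiter in lock() burns CPU the whole time the lock is held, so this only pays off when the holder is running on another CPU and releases quickly.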
With blocking pop(), if the queue is empty, you enter a sleep wait, i.e. you ask the OS to put the consumer thread into a non-schedulable state until an event (a push onto the queue) occurs from another thread. The key here is that the processor is available for other (hopefully useful) work. The TBB implementation actually tries hard to avoid the sleep since it's expensive (entering the kernel, rescheduling, etc.). The goal is to optimize the normal case where the queue is not empty and the item can be retrieved quickly.
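As a rough sketch of what a sleep wait looks like in code, here is a small blocking queue built on a mutex and a condition variable from the C++ standard library. It is not how TBB implements its queues; BlockingQueue and its members are made-up names for this example.

```cpp
#include <condition_variable>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>

// Illustrative blocking queue; NOT the TBB implementation.
template <typename T>
class BlockingQueue {
    std::mutex mutex_;
    std::condition_variable not_empty_;
    std::queue<T> items_;

public:
    void push(T value) {
        {
            std::lock_guard<std::mutex> guard(mutex_);
            items_.push(std::move(value));
        }
        not_empty_.notify_one();  // wake one sleeping consumer, if any
    }

    // Sleep wait: the thread is descheduled until push() signals it.
    T pop() {
        std::unique_lock<std::mutex> lock(mutex_);
        not_empty_.wait(lock, [this] { return !items_.empty(); });
        T value = std::move(items_.front());
        items_.pop();
        return value;
    }

    // Non-blocking variant, in the spirit of pop_if_present().
    bool try_pop(T& out) {
        std::lock_guard<std::mutex> guard(mutex_);
        if (items_.empty()) return false;
        out = std::move(items_.front());
        items_.pop();
        return true;
    }
};
```

While a consumer is parked inside pop(), it uses no CPU time; the cost is paid in the kernel transitions when it goes to sleep and is woken up again.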
The choice is really simple though: always sleep-wait, i.e. use the blocking pop(), unless you have to busy-wait (which is the case in real-time systems, in OS interrupt context, and in some very specialized applications).
Hope this helps a bit.