Intel Threading Building Blocks Concurrent Queue: Using pop() over pop_if_present()

Posted 2022-12-20 23:29

What is the difference between using the blocking call pop() and polling in a loop such as

while(pop_if_present(...))

Which should be preferred, and why?

I am looking for a deeper understanding of the tradeoff between polling yourself, as with while(pop_if_present(...)), and letting the system do it for you. This is quite a general theme. For example, with boost::asio I could call myIO.run(), which blocks, or do the following:

while (1)
{
    myIO.poll();
}

One possible explanation is that the thread that invokes while(pop_if_present(...)) stays busy, which is bad. But someone or something has to poll for the async event. Why and how can this be cheaper when it is delegated to the OS or the library? Is it because the OS or the library is smart about polling, for example by doing an exponential backoff?

Intel's TBB library is open source, so I took a look...

It looks like pop_if_present() essentially checks whether the queue is empty and returns immediately if it is. If not, it attempts to take the element at the front of the queue (which might fail, since another thread may have come along and taken it). If it misses, it performs an "atomic_backoff" pause before checking again. The atomic_backoff simply spins the first few times it's called (doubling its spin loop count each time), but after a certain number of pauses it just yields to the OS scheduler instead of spinning, on the assumption that since it has been waiting a while, it might as well wait nicely.
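To make the spin-then-yield idea concrete, here is a minimal sketch of that kind of backoff using only the standard library. It is not TBB's code: the SpinBackoff name, the default limit of 16, and the barrier used as the spin body are all assumptions made for illustration.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical stand-in for the backoff helper described above.
// The class name, the default limit of 16, and the barrier-based spin
// body are assumptions for illustration, not TBB's actual code.
class SpinBackoff {
public:
    explicit SpinBackoff(int spin_limit = 16) : limit_(spin_limit) {
        assert(spin_limit > 0);
    }

    // Wait once. Returns true if this call busy-spun, false if it
    // gave up the time slice to the OS scheduler instead.
    bool pause() {
        if (count_ <= limit_) {
            for (int i = 0; i < count_; ++i) {
                // Compiler barrier, intended to keep the empty loop from
                // being optimized away. Real code would typically use a
                // CPU pause instruction here.
                std::atomic_signal_fence(std::memory_order_seq_cst);
            }
            count_ *= 2;  // double the spin length for the next call
            return true;
        }
        std::this_thread::yield();
        return false;
    }

    // Start over with short spins, e.g. after a successful pop.
    void reset() { count_ = 1; }

private:
    int limit_;
    int count_ = 1;
};
```

With the default limit, the first five calls to pause() spin (for 1, 2, 4, 8 and 16 iterations) and every later call yields until reset() is called.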

The plain pop() function, if there isn't anything in the queue, performs atomic_backoff waits until there is something in the queue that it can take.

Note that there are at least 2 interesting things (to me anyway) about this:

1. The pop() function performs spin waits (up to a point) for something to show up in the queue; it won't yield to the OS unless it has to wait for more than a short moment. So, as you might expect, there's not much reason to spin yourself by calling pop_if_present() unless you have something else to do between calls to pop_if_present().
2. When pop() does yield to the OS, it does so by simply giving up its time slice. It doesn't block the thread on a synchronization object that can be signaled when an item is placed on the queue; it seems to go into a sleep/poll cycle, checking the queue for something to pop. This surprised me a little.
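As a rough illustration of the one case where polling yourself makes sense (you have other work to do between polls), here is a sketch. PollableQueue is a hypothetical mutex-guarded stand-in for the TBB queue, and poll_between_work is a made-up helper; only the pop_if_present name is borrowed from the old interface.

```cpp
#include <cassert>
#include <mutex>
#include <queue>
#include <utility>

// Hypothetical mutex-guarded queue standing in for the TBB class.
template <typename T>
class PollableQueue {
public:
    void push(T value) {
        std::lock_guard<std::mutex> lock(mutex_);
        items_.push(std::move(value));
    }

    // Non-blocking: returns false immediately when the queue is empty.
    bool pop_if_present(T& out) {
        std::lock_guard<std::mutex> lock(mutex_);
        if (items_.empty()) return false;
        out = std::move(items_.front());
        items_.pop();
        return true;
    }

private:
    std::mutex mutex_;
    std::queue<T> items_;
};

// Each round: drain whatever is queued right now, then do one unit of
// other work instead of waiting for more items. Returns the number of
// items handled.
template <typename T, typename OnItem, typename OtherWork>
int poll_between_work(PollableQueue<T>& q, int rounds,
                      OnItem on_item, OtherWork other_work) {
    assert(rounds >= 0);
    int popped = 0;
    for (int r = 0; r < rounds; ++r) {
        T item;
        while (q.pop_if_present(item)) {
            on_item(item);
            ++popped;
        }
        other_work();  // the thread stays useful while the queue is empty
    }
    return popped;
}
```

If other_work() has nothing to do, this loop degenerates into a plain busy wait, which is the case where the blocking pop() is the better fit.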

Take this analysis with a grain of salt... The source I used might be a bit old (it's actually from concurrent_queue_v2.h and .cpp), because the more recent concurrent_queue has a different API: there's no pop() or pop_if_present(), just a try_pop() function in the latest concurrent_queue class interface. The old interface has been moved (and possibly changed somewhat) to the concurrent_bounded_queue class. It appears that the newer concurrent_queues can be configured, when the library is built, to use OS synchronization objects instead of busy waits and polling.

With while(pop_if_present(...)) you are doing a brute-force busy wait (also called spinning) on the queue. When the queue is empty you waste cycles by keeping the CPU busy until either an item is pushed into the queue by another thread running on a different CPU, or the OS decides to give your CPU to some other, possibly unrelated, thread or process.

You can see how this could be bad if you have only one CPU: the producer thread cannot run to push an item, and so cannot stop the consumer from spinning, until at least the end of the consumer's time quantum plus the overhead of a context switch. Clearly a mistake.

With multiple CPUs this might be better if the OS selects (or you force) the producer thread to run on a different CPU. This is the basic idea of a spin-lock: a synchronization primitive built directly on special processor instructions such as compare-and-swap or load-linked/store-conditional, commonly used inside the operating system to communicate between interrupt handlers and the rest of the kernel, and to build higher-level constructs such as semaphores.

With blocking pop(), if the queue is empty, you enter a sleep wait, i.e. you ask the OS to put the consumer thread into a non-schedulable state until an event (a push onto the queue) occurs from another thread. The key here is that the processor is available for other (hopefully useful) work. The TBB implementation actually tries hard to avoid the sleep, since it's expensive (entering the kernel, rescheduling, etc.). The goal is to optimize the normal case where the queue is not empty and the item can be retrieved quickly.
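For comparison, a sleep-waiting pop() can be sketched with a standard mutex and condition variable. This is a generic textbook construction, not how TBB implements its queue; the BlockingQueue name and its members are assumptions for illustration.

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <queue>
#include <utility>

// Hypothetical blocking queue; a generic sketch, not TBB's implementation.
template <typename T>
class BlockingQueue {
public:
    void push(T value) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            items_.push(std::move(value));
        }
        // Wake one sleeping consumer, if any.
        not_empty_.notify_one();
    }

    // Sleeps (no spinning) until an item is available, then returns it.
    T pop() {
        std::unique_lock<std::mutex> lock(mutex_);
        // wait() releases the mutex while the thread is asleep and
        // re-checks the predicate after every wakeup, so spurious
        // wakeups are harmless.
        not_empty_.wait(lock, [this] { return !items_.empty(); });
        assert(!items_.empty());
        T value = std::move(items_.front());
        items_.pop();
        return value;
    }

private:
    std::mutex mutex_;
    std::condition_variable not_empty_;
    std::queue<T> items_;
};
```

A consumer thread calling pop() on an empty queue uses no CPU until some producer thread calls push(). The price is the kernel round trip on each sleep and wakeup, which is the cost a spin-first design tries to avoid in the common case.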

The choice is really simple though: always sleep-wait, i.e. do a blocking pop(), unless you have to busy-wait (that is, in real-time systems, OS interrupt context, and some very specialized applications).

Hope this helps a bit.
