C++ Producer consumer queue with (very) fast and reliable handover
Hi, I am looking into thread handover using a fast and reliable producer-consumer queue. I am working on Windows with VC++.
I based my design on Anthony Williams' queue, that is, basically a boost::mutex with a boost::condition_variable. Typically the time between notify_one() and the consumer waking up varies between 10 (rare) and 100 microseconds, with most values around 50 microseconds. However, about 1 in 1000 takes over one millisecond, with some taking longer than 5 milliseconds.
I was just wondering whether these are typical values? Is there a faster way of signalling short of spinning? Is it from here all down to managing thread priorities? I haven't started playing with priorities, but I was just wondering whether there is a chance of getting this into a fairly stable region of about 10 microseconds?
Thanks
EDIT: With SetPriorityClass(GetCurrentProcess(), REALTIME_PRIORITY_CLASS), the average wake time is still roughly 50 microseconds, but there are far fewer outliers; most of them are around 150-200 microseconds now. Except for one freak outlier at 7 ms. Hmmm... not good.
One way to amortise the overhead of locking and thread wakeup is to add a second queue and implement a double-buffering approach. This enables batch processing on the consumer side:
template<typename F>
std::size_t consume_all(F&& f)
{
    // Minimize the scope of the lock: just swap the full queue for the empty one.
    {
        std::lock_guard<std::mutex> lock(the_mutex);
        std::swap(the_queue, the_queue2);
    }
    // Process all items from the_queue2 in a batch, without holding the lock.
    for (auto& item : the_queue2)
    {
        f(item);
    }
    auto result = the_queue2.size();
    the_queue2.clear(); // clears the queue and preserves the memory. perfect!
    return result;
}
Working sample code.
This does not fix the latency issue, but it can improve throughput. If a hiccup occurs then the consumer will be presented with a large batch which can then be processed at full speed without any locking overhead. This allows the consumer to quickly catch up with the producer.
There are lots of things that might cause problems.
You probably need to try to profile the app and see where slowdowns might be occurring.
Some notes:
- Are the consumer and producer in the same process? If so, a CRITICAL_SECTION is much faster than a Mutex.
- Try to ensure all the queue memory is resident in physical memory. If pages have to be swapped in, that will slow things right down.
- Be very careful setting your process to real-time priority; that is meant for system processes. If your process does too much work, it can starve a critical system process of CPU time, which can end very badly. Unless you absolutely need real time, just use HIGH_PRIORITY_CLASS.
The short answer is yes: from there it really is down to operating-system management and thread scheduling. Real-time systems (RTSes) can bring those 50 microseconds down to about 15, and more importantly they can get rid of the outliers. Otherwise spinning is the only answer. If there are more queues than cores, one idea is to have x threads spinning so they can react immediately, and the remaining ones blocking. That would involve some kind of "master" queue thread, constantly spinning to check all queues and either processing items itself or handing them over to worker threads, some of which could also be spinning to save those 50 microseconds. It gets complicated, though.
Probably the best option is a single lock-free multiple-producer/single-consumer queue with a spinning consumer thread. All items that go into the queue would then probably need to derive from a common base type and carry some meta-information describing what to do with them.
Complicated, but possible. If I ever set it up, I might post some code as well.