Slowdown associated with vector::push_back and placement new in a multithreaded application
I have a multithreaded application in which my thread utilization is very poor (in the ballpark of 1%-4% per thread, with fewer threads than processors). In the debugger, it appears to be spending a lot of time in vector::push_back, specifically in the placement new that occurs during the push_back. I've tried using reserve to avoid having the vector expand its capacity and copy everything, but that doesn't appear to be the problem. Commenting out the vector::push_back calls leads to much better thread utilization.
This problem is occurring with vectors of uint64_t, so it does not appear to be the result of complicated object construction. I have tried using both the standard allocator and a custom allocator and both perform the same way. The vectors are being used by the same thread that allocated them.
Unless you need these initialized to 0, consider writing a vector-like class which does not initialize. I've found this to provide measurable performance gains in some scenarios.
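To make that suggestion concrete, here is a minimal sketch of such a class. RawBuffer is a hypothetical name, not something from the question or from any library, and it assumes a fixed capacity and trivially copyable element types such as uint64_t. Note that std::vector::reserve followed by push_back does not zero the reserved storage either, so any gain would mostly show up compared with vector(n) or resize(n), which value-initialize every element.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <type_traits>

// Hypothetical fixed-capacity, append-only buffer (illustration only).
// The storage comes from `new T[n]`, which default-initializes trivial
// types, i.e. leaves them uninitialized until push_back writes them.
template <typename T>
class RawBuffer {
    static_assert(std::is_trivially_copyable<T>::value,
                  "RawBuffer is only meant for trivially copyable types");

public:
    explicit RawBuffer(std::size_t capacity)
        : data_(new T[capacity]), size_(0), capacity_(capacity) {}

    // Appends one element; the caller must not exceed the capacity.
    void push_back(T value) {
        assert(size_ < capacity_);
        data_[size_++] = value;
    }

    T& operator[](std::size_t i) { return data_[i]; }
    const T& operator[](std::size_t i) const { return data_[i]; }

    std::size_t size() const { return size_; }
    std::size_t capacity() const { return capacity_; }

private:
    std::unique_ptr<T[]> data_;  // owns the uninitialized storage
    std::size_t size_;           // number of elements written so far
    std::size_t capacity_;       // total number of slots allocated
};
```

Only elements below size() have been written, so reading any slot beyond that is undefined behavior; that is the price of skipping initialization.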
Side note: When your profiler claims you're spending most of your time on primitive operations on 64-bit integers, you know the rest of your code is optimized decently.
Maybe something trivial that won't really work, but since each push_back call creates a new item, why not initialize the vector to all 0's and access the elements with something like at() or operator[]? That should get rid of any lock on the vector.
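A rough sketch of that idea, assuming the number of elements is known up front. The function name fill_by_index and the stored values are placeholders, not anything from the question.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: size the vector once, then write by index
// instead of growing it with push_back.
std::vector<std::uint64_t> fill_by_index(std::size_t n) {
    std::vector<std::uint64_t> v(n, 0);  // single allocation, all zeros
    assert(v.size() == n);
    for (std::size_t i = 0; i < n; ++i) {
        v[i] = static_cast<std::uint64_t>(i) * i;  // placeholder value
    }
    return v;
}
```

This trades the per-element bookkeeping of push_back for one up-front zero fill of the whole vector, so whether it helps depends on how many of the slots actually get used.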
Does the thread utilization improve if you only use one thread? If so, perhaps you are running afoul of some sort of heap lock, e.g.:
In multithreaded C/C++, does malloc/new lock the heap when allocating memory
http://msdn.microsoft.com/en-us/library/ms810466.aspx
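One way to check this is a small experiment along the following lines. It is only a sketch: run_workers, RunResult and the workload are made up for illustration, and each thread deliberately allocates a fresh vector per round so that the allocator is exercised often. If the wall-clock time for the same per-thread workload grows sharply as threads are added, allocator contention would be a plausible suspect.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <cstddef>
#include <cstdint>
#include <thread>
#include <vector>

// Hypothetical result type: wall-clock time and total elements pushed.
struct RunResult {
    double seconds;
    std::uint64_t pushed;
};

// Runs `num_threads` workers. Each worker performs `rounds` iterations,
// and every iteration allocates its own vector and pushes `items` values.
RunResult run_workers(unsigned num_threads, std::size_t rounds,
                      std::size_t items) {
    std::atomic<std::uint64_t> pushed{0};
    const auto start = std::chrono::steady_clock::now();

    std::vector<std::thread> workers;
    workers.reserve(num_threads);
    for (unsigned t = 0; t < num_threads; ++t) {
        workers.emplace_back([&pushed, rounds, items] {
            std::uint64_t local_count = 0;
            for (std::size_t r = 0; r < rounds; ++r) {
                std::vector<std::uint64_t> local;  // thread-private vector
                local.reserve(items);              // one heap allocation
                for (std::size_t i = 0; i < items; ++i) {
                    local.push_back(i);
                }
                local_count += local.size();
            }
            pushed += local_count;  // one atomic update per thread
        });
    }
    for (auto& w : workers) {
        w.join();
    }

    const std::chrono::duration<double> elapsed =
        std::chrono::steady_clock::now() - start;
    return RunResult{elapsed.count(), pushed.load()};
}
```

Calling it as run_workers(1, rounds, items) and then run_workers(4, rounds, items) and comparing the seconds fields gives a first impression. Ideally the time stays roughly flat as long as there are fewer threads than cores.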