How best to quickly populate a vector?
I have some simulation code I'm working on and I开发者_如何学C've just gotten rid of all the low hanging fruit so far as optimisation is concerned. The code now spends half its time pushing back vectors. (The size of the final vectors is known and I reserve appropriately)
Essentially I'm rearranging one vector into a permutation of another, or populating the vector with random elements.
Is there any faster means of pushing back into a vector? Or pushing back/copying multiple elements?
std::vector<unsigned int, std::allocator<unsigned int> >::push_back(unsigned int const&)
Thanks in advance.
EDIT: Extra info; I'm running a release build with -O3, also: the original vector needs to be preserved.
You can have a look at
- c++0x (which enables a lot of optimizations in this area in the concept of
move semantics
) - EASTL (which boasts superior performance, mainly through the use of custom allocators (_you can get it up and running in about an hour, and the only visible change will be
std::vector
-->eastl::vector
and some extra link objects). - you can drop in google perftools tcmalloc (although since apparently you already optimize by pre-allocating, this shouldn't really matter).
I'd personally not expect much gain if really the vector handling is the bottleneck. I'd really look at parallelizing with (in order of preference):
- GNU openmp (
CPPFLAGS+=-D_GLIBCXX_PARALLEL -fopenmp
) - just openmp and 'manual'
#pragma parallel for
- Intel TBB (most appropriate if using Intel compiler)
I must be forgetting stuff. O yeah, look here: http://www.agner.org/optimize/
Edit: I always forget the simplest things: Use memcpy/memmove for bulk appending POD elements to pre-allocated vectors.
If you're pre-reserving space then your vector is as fast as an array. You cannot mathematically make it faster; stop worrying and move on to something else!
You may be experiencing slow-down if you're running a "debug build" i.e. where your standard library implementation has optimisations turned off, and debug tracking info turned on.
push_back
on int
is extremely efficient. So I would look elsewhere for optimization opportunities.
Nemo's first rule of micro-optimization: Math is fast; memory is slow. Creating a huge vector is very cache-unfriendly.
For example, instead of creating a permutation of the original vector, can you just compute which element you need as you need it and then access that element directly from the original vector?
Similarly, do you really need a vector of random integers? Why not just generate a random number when it is needed? (If you have to remember it for later, then go ahead and push it onto the vector then... But not before.)
push_back
on int
is about as fast as it is going to get. I would bet you could barely notice the difference even if you got rid of the reserve
(because the re-allocation does not happen often and is going to use a very fast bulk copy already). So you need to take a broader view to improve performance.
If you have multiple vectors, you may be able to improve speed by allocating them continuously using a custom allocator. Improving memory locality may well improve the running time of the algorithm.
If you are using Debug version of STL there is a debug overhead (esp. in iterators) in all STL calls.
I would advice to replace STL vector with regular array. If you are using trivially-copyable types you could easily copy multiple elements using memcpy
call.
精彩评论