What's generally the size limit to switch from a vector to a deque?
I recently wrote this post:
How best to store VERY large 2D list of floats in c++? Error-handling? Some suggested that I implement my 2D list-like structure of floats as a vector, others said a deque.
From what I gather, vector requires contiguous memory and is hence more efficient. Obviously this would be desirable if possible.
Thus my question is: what's a good rule of thumb for how long a basic structure can be, in terms of...
1. floats 2. ints
...before you should switch from a vector to a deque to avoid memory problems?
e.g. I'm looking for an answer like "At around 4 million floats or 8 million ints, you should switch..." ...if possible.
Well, here are two opinions. The C++ standard says (23.1.1/2):
vector
is the type of sequence that should be used by default.
list
should be used when there are frequent insertions and deletions from the middle of the sequence.
deque
is the data structure of choice when most insertions and deletions take place at the beginning or at the end of the sequence.
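To make that guidance concrete, here is a minimal sketch of the operations behind it (the container contents are arbitrary):

#include <deque>
#include <iterator>
#include <list>
#include <vector>

int main()
{
    std::vector<int> v{1, 2, 3};
    v.push_back(4);                    // cheap: amortized O(1) at the end
    // std::vector has no push_front; v.insert(v.begin(), 0)
    // would shift every element.

    std::deque<int> d{1, 2, 3};
    d.push_front(0);                   // cheap: O(1) at the front
    d.push_back(4);                    // cheap: O(1) at the back

    std::list<int> l{1, 2, 3};
    l.insert(std::next(l.begin()), 9); // O(1) once you hold the iterator
}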
Herb Sutter argues the following (the article contains his rationale and a performance analysis):
I'd like to present an amiably dissenting point of view: I recommend that you consider preferring deque by default instead of vector, especially when the contained type is a class or struct and not a builtin type, unless you really need the container's memory to be contiguous.
Again, there is no size limit above which deque is or is not better than vector. The memory-fragmentation implications are pretty much the same in either case, except when you have already done a huge number of allocations/deallocations and there is not enough contiguous address space left for a big vector. But this case is very rare. Remember that memory space is per process (google "virtual memory"). And you can remedy it by allocating the memory for the vector (via the reserve method) before the cluttering takes place.
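As a sketch of that remedy (the element count here is an arbitrary example, not a recommendation):

#include <vector>

int main()
{
    std::vector<float> data;
    data.reserve(40'000'000);   // grab the contiguous block up front,
                                // before other allocations fragment
                                // the address space
    for (int i = 0; i < 40'000'000; ++i)
        data.push_back(static_cast<float>(i)); // no reallocation occurs
}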
The tradeoff is in terms of what you want to do with it. If the structure is basically immutable and you only want to access/overwrite it by index, go for vector.
Deque is for when you need to insert at the end, the beginning, or in the middle, something vector cannot handle naturally (except for inserting at the end).
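For the index-access case from the original question, the usual approach is to flatten the 2D structure into one vector. A minimal sketch (Grid2D is a hypothetical name, not a library type):

#include <cstddef>
#include <vector>

// Element (row, col) lives at row * cols + col, so every access is
// a single index into one contiguous block.
struct Grid2D
{
    std::size_t rows, cols;
    std::vector<float> data;

    Grid2D(std::size_t r, std::size_t c) : rows(r), cols(c), data(r * c) {}

    float&       at(std::size_t r, std::size_t c)       { return data[r * cols + c]; }
    const float& at(std::size_t r, std::size_t c) const { return data[r * cols + c]; }
};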
Herb Sutter's articles are generally of great quality, but you'll notice that when you do "number crunching" in C++, most of what you're taught in "general C++" books must be taken with an extra word of caution. The poor indexing performance you experience with deques is perhaps important for your application. In that case, don't use deque.
If you need insertions at the beginning, then go with deque.
Otherwise, I always like to point to this article on vector vs. deque (in addition to those linked by James McNellis here). Assuming an implementation of deque that uses page-based allocation, this article has good comparisons of allocation time (& deallocation time) for vector with & without reserve() vs. deque. Basically, using reserve() makes vector allocation time very similar to deque. Informative, and useful if you can guess the right value to reserve ahead of time.
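If you want to reproduce that kind of measurement yourself, a rough sketch along these lines works (the element count is arbitrary, and absolute numbers depend heavily on your standard-library implementation):

#include <chrono>
#include <cstddef>
#include <cstdio>
#include <deque>
#include <vector>

template<typename F>
static double seconds(F f)
{
    auto t0 = std::chrono::steady_clock::now();
    f();
    auto t1 = std::chrono::steady_clock::now();
    return std::chrono::duration<double>(t1 - t0).count();
}

int main()
{
    const std::size_t n = 10'000'000;   // arbitrary test size

    double tv = seconds([&] {
        std::vector<float> v;           // no reserve: repeated reallocation
        for (std::size_t i = 0; i < n; ++i) v.push_back(1.0f);
    });
    double tr = seconds([&] {
        std::vector<float> v;
        v.reserve(n);                   // one up-front allocation
        for (std::size_t i = 0; i < n; ++i) v.push_back(1.0f);
    });
    double td = seconds([&] {
        std::deque<float> d;            // page-wise allocation, no copies
        for (std::size_t i = 0; i < n; ++i) d.push_back(1.0f);
    });
    std::printf("vector %f  vector+reserve %f  deque %f\n", tv, tr, td);
}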
There are so many factors to consider that it's impossible to give a clear answer. The amount of memory on the machine, how fragmented it is, how fragmented it may become, etc. My suggestion is to just choose one and use it. If it causes problems switch. Chances are you aren't going to hit those edge cases anyway.
If you are truly worried, then you could probably implement a sort of pseudo PIMPL:
#include <cstddef>   // std::size_t
#include <deque>
#include <new>       // std::bad_alloc
#include <vector>

template<typename T>
class VectorDeque
{
private:
    enum TYPE { NONE, DEQUE, VECTOR };
    std::deque<T> m_d;
    std::vector<T> m_v;
    TYPE m_type = NONE;
    // ... element access, size(), etc.
public:
    void resize(std::size_t n)
    {
        switch (m_type)
        {
        case NONE:
            try
            {
                m_v.resize(n);   // try the contiguous container first
                m_type = VECTOR;
            }
            catch (const std::bad_alloc&)
            {
                m_d.resize(n);   // fall back to page-wise allocation
                m_type = DEQUE;
            }
            break;
        case VECTOR:
            m_v.resize(n);
            break;
        case DEQUE:
            m_d.resize(n);
            break;
        }
    }
};
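A quick usage sketch of the class above (the size is an arbitrary example):

int main()
{
    VectorDeque<float> vd;
    vd.resize(100'000'000);   // falls back to the deque only if the
                              // contiguous vector allocation throws
}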
But this seems like total overkill. Have you done any benchmarking or tests? Do you have any reason to believe that memory allocations will fail or that deques will be too slow?
You switch after testing and profiling indicate that one is worse than the other for your application. There is no "around N floats or M ints" universal answer.
Well, regarding memory, I can share some experience that may help you decide when contiguous memory blocks (malloc or std::vector) may become too large:
The application I work with records measurement data, mostly 4-byte floats, and for this it allocates internal buffers to store the data. These buffers vary heavily in size, but the typical range may be, say, several dozen of 1-10 MB and a very few of >100 MB. The buffers are always allocated with calloc, i.e. one big chunk of memory. If a buffer allocation fails, an error is logged and the user has the choice to try again.
Buffer sizes: Say you want to record 1000 channels at 100 Hz for 10 minutes: 4 bytes x 1000 x 100 x 60 x 10 == 228 MB (approx.) ... or 25 channels at 10 Hz for 12 hours: 4 bytes x 25 x 10 x 3600 x 12 == 41 MB (approx.)
We (nearly) never had any problems allocating 40 MB buffers (and that's about 10 million floats), but the 200-300 MB buffers fail from time to time -- all this on normal WinXP/32-bit boxes with 4 GB RAM.
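A sketch of that allocation pattern in C++ terms (the function name and error message are hypothetical, not the actual product code):

#include <cstddef>
#include <cstdio>
#include <new>
#include <vector>

// Try to grab one contiguous buffer of `count` floats, as the
// measurement application does; report failure to the caller so
// the user can decide whether to retry.
static bool allocate_buffer(std::vector<float>& buf, std::size_t count)
{
    try
    {
        buf.resize(count);   // one big contiguous chunk
        return true;
    }
    catch (const std::bad_alloc&)
    {
        std::fprintf(stderr, "allocation of %zu floats failed\n", count);
        return false;
    }
}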
Given that you don't insert after creation, you should probably either use plain old std::vector, or, if fragmentation really does become an issue, a custom vector-like Sequence implemented as a vector or array of pointers to fixed-size arrays.
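A minimal sketch of that last idea (the class name and block size are assumptions): elements live in fixed-size blocks reached through a vector of pointers, so no single allocation needs to be huge.

#include <cstddef>
#include <memory>
#include <vector>

// Chunked sequence: no allocation is larger than BlockSize * sizeof(T),
// at the cost of one extra indirection per element access.
template<typename T, std::size_t BlockSize = 1 << 16>
class ChunkedVector
{
    std::vector<std::unique_ptr<T[]>> m_blocks;
    std::size_t m_size = 0;

public:
    void push_back(const T& value)
    {
        if (m_size % BlockSize == 0)   // current block is full (or none yet)
            m_blocks.push_back(std::make_unique<T[]>(BlockSize));
        m_blocks[m_size / BlockSize][m_size % BlockSize] = value;
        ++m_size;
    }
    T& operator[](std::size_t i) { return m_blocks[i / BlockSize][i % BlockSize]; }
    std::size_t size() const     { return m_size; }
};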