How does the capacity of std::vector grow automatically? What is the rate?

I have been going through the book C++ Primer, Third Edition, by Stanley B. Lippman and Josée Lajoie, and found a mistake in the program given in Article 6.3, "How a vector Grows Itself": every cout in it used '<' where '<<' was intended. Here is the program with that fixed (and with the element just pushed printed as well, so that it matches the output below):

#include <vector>
#include <iostream>

using namespace std;

int main() {
    vector<int> ivec;
    cout < "ivec: size: " < ivec.size() < " capacity: "  < ivec.capacity() < endl;
    
    for (int ix = 0; ix < 24; ++ix) {
        ivec.push_back(ix);
        cout < "ivec: size: " < ivec.size()
        < " capacity: "  < ivec.capacity() < endl;
    }    
}

Later within that article:

"Under 开发者_开发问答the Rogue Wave implementation, both the size and the capacity of ivec after its definition are 0. On inserting the first element, however, ivec's capacity is 256 and its size is 1."

But on correcting and running the code, I get the following output:


ivec: size: 0 capacity: 0
ivec[0]=0 ivec: size: 1 capacity: 1
ivec[1]=1 ivec: size: 2 capacity: 2
ivec[2]=2 ivec: size: 3 capacity: 4
ivec[3]=3 ivec: size: 4 capacity: 4
ivec[4]=4 ivec: size: 5 capacity: 8
ivec[5]=5 ivec: size: 6 capacity: 8
ivec[6]=6 ivec: size: 7 capacity: 8
ivec[7]=7 ivec: size: 8 capacity: 8
ivec[8]=8 ivec: size: 9 capacity: 16
ivec[9]=9 ivec: size: 10 capacity: 16
ivec[10]=10 ivec: size: 11 capacity: 16
ivec[11]=11 ivec: size: 12 capacity: 16
ivec[12]=12 ivec: size: 13 capacity: 16
ivec[13]=13 ivec: size: 14 capacity: 16
ivec[14]=14 ivec: size: 15 capacity: 16
ivec[15]=15 ivec: size: 16 capacity: 16
ivec[16]=16 ivec: size: 17 capacity: 32
ivec[17]=17 ivec: size: 18 capacity: 32
ivec[18]=18 ivec: size: 19 capacity: 32
ivec[19]=19 ivec: size: 20 capacity: 32
ivec[20]=20 ivec: size: 21 capacity: 32
ivec[21]=21 ivec: size: 22 capacity: 32
ivec[22]=22 ivec: size: 23 capacity: 32
ivec[23]=23 ivec: size: 24 capacity: 32

Is the capacity increasing in powers of two (2^N)? Please explain.


The rate at which the capacity of a vector grows is effectively dictated by the standard: it must be exponential, because that is the only way to meet the standard's amortized constant time requirement for the push_back operation. What amortized constant time means, and how exponential growth achieves it, is interesting.

Every time a vector's capacity grows, the existing elements have to be copied into the new storage. If you 'amortize' this cost over the lifetime of the vector, it turns out that increasing the capacity by an exponential factor gives an amortized constant cost per insertion.

This probably seems a bit odd, so let me explain how it works...

  • size 1, capacity 1 - No elements have been copied; the copy cost per element is 0.
  • size 2, capacity 2 - When the capacity was increased to 2, the first element had to be copied. Average copies per element: 1 / 2 = 0.5.
  • size 3, capacity 4 - When the capacity was increased to 4, the first two elements had to be copied. Average copies per element: (2 + 1 + 0) / 3 = 1.
  • size 4, capacity 4 - Average copies per element: (2 + 1 + 0 + 0) / 4 = 3 / 4 = 0.75.
  • size 5, capacity 8 - Average copies per element: (3 + 2 + 1 + 1 + 0) / 5 = 7 / 5 = 1.4.
  • ...
  • size 8, capacity 8 - Average copies per element: (3 + 2 + 1 + 1 + 0 + 0 + 0 + 0) / 8 = 7 / 8 = 0.875.
  • size 9, capacity 16 - Average copies per element: (4 + 3 + 2 + 2 + 1 + 1 + 1 + 1 + 0) / 9 = 15 / 9 ≈ 1.67.
  • ...
  • size 16, capacity 16 - Average copies per element: 15 / 16 = 0.938.
  • size 17, capacity 32 - Average copies per element: 31 / 17 ≈ 1.82.

As you can see, every time the capacity jumps, the number of copies goes up by the previous size of the array. But because the array has to double in size before the capacity jumps again, the number of copies per element always stays less than 2.
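If you would like to see these numbers fall out of a program, here is a small simulation of my own (a sketch independent of any real vector implementation) that models a doubling vector and prints the running average of copies per element; it reproduces the figures in the list above:

#include <cstddef>
#include <iostream>

// Simulate a vector that doubles its capacity when full, counting how many
// element copies the reallocations have caused so far.
int main() {
    std::size_t size = 0, capacity = 0, total_copies = 0;
    for (int i = 0; i < 17; ++i) {
        if (size == capacity) {                  // full: reallocate
            total_copies += size;                // every existing element is copied
            capacity = (capacity == 0) ? 1 : capacity * 2;
        }
        ++size;
        std::cout << "size " << size << ", capacity " << capacity
                  << ", copies/element "
                  << static_cast<double>(total_copies) / size << '\n';
    }
}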

If you increased the capacity by a factor of 1.5 instead of 2, you would end up with a very similar effect, except that the upper bound on the copies per element would be higher (3, in fact; in general the bound works out to K/(K-1) for a growth factor K, as the proof further down shows).

I suspect an implementation would choose 1.5 over 2 both to save a bit of space and because 1.5 is closer to the golden ratio. I have an intuition (currently not backed up by any hard data) that a growth rate in line with the golden ratio, because of its relationship to the Fibonacci sequence, will prove to be the most efficient growth rate for real-world loads in terms of minimizing both extra space used and time.


To be able to provide amortized constant time insertions at the end of the std::vector, the implementation must grow the capacity (when needed) by a factor K > 1 (*), so that when trying to append to a full vector of size N, the vector grows to capacity K*N.

Different implementations use different constants K that provide different trade-offs; most go for either K = 2 or K = 1.5. A higher K makes growth faster, since it requires fewer reallocations, but at the cost of a greater memory footprint. As an example, in gcc (libstdc++) K = 2, while in VS (Dinkumware) K = 1.5.

(*) If the vector instead grew by a constant quantity, the complexity of push_back would become linear instead of amortized constant. For example, if the vector grew by 10 elements whenever it filled up, appending N elements would trigger a reallocation every 10 insertions, and the k-th reallocation would copy about 10*k elements; the total copying cost over N appends is therefore O(N^2), i.e. O(N) amortized per push_back.
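If you want to check which K your own standard library uses, a few lines like the following (my own sketch, not from any reference) print every capacity change so that the factor can be read off directly:

#include <cstddef>
#include <iostream>
#include <vector>

// Push elements and report each capacity change; the ratio between successive
// capacities is the library's growth factor K (e.g. 2 on libstdc++, 1.5 on MSVC).
int main() {
    std::vector<int> v;
    std::size_t cap = v.capacity();
    for (int i = 0; i < 1000; ++i) {
        v.push_back(i);
        if (v.capacity() != cap) {
            std::cout << cap << " -> " << v.capacity() << '\n';
            cap = v.capacity();
        }
    }
}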


Just to add some mathematical proof of the time complexity of vector::push_back: say the size of the vector is n, and what we care about is the number of copies that have happened so far, say y. Note that copies happen every time the vector grows.

Grow by a factor of K

  y = K^1 + K^2 + K^3 + ... + K^log(K, n)
K*y =       K^2 + K^3 + ... + K^log(K, n) + K * K^log(K, n)

K*y - y = K * K^log(K, n) - K^1

Since K^log(K, n) = n, this gives

y = (K*n - K) / (K - 1) = K(n-1)/(K-1) = (K/(K-1))(n-1)

T(n) = y/n = (K/(K-1)) * (n-1)/n < K/(K-1) = O(1)

K/(K-1) is a constant. Consider the most common cases:

  • K=2, T(n) = 2/(2-1) = 2
  • K=1.5, T(n) = 1.5/(1.5-1) = 3

There is actually a reason for choosing K = 1.5 or K = 2 in the different implementations: the bound K/(K-1) falls steeply for small K but flattens out once K reaches about 2, so there is not much benefit in using a larger K, at the cost of allocating more memory.
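To make that flattening concrete, a tiny program of my own devising tabulates the bound for several growth factors:

#include <initializer_list>
#include <iostream>

// Print the amortized copy bound K/(K-1) for a few growth factors; the values
// flatten out past K = 2, which is why larger factors buy little extra speed.
int main() {
    for (double K : {1.25, 1.5, 2.0, 3.0, 4.0, 8.0})
        std::cout << "K = " << K << "  ->  K/(K-1) = " << K / (K - 1) << '\n';
}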

Grow by constant quantity of C

y = C + 2*C + 3*C + ... + (n/C)*C
  = C * (1 + 2 + 3 + ... + m),  where m = n/C
  = C * m(m+1)/2
  = n(m+1)/2

T(n) = y/n = (m+1)/2 = n/(2C) + 1/2 = O(n)

As we can see, it is linear.
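As a numeric sanity check (again a sketch of my own; the increment C = 10 is an arbitrary choice), simulating constant-quantity growth shows the average number of copies per element climbing roughly as n/(2C), in line with the formula above:

#include <cstddef>
#include <iostream>

// Simulate a vector that adds a fixed C slots whenever it fills up, and watch
// the average number of copies per element grow linearly with n.
int main() {
    const std::size_t C = 10;
    std::size_t size = 0, capacity = 0, copies = 0;
    for (std::size_t n = 1; n <= 10000; ++n) {
        if (size == capacity) {    // full: copy everything into a bigger block
            copies += size;
            capacity += C;
        }
        ++size;
        if (n % 2500 == 0)
            std::cout << "n = " << n << ", copies/element = "
                      << static_cast<double>(copies) / size << '\n';
    }
}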


The capacity of the vector is completely implementation-dependent; no one can say exactly how it will grow.


Are you using the "Rogue Wave" implementation?

How capacity grows is up to the implementation. Yours uses doubling (2^N).


Yes, the capacity doubles each time it is exceeded. This is implementation dependent.


Before pushing back an element, the vector checks whether it still has room for one more; if not, it grows its storage first, as below.

I will explain it with the reserve function:


void push_back(const value_type &val)   // a sketch of a push_back implementation
{
    if (_size < 10)
        reserve(_size + 1);
    else if (_size > (_capacity / 4 * 3))      // more than three-quarters full
        reserve(_capacity + _capacity / 4);    // grow capacity by 25%
    // ...then the new value is constructed at the end and _size is incremented
}

_size: the current number of elements in the vector.

_capacity: the vector's allocated capacity.
