How does the capacity of std::vector grow automatically? What is the rate?

I have been going through the book C++ Primer, Third Edition, by Stanley B. Lippman and Josée Lajoie, and found a mistake in the program given in Article 6.3, "How a vector Grows Itself": every cout in it used '<' where '<<' was intended. Here is the program with that fixed (and with the element just pushed printed as well, so that it matches the output below):

#include <vector>
#include <iostream>

using namespace std;

int main() {
    vector<int> ivec;
    cout < "ivec: size: " < ivec.size() < " capacity: "  < ivec.capacity() < endl;
    
    for (int ix = 0; ix < 24; ++ix) {
        ivec.push_back(ix);
        cout < "ivec: size: " < ivec.size()
        < " capacity: "  < ivec.capacity() < endl;
    }    
}

Later within that article:

"Under 开发者_开发问答the Rogue Wave implementation, both the size and the capacity of ivec after its definition are 0. On inserting the first element, however, ivec's capacity is 256 and its size is 1."

But on correcting and running the code, I get the following output:


ivec: size: 0 capacity: 0
ivec[0]=0 ivec: size: 1 capacity: 1
ivec[1]=1 ivec: size: 2 capacity: 2
ivec[2]=2 ivec: size: 3 capacity: 4
ivec[3]=3 ivec: size: 4 capacity: 4
ivec[4]=4 ivec: size: 5 capacity: 8
ivec[5]=5 ivec: size: 6 capacity: 8
ivec[6]=6 ivec: size: 7 capacity: 8
ivec[7]=7 ivec: size: 8 capacity: 8
ivec[8]=8 ivec: size: 9 capacity: 16
ivec[9]=9 ivec: size: 10 capacity: 16
ivec[10]=10 ivec: size: 11 capacity: 16
ivec[11]=11 ivec: size: 12 capacity: 16
ivec[12]=12 ivec: size: 13 capacity: 16
ivec[13]=13 ivec: size: 14 capacity: 16
ivec[14]=14 ivec: size: 15 capacity: 16
ivec[15]=15 ivec: size: 16 capacity: 16
ivec[16]=16 ivec: size: 17 capacity: 32
ivec[17]=17 ivec: size: 18 capacity: 32
ivec[18]=18 ivec: size: 19 capacity: 32
ivec[19]=19 ivec: size: 20 capacity: 32
ivec[20]=20 ivec: size: 21 capacity: 32
ivec[21]=21 ivec: size: 22 capacity: 32
ivec[22]=22 ivec: size: 23 capacity: 32
ivec[23]=23 ivec: size: 24 capacity: 32

Is the capacity increasing in powers of two (2^N)? Please explain.


The rate at which the capacity of a vector grows is effectively dictated by the standard: it must be exponential, because that is the only way to meet the standard's amortized constant time requirement for the push_back operation. What amortized constant time means, and how exponential growth achieves it, is interesting.

Every time a vector's capacity grows, the existing elements have to be copied into the new storage. If you 'amortize' this cost over the lifetime of the vector, it turns out that increasing the capacity by an exponential factor gives an amortized constant cost per insertion.

This probably seems a bit odd, so let me explain how it works...

  • size 1, capacity 1 - No elements have been copied; the copy cost per element is 0.
  • size 2, capacity 2 - When the capacity was increased to 2, the first element had to be copied. Average copies per element: 1 / 2 = 0.5.
  • size 3, capacity 4 - When the capacity was increased to 4, the first two elements had to be copied. Average copies per element: (2 + 1 + 0) / 3 = 1.
  • size 4, capacity 4 - Average copies per element: (2 + 1 + 0 + 0) / 4 = 3 / 4 = 0.75.
  • size 5, capacity 8 - Average copies per element: (3 + 2 + 1 + 1 + 0) / 5 = 7 / 5 = 1.4.
  • ...
  • size 8, capacity 8 - Average copies per element: (3 + 2 + 1 + 1 + 0 + 0 + 0 + 0) / 8 = 7 / 8 = 0.875.
  • size 9, capacity 16 - Average copies per element: (4 + 3 + 2 + 2 + 1 + 1 + 1 + 1 + 0) / 9 = 15 / 9 ≈ 1.67.
  • ...
  • size 16, capacity 16 - Average copies per element: 15 / 16 = 0.938.
  • size 17, capacity 32 - Average copies per element: 31 / 17 ≈ 1.82.

As you can see, every time the capacity jumps, the number of copies goes up by the previous size of the array. But because the array has to double in size before the capacity jumps again, the number of copies per element always stays less than 2.
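If you would like to see these numbers fall out of a program, here is a small simulation of my own (a sketch independent of any real vector implementation) that models a doubling vector and prints the running average of copies per element; it reproduces the figures in the list above:

#include <cstddef>
#include <iostream>

// Simulate a vector that doubles its capacity when full, counting how many
// element copies the reallocations have caused so far.
int main() {
    std::size_t size = 0, capacity = 0, total_copies = 0;
    for (int i = 0; i < 17; ++i) {
        if (size == capacity) {                  // full: reallocate
            total_copies += size;                // every existing element is copied
            capacity = (capacity == 0) ? 1 : capacity * 2;
        }
        ++size;
        std::cout << "size " << size << ", capacity " << capacity
                  << ", copies/element "
                  << static_cast<double>(total_copies) / size << '\n';
    }
}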

If you increased the capacity by a factor of 1.5 instead of 2, you would end up with a very similar effect, except that the upper bound on the copies per element would be higher (3, in fact; in general the bound works out to K/(K-1) for a growth factor K, as the proof further down shows).

I suspect an implementation would choose 1.5 over 2 both to save a bit of space and because 1.5 is closer to the golden ratio. I have an intuition (currently not backed up by any hard data) that a growth rate in line with the golden ratio, because of its relationship to the Fibonacci sequence, will prove to be the most efficient growth rate for real-world loads in terms of minimizing both extra space used and time.


To be able to provide amortized constant time insertions at the end of the std::vector, the implementation must grow the capacity (when needed) by a factor K > 1 (*), so that when trying to append to a full vector of size N, the vector grows to capacity K*N.

Different implementations use different constants K that provide different trade-offs; most go for either K = 2 or K = 1.5. A higher K makes growth faster, since it requires fewer reallocations, but at the cost of a greater memory footprint. As an example, in gcc (libstdc++) K = 2, while in VS (Dinkumware) K = 1.5.

(*) If the vector instead grew by a constant quantity, the complexity of push_back would become linear instead of amortized constant. For example, if the vector grew by 10 elements whenever it filled up, appending N elements would trigger a reallocation every 10 insertions, and the k-th reallocation would copy about 10*k elements; the total copying cost over N appends is therefore O(N^2), i.e. O(N) amortized per push_back.
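If you want to check which K your own standard library uses, a few lines like the following (my own sketch, not from any reference) print every capacity change so that the factor can be read off directly:

#include <cstddef>
#include <iostream>
#include <vector>

// Push elements and report each capacity change; the ratio between successive
// capacities is the library's growth factor K (e.g. 2 on libstdc++, 1.5 on MSVC).
int main() {
    std::vector<int> v;
    std::size_t cap = v.capacity();
    for (int i = 0; i < 1000; ++i) {
        v.push_back(i);
        if (v.capacity() != cap) {
            std::cout << cap << " -> " << v.capacity() << '\n';
            cap = v.capacity();
        }
    }
}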


Just to add some mathematical proof of the time complexity of vector::push_back: say the size of the vector is n, and what we care about is the number of copies that have happened so far, say y. Note that copies happen every time the vector grows.

Grow by a factor of K

  y = K^1 + K^2 + K^3 + ... + K^log(K, n)
K*y =       K^2 + K^3 + ... + K^log(K, n) + K * K^log(K, n)

K*y - y = K * K^log(K, n) - K^1

Since K^log(K, n) = n, this gives

y = (K*n - K) / (K - 1) = K(n-1)/(K-1) = (K/(K-1))(n-1)

T(n) = y/n = (K/(K-1)) * (n-1)/n < K/(K-1) = O(1)

K/(K-1) is a constant. Consider the most common cases:

  • K=2, T(n) = 2/(2-1) = 2
  • K=1.5, T(n) = 1.5/(1.5-1) = 3

There is actually a reason for choosing K = 1.5 or K = 2 in the different implementations: the bound K/(K-1) falls steeply for small K but flattens out once K reaches about 2, so there is not much benefit in using a larger K, at the cost of allocating more memory.
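To make that flattening concrete, a tiny program of my own devising tabulates the bound for several growth factors:

#include <initializer_list>
#include <iostream>

// Print the amortized copy bound K/(K-1) for a few growth factors; the values
// flatten out past K = 2, which is why larger factors buy little extra speed.
int main() {
    for (double K : {1.25, 1.5, 2.0, 3.0, 4.0, 8.0})
        std::cout << "K = " << K << "  ->  K/(K-1) = " << K / (K - 1) << '\n';
}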

Grow by constant quantity of C

y = C + 2*C + 3*C + ... + (n/C)*C
  = C * (1 + 2 + 3 + ... + m),  where m = n/C
  = C * m(m+1)/2
  = n(m+1)/2

T(n) = y/n = (m+1)/2 = n/(2C) + 1/2 = O(n)

As we can see, it is linear.
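As a numeric sanity check (again a sketch of my own; the increment C = 10 is an arbitrary choice), simulating constant-quantity growth shows the average number of copies per element climbing roughly as n/(2C), in line with the formula above:

#include <cstddef>
#include <iostream>

// Simulate a vector that adds a fixed C slots whenever it fills up, and watch
// the average number of copies per element grow linearly with n.
int main() {
    const std::size_t C = 10;
    std::size_t size = 0, capacity = 0, copies = 0;
    for (std::size_t n = 1; n <= 10000; ++n) {
        if (size == capacity) {    // full: copy everything into a bigger block
            copies += size;
            capacity += C;
        }
        ++size;
        if (n % 2500 == 0)
            std::cout << "n = " << n << ", copies/element = "
                      << static_cast<double>(copies) / size << '\n';
    }
}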


The capacity of the vector is completely implementation-dependent; no one can say exactly how it will grow.


Are you using the "Rogue Wave" implementation?

How capacity grows is up to the implementation. Yours uses doubling (2^N).


Yes, the capacity doubles each time it is exceeded. This is implementation dependent.


Before pushing back an element, the vector checks whether it still has room for one more; if not, it grows its storage first, as below.

I will explain it with the reserve function:


void push_back(const value_type &val)   // a sketch of a push_back implementation
{
    if (_size < 10)
        reserve(_size + 1);
    else if (_size > (_capacity / 4 * 3))      // more than three-quarters full
        reserve(_capacity + _capacity / 4);    // grow capacity by 25%
    // ...then the new value is constructed at the end and _size is incremented
}

_size: the current number of elements in the vector.

_capacity: the vector's allocated capacity.
