List<T> and ArrayList default capacity
I have been looking at .NET's List and ArrayList implementations with Reflector.
When looking at the Add(T item) I ran across this.EnsureCapacity(this._size + 1):
public void Add(T item)
{
if (this.开发者_JAVA百科_size == this._items.Length)
{
this.EnsureCapacity(this._size + 1);
}
this._items[this._size++] = item;
this._version++;
}
So EnsureCapacity looks like this:
private void EnsureCapacity(int min)
{
if (this._items.Length < min)
{
int num = (this._items.Length == 0) ? 4 : (this._items.Length * 2);
if (num < min)
{
num = min;
}
this.Capacity = num;
}
}
Why does the internal Capacity default to 4 and then increment in multiples of 2(ie: double)?
Regarding doubling when resizing is needed it is for the following reason. Let's say that you want to insert n
items into a List
. We will resize the list no more than log n
times. Therefore, inserting n
items into a List
will take worst-case O(n)
time so insertions are constant in amortized time. Further, the amount of wasted space is bounded above by n
. Any strategy of constant proportional growth will result in constant amortized insertion time and linear wasted space. Growing faster than constant proportional growth could result in faster inserts but at the expense of more wasted space. Growing slower than constant proportional growth could result in less wasted space but at the expense of slower inserts.
The default capacity is so that little memory is wasted (and there is no harm in starting small as the doubling-resizing strategy is good from a time/space perspective as we just saw).
4 is a good default, as most collections will only have a few items in them. The incrementing is done to ensure that you don't go doing memory allocations every time you add an item.
See this good article by Joel on memory usage and why allocating double what you need is a good idea.
http://www.joelonsoftware.com/printerFriendly/articles/fog0000000319.html
Here's the relevant quote:
Suppose you wrote a smart strcat function that reallocates the destination buffer automatically. Should it always reallocate it to the exact size needed? My teacher and mentor Stan Eisenstat suggests that when you call realloc, you should always double the size of memory that was previously allocated. That means that you never have to call realloc more than lg n times, which has decent performance characteristics even for huge strings, and you never waste more than 50% of your memory.
As an aside, the list<> and dictionary<> now default to 10, but I would bet they have the same incrementing logic.
I bet this is so you can create small lists without multiple allocations. The doubling of size is for simplicity rather than having a complex scaling algorithm.
Seems like 4 is a reasonable trade off between large enough to accomodate frequent scenarios of having 4 items or less, and not having too many wasted items.
The capacity doubles for each allocation increase, ensuring that it can hold double the number of items already in the container. This is a similar algorithm to C++'s vector container.
精彩评论