开发者

List<> Better to init with a maximum capacity and only use a fraction of that, or init with no capacity

I have a List<MyStruct> that I'm initializing to be empty, and 开发者_运维技巧I will be populating this struct in a loop as I parse the data. I know that there is a maximum possible number of entries that will be inserted into this list. For now lets say 1000. However after my parsing of the 1000 entries I may end up only putting 2 into the List. So should I initialize the List with a capacity of 1000 or don't specify a capacity and just add the few entries. It could however end up adding all 1000. What's the best way performance wise?


Doesn't really matter. Don't micro-optimize. Only set the capacity if you have a good idea it's roughly the amount you need. Under the hood, the list doubles every time it grows, so the number of growths is O(log(n)). It should be pretty efficient.


If it can truly vary that widely, then you would want to not set the capacity. For most collections, the capacity doubles as it is met (with a default capacity of 16 I believe), so your capacity will very closely approach your maximum as you fill it up.


First, you should just implement it in the most natural, maintainable and readable way. In this case, that is to just create a new List<T> (accepting the default capacity) and add your objects to it. Then what you do if your application is not meeting your performance specifications is you profile it. If it turns out through profiling that this is a bottleneck in your application then you try to optimize it. If your application is meeting your performance specifications or if this specific part is not a bottleneck, you ignore it.

Second, sometimes implementation details are important and here's a case where it is. The way that List<T> is implemented is a dynamically growable array the starts with a certain capacity and doubles the size each time that regrowth is needed. What this means is that if you are adding n object into a newly created list there will be O(log n) regrowths and you'll waste at most O(n) space. Unless memory is tight on your system (maybe you're running .NET CF on a mobile phone) this isn't that big of a deal. And from a performance perspective, the parsing of your entries is likely to consume significantly more time than the regrowth. Thus, this is also not likely to be a factor.


Considering that you're list is small to begin with, you're better off not initializing it. It'll make the code easier to read without any noticeable performance hit.


First of all let say I'm not in such a place to write an answer, i first came to find it, yet i'm writing one, just to suggest, and also get your opinion.

what a list does while adding data:

public void Add(T item) {
    if (_size == _items.Length) EnsureCapacity(_size + 1);
    _items[_size++] = item;
    _version++;
}

private void EnsureCapacity(int min) {
    if (_items.Length < min) {
        int newCapacity = _items.Length == 0? _defaultCapacity : _items.Length * 2;
        // Allow the list to grow to maximum possible capacity (~2G elements) before encountering overflow.
        // Note that this check works even when _items.Length overflowed thanks to the (uint) cast
        if ((uint)newCapacity > Array.MaxArrayLength) newCapacity = Array.MaxArrayLength;
        if (newCapacity < min) newCapacity = min;
        Capacity = newCapacity;
    }
}

Looking at this, first it does exactly what some of you guys said, it doubles the capacity, and unlike some other may think, and also unlike the way Arrays work, it won't block user when it reaches the provided capacity.

And when does it increase Capacity? At this line: Capacity = newCapacity;; actually it's the Capacity property setter which performs the operations:

public int Capacity {
    get {
        Contract.Ensures(Contract.Result<int>() >= 0);
        return _items.Length;
    }
    set {
        if (value < _size) {
            ThrowHelper.ThrowArgumentOutOfRangeException(ExceptionArgument.value, ExceptionResource.ArgumentOutOfRange_SmallCapacity);
        }
        Contract.EndContractBlock();

        if (value != _items.Length) {
            if (value > 0) {
                T[] newItems = new T[value];
                if (_size > 0) {
                    Array.Copy(_items, 0, newItems, 0, _size);
                }
                _items = newItems;
            }
            else {
                _items = _emptyArray;
            }
        }
    }
}

As it's obvious it's not a simple flag change operation to just let more item in, as what linked list will do (to be honest i always consider lists as LinkedList'. now i can say with list, i have better read performance, and less write performance (yet i'm not sure about what i'm saying, someone confirm if we should use LinkedList when we perform write and one time read operations...) ). so as we can see it creates a new array and copy items to new list one by one...

So here's my suggestion:

  1. As @jason said, we do not need to think of passing it a value, when we only may run our write operation once, into the list
  2. If the weight of list is small, for example just a few iteration of increasing the list size won't do much, for example as everyone said, if it's 2 become 4, and 8... first it's only triple time that we increase size, and secondly, we only copy a few amount of data. again we can ignore it no matter where the code is placed, or i hope so.
  3. But if you are copying few thousands of data from db, and it starts from start, 2->4->8->16->32->64->128->256->512->1024->2048->... until know we had 10 time increasing array size, and if we think a copy is just a single operation that copies reference, aside from the other few thing that need to be done in machine codes, we will have 4094 time of copying data from one array to another, and also consuming half that much of space that need to be wait for GC (in graphical application RAM may become matter but it's too much for me to write example for)... So multiplying this by number of operations that call such code at same time, the performance may reduce dramatically. So i may consider to do the following: if i know a number, for example i know i have x item, and these item may reference to 0~2 i may consider passing that x or x*2, and it only will grow once if needed. (Please tell me your opinion).

  4. In completion to idea No.3 The doubling seem to be reasonable per single list, and no matter what you do, you only can boost half as much of the time, and doing whole of the operation will only take ~two of those halves, so you may ignore it if you do not launch multiple threads/tasks at same time, or lots of list one after another.

I also just find out that: private const int _defaultCapacity = 4;


Note: that if you use maximum capacity, as it said, it store space equal to amount is needed for 2G elements (as it said: // Allow the list to grow to maximum possible capacity (~2G elements) before encountering overflow.), and that's not the amount you want to initialize your list with, even if your code run once, it look like too much of straight (linear/ side-by-side) data inside ram (as data structure thought us, if C# didn't done anything newer than what our books said) and the allocation also may require quite some time (i'm not aware of this process). so i never recommend it, if you do not know really that how many is required, and i think on such times we also should consider linked list, if the data are really linear, and there may be lot of space taken in ram in random places (if that's the case: that require lot of checking before the machine can find a place to allocate that space in).


Probably the best thing to do is compromise. Initialize the list to something like 256.

0

上一篇:

下一篇:

精彩评论

暂无评论...
验证码 换一张
取 消

最新问答

问答排行榜