开发者

How does IEnumerable<T>.ToArray() work?

开发者_JS百科

Is it a two-pass algorithm? i.e., it iterates the enumerable once to count the number of elements so that it can allocate the array, and then pass again to insert them?

Does it loop once, and keep resizing the array?

Or does it use an intermediate structure like a List (which probably internally resizes an array)?


It uses an intermediate structure. The actual type involved is a Buffer, which is an internal struct in the framework. In practice, this type has an array, that is copied each time it is full to allocate more space. This array starts with length of 4 (in .NET 4, it's an implementation detail that might change), so you might end up allocating and copying a lot when doing ToArray.

There is an optimization in place, though. If the source implementes ICollection<T>, it uses Count from that to allocate the correct size of array from the start.


First it checks to see if the source is an ICollection<T>, in which case it can call the source's ToArray() method.

Otherwise, it enumerates the source exactly once. As it enumerates it stores items into a buffer array. Whenever it hits the end of the buffer array it creates a new buffer of twice the size and copies in the old elements. Once the enumeration is finished it returns the buffer (if it's the exact right size) or copies the items from the buffer into an array of the exact right size.

Here's pseudo-source code for the operation:

public static T[] ToArray<T>(this IEnumerable<T> source)
{
    T[] items = null;
    int count = 0;

    foreach (T item in source)
    {
        if (items == null)
        {
            items = new T[4];
        }
        else if (items.Length == count)
        {
            T[] destinationArray = new T[count * 2];
            Array.Copy(items, 0, destinationArray, 0, count);
            items = destinationArray;
        }
        items[count] = item;
        count++;
    }

    if (items.Length == count)
    {
        return items;
    }
    T[] destinationArray = new TElement[count];
    Array.Copy(items, 0, destinationArray, 0, count);
    return destinationArray;
}


Like this (via .NET Reflector):

public static TSource[] ToArray<TSource>(this IEnumerable<TSource> source)
{
    if (source == null)
    {
        throw Error.ArgumentNull("source");
    }
    Buffer<TSource> buffer = new Buffer<TSource>(source);
    return buffer.ToArray();
}

[StructLayout(LayoutKind.Sequential)]
internal struct Buffer<TElement>
{
    internal TElement[] items;
    internal int count;
    internal Buffer(IEnumerable<TElement> source)
    {
        TElement[] array = null;
        int length = 0;
        ICollection<TElement> is2 = source as ICollection<TElement>;
        if (is2 != null)
        {
            length = is2.Count;
            if (length > 0)
            {
                array = new TElement[length];
                is2.CopyTo(array, 0);
            }
        }
        else
        {
            foreach (TElement local in source)
            {
                if (array == null)
                {
                    array = new TElement[4];
                }
                else if (array.Length == length)
                {
                    TElement[] destinationArray = new TElement[length * 2];
                    Array.Copy(array, 0, destinationArray, 0, length);
                    array = destinationArray;
                }
                array[length] = local;
                length++;
            }
        }
        this.items = array;
        this.count = length;
    }

    internal TElement[] ToArray()
    {
        if (this.count == 0)
        {
            return new TElement[0];
        }
        if (this.items.Length == this.count)
        {
            return this.items;
        }
        TElement[] destinationArray = new TElement[this.count];
        Array.Copy(this.items, 0, destinationArray, 0, this.count);
        return destinationArray;
    }
}


In general, attempting to iterate an enumerable twice can lead to a disaster as there is no guarantee that the enumerable can be iterated a second time. Therefore, performing a Count and then allocate then copy is out.

In Reflector, it shows that it uses a type called Buffer that effectively streams the sequence into an array resizing (doubling on each reallocation so that the number of reallocations is O(log n)) as needed and then returning an appropriately sized array when it reaches the end


First, the items are loaded into an internal class Buffer<T> which allows the count to be generated

Next, Buffer<T>.ToArray is called, which does an Array.Copy of the Buffer<T>'s array into a returned array.

.NET Reflector shows this code if you want to see for yourself.

http://www.red-gate.com/products/reflector/

0

上一篇:

下一篇:

精彩评论

暂无评论...
验证码 换一张
取 消

最新问答

问答排行榜