Why is Dictionary.First() so slow?

2023-01-03 08:31 问答作者：

Not a real question because I already found out the answer, but still interesting thing.

I always thought that hash table is the fastest associative container if you hash properly.

However, the following code is terribly slow. It executes 开发者_C百科only about 1 million iterations and takes more than 2 minutes of time on a Core 2 CPU.

The code does the following: it maintains the collection todo of items it needs to process. At each iteration it takes an item from this collection (doesn't matter which item), deletes it, processes it if it wasn't processed (possibly adding more items to process), and repeats this until there are no items to process.

The culprit seems to be the Dictionary.Keys.First() operation.

The question is why is it slow?

Stopwatch watch = new Stopwatch();
watch.Start();

HashSet<int> processed = new HashSet<int>();
Dictionary<int, int> todo = new Dictionary<int, int>();

todo.Add(1, 1);
int iterations = 0;

int limit = 500000;
while (todo.Count > 0)
{
    iterations++;
    var key = todo.Keys.First();
    var value = todo[key];
    todo.Remove(key);
    if (!processed.Contains(key))
    {
        processed.Add(key);
        // process item here
        if (key < limit) { todo[key + 13] = value + 1; todo[key + 7] = value + 1; }
        // doesn't matter much how
    }
}
Console.WriteLine("Iterations: {0}; Time: {1}.", iterations, watch.Elapsed);

This results in:

Iterations: 923007; Time: 00:02:09.8414388.

Simply changing Dictionary to SortedDictionary yields:

Iterations: 499976; Time: 00:00:00.4451514.

300 times faster while having only 2 times less iterations.

The same happens in java. Used HashMap instead of Dictionary and keySet().iterator().next() instead of Keys.First().

Dictionary<TKey, TValue> maintains a hash table.

Its enumerator will loop through the buckets in the hash table until it finds a non-empty bucket, then return the value in that bucket.
Once the dictionary grows large, this operation becomes expensive.
In addition, removing an item from the dictionary doesn't shrink the buckets array, so the First() call gets slower as you remove items. (Because it has to loop further to find a non-empty bucket)

Therefore, repeatedly calling First() and removing is O(n²).

By the way, you can avoid the value lookup like this: (This will not make it noticeably faster)

var kvp = todo.First();

//Use kvp.Key and kcp.Value

Dictionary makes no effort to keep track of a list of keys. So the iterator needs to walk the buckets. Many of these buckets, particularly for a large dictionary, many not have anything in them.

It may be helpful to compare OpenJDK's HashIterator.nextEntry and PrivateEntryIterator.nextEntry (which uses TreeMap.successor). The hash version walks an unknown number of entries looking for one that's non-null. This could be particularly slow if the hash table has had many elements removed (which it has in your case). In TreeMap, the only walking we do is our in-order traversal. There are no nulls in the way (only at the leaves).

Reflector shows that Dictionary<TKey, TValue> maintains a Entry<TKey, TValue> array that it's KeyCollection<TKey, TValue>.Enumerator<TKey, TValue> uses. Normally, the lookup should be relatively fast, as it can just index into the array (assuming you don't want a sorted First):

// Dictionary<TKey. TValue>
private Entry<TKey, TValue>[] entries;

However, if you're removing the first elements of that array, then you end up walking the array until you find a non-empty one:

// Dictionary<TKey, TValue>.KeyCollection<TKey, TValue>.Enumerator<TKey, TValue>
while (this.index < this.dictionary.count) {
    if (this.dictionary.entries[this.index].hashCode >= 0) {
        this.currentKey = this.dictionary.entries[this.index].key;
        this.index++;
        return true;
    }
    this.index++;
}

As you remove your entries, you start getting more and more empties at the front of the entries array, and it becomes slower to retrieve First next time.

Well, Hash Tables aren't sorted, my guess is it it has to do some sort of sort before it can do an iteration, or some sort of scan, if its already sorted, it can just loop through.

Without looking, the simplest implementation of a sorted dictionary is a sorted list (like TreeSet) of keys and a hash combined; the list gives you the ordering, the dictionary gives you values. Thus the keys are already available. Hashtable does not have keys readily available, thus the culprit is not first, it's keys (all without any shred of evidence, feel free to test the hypothesis ;D )

继续阅读：.net algorithm hashtable performance

Why is Dictionary.First() so slow?

更多精彩内容

精彩评论

最新问答

央视是哪个频道？

请问买过的朋友，舒提啦旅行箱实际使用体验如何？？

检查不孕不育需要的费用？

海信ULED电视画质有什么不同的地方?？

钉子可以挂的住画框幕布吗？

问答排行榜

河神2九牛入海钓河妖是第几集河妖什么来历可活吞牛？

性激素六项检查的最佳时间是多久？多少钱？？

Easiest way to get words of one line from istream into a vector?

《梦在燃烧 (《三国演义》动画片主题曲)》MP3歌词-汤子星？

抽烟只抽炫赫门？