
Conversion of an IEnumerable to a dictionary for performance?

I have recently seen a new trend in my firm where we change the IEnumerable to a dictionary by a simple LINQ transformation as follows:

enumerable.ToDictionary(x => x);

We mostly end up doing this when the operations on the collection are lookups (Contains/access), and a dictionary obviously performs better in such cases.

But I realise that converting the Enumerable to a dictionary has its own cost, and I am wondering at what point it breaks even (if it does), i.e. when the performance of IEnumerable Contains/access equals the cost of ToDictionary plus the dictionary access/Contains.

OK, I might add there is no database access: the enumerable might be created from a database query, and that's it, and the enumerable may be edited after that too.

Also, it would be interesting to know how the datatype of the key affects the performance.

The number of lookups is generally 2-5, but sometimes there may be only one. I have also seen patterns like the following. For an enumerable:

 var element = enumerable.SingleOrDefault(x => x.Id == id);
 // do something if element is null, or return

for a dictionary:

 if (dictionary.ContainsKey(x))
 // do something if false, else return
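Written out as a runnable sketch, the two idioms above look like this (Item and its Id property are assumptions for illustration; the predicate has been corrected to return a bool):

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

class Item { public int Id; }

class Program
{
    static void Main()
    {
        IEnumerable<Item> enumerable = new List<Item> { new Item { Id = 1 }, new Item { Id = 2 } };
        var dictionary = enumerable.ToDictionary(x => x.Id);

        int id = 2;

        // O(n): scans the sequence for a match (and keeps scanning to verify uniqueness).
        var element = enumerable.SingleOrDefault(x => x.Id == id);
        Console.WriteLine(element != null);                            // True

        // O(1) amortized: a hash probe; TryGetValue avoids a second lookup for the value.
        Console.WriteLine(dictionary.TryGetValue(id, out var found));  // True
    }
}
```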

This has been bugging me for quite some time now.


Performance of Dictionary Compared to IEnumerable

A Dictionary, when used correctly, is always faster to read from (except in cases where the data set is very small, e.g. 10 items). There can be overhead when creating it.

Given m as the amount of lookups performed against the same object (these are approximate):

  • Performance of an IEnumerable (created from a clean list): O(mn)
    • This is because you need to look at all the items each time (essentially m * O(n)).
  • Performance of a Dictionary: O(n) + m · O(1), i.e. O(n + m)
    • This is because you need to insert the items first (O(n)), after which each lookup is O(1).

In general it can be seen that the Dictionary wins when m > 1, and the IEnumerable wins when m = 1 or m = 0.
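The two cost profiles above can be sketched directly (the sizes here are placeholders, not a rigorous benchmark):

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

class Program
{
    static void Main()
    {
        var source = Enumerable.Range(0, 10_000).ToList();   // n = 10,000
        var keys = new[] { 42, 9_999, 5_000 };               // m = 3 lookups

        // m lookups against the sequence: each Contains is O(n), so O(mn) total.
        foreach (var k in keys)
            Console.WriteLine(source.Contains(k));           // True each time

        // Build once (O(n)), then m O(1) lookups: O(n + m) total.
        var dict = source.ToDictionary(x => x);
        foreach (var k in keys)
            Console.WriteLine(dict.ContainsKey(k));          // True each time
    }
}
```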

In general you should:

  • Use a Dictionary when doing the lookup more than once against the same dataset.
  • Use an IEnumerable when doing the lookup only once.
  • Use an IEnumerable when the data-set could be too large to fit into memory.
    • Keep in mind a SQL table can be used like a Dictionary, so you could use that to offset the memory pressure.

Further Considerations

Dictionaries use GetHashCode() to organise their internal state. The performance of a Dictionary is strongly related to the hash code in two ways.

  • Poorly performing GetHashCode() - results in overhead every time an item is added, looked up, or deleted.
  • Low quality hash codes - results in the dictionary not having O(1) lookup performance.

Most built-in .NET types (especially the value types) have very good hashing algorithms. However, with list-like types (e.g. string), GetHashCode() has O(n) performance, because it needs to iterate over the whole string. Thus your Dictionary's lookup performance can really be seen as O(1) + M, where M is the big-O of an efficient GetHashCode() for the key type.
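To illustrate the "low quality hash codes" point, here is a hypothetical key type (BadKey is made up for this example) whose constant hash code forces every entry into one bucket, so lookups degrade to linear scans:

```csharp
using System;
using System.Collections.Generic;

// Hypothetical key type with a deliberately bad hash: every instance collides.
class BadKey
{
    public int Value;
    public override int GetHashCode() => 0;   // low quality: constant for all keys
    public override bool Equals(object o) => o is BadKey b && b.Value == Value;
}

class Program
{
    static void Main()
    {
        var dict = new Dictionary<BadKey, string>();
        for (int i = 0; i < 1000; i++)
            dict[new BadKey { Value = i }] = i.ToString();

        // All 1000 entries share one bucket, so this "O(1)" lookup is really a scan.
        Console.WriteLine(dict[new BadKey { Value = 999 }]);  // 999
    }
}
```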


It depends....

How long is the IEnumerable?

Does accessing the IEnumerable cause database access?

How often is it accessed?

The best thing to do would be to experiment and profile.


If you search your collection by some key very often, the Dictionary will definitely be faster, because it is a hash-based collection and its lookups are many times faster. Otherwise, if you don't search the collection much, the conversion is not necessary, because the time spent converting may exceed the cost of your one or two searches through the collection.


IMHO: you need to measure this in your environment with representative data. In such cases I just write a quick console app that measures the execution time of the code. For a better measurement, execute the same code multiple times.
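Such a quick console app might look like this (sizes and iteration counts are placeholders; run in Release mode and repeat runs to smooth out noise):

```csharp
using System;
using System.Diagnostics;
using System.Linq;

class Program
{
    static void Main()
    {
        var data = Enumerable.Range(0, 100_000).ToList();

        var sw = Stopwatch.StartNew();
        for (int i = 0; i < 100; i++)
            data.Contains(99_999);                 // linear scan each time
        Console.WriteLine($"List.Contains: {sw.ElapsedMilliseconds} ms");

        sw.Restart();
        var dict = data.ToDictionary(x => x);      // pay the build cost once
        for (int i = 0; i < 100; i++)
            dict.ContainsKey(99_999);              // hash lookup each time
        Console.WriteLine($"ToDictionary + ContainsKey: {sw.ElapsedMilliseconds} ms");
    }
}
```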

ADD:

It also depends on the application you develop. Usually you gain more by spending that time and effort optimizing other places (avoiding network round trips, caching, etc.).


I'll add that you haven't told us what happens every time you "rewind" your IEnumerable<>. Is it directly backed by a data collection (for example a List<>), or is it calculated on the fly?

  • If it's the first, then for small collections enumerating them to find the wanted element is faster (a Dictionary for 3-4 elements is useless; if you want, I can build some benchmark to find the breaking point).
  • If it's the second, then you have to consider whether "caching" the IEnumerable<> in a collection is a good idea. If it is, then you can choose between a List<> and a Dictionary<>, and we return to point 1: is the IEnumerable small or big?
  • And there is a third problem: if the collection isn't backed but is too big for memory, then clearly you can't put it in a Dictionary<>. Then perhaps it's time to make the SQL work for you :-)

I'll add that "failures" have their cost: in a List<> if you try to find an element that doesn't exist, the cost is O(n), while in a Dictionary<> the cost is still O(1).
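The cost of a miss can be seen directly (a minimal sketch):

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

class Program
{
    static void Main()
    {
        var list = Enumerable.Range(0, 1_000_000).ToList();
        var dict = list.ToDictionary(x => x);

        // Miss in a List<>: O(n) -- every element is examined before giving up.
        Console.WriteLine(list.Contains(-1));     // False

        // Miss in a Dictionary<>: still O(1) -- one hash probe, no scan.
        Console.WriteLine(dict.ContainsKey(-1));  // False
    }
}
```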
