开发者

Performance of a get unique elements/group by operation on an IEnumerable<T>

I was wondering how could I improve the performance of the following code:

public class MyObject
{
    public int Year { get; set; }
}

//In my case I have 30000
IEnumerable<MyObject> data = MethodThatReturnsManyMyObjects(); 

var groupedByYear = data.GroupBy(x => x.Year); 

//Here is the where it takes around 5 seconds
foreach (var group in groupedByYear) 
    //do something here.

The idea is to get a set of objects with unique year values. In my scenario there are only 6 years included in the 30000 items in the list so the foreach loop will be executed 6 times only. So we have many items needing to be grouped in a few groups.

Using the .Distinct() with an explicit IEqualityComparer would be an alternative but somehow I feel that it wont make any difference.

I can understand if 30000 items is too much and that i should be happy with the 5 seconds I get, but I was wondering if the above can be imporved performance wise.

Thanks.

EDIT: The answers below made me dig a bit deeper only to realize that the 5 seconds I am getting only heppens when the data are loaded into memory from the DB. The delay was disguised inside the foreach loop as th开发者_StackOverflow社区e deferred execution of the IEnumerable delayed it until that point confusing me to assume that probably the GroupBy() could be refactored to something more performant.

The question still stands though, is the GroupBy() command the optimal way to achieve the best performance in such cases?


That definitely shouldn't take that long. Is this running under the debugger, or not? Are any exceptions being thrown? Does the Year property perform any calculations in real life? It should execute this almost instantly, to be honest.

Do you have a short but complete program which demonstrates it taking a long time? (If not, I'll try to come up with one myself to get some sample timings.)

Note that if MethodThatReturnsManyMyObjects is using deferred execution for the iterator, that could be the culprit - how long does it take if you call data.ToList() for example?


I'm curious to know: does your MethodThatReturnsManyMyObjects provide lazy evaluation (i.e., using the yield keyword)? If so, that could be your culprit, rather than the call to GroupBy:

// if MethodThatReturnsManyMyObjects uses yield, then
// it won't be executed until enumeration
IEnumerable<MyObject> data = MethodThatReturnsManyMyObjects();

// still not executed
var groupedByYear = data.GroupBy(x => x.Year); 

// finally executed here
foreach (var group in groupedByYear)
    // ...
0

上一篇:

下一篇:

精彩评论

暂无评论...
验证码 换一张
取 消

最新问答

问答排行榜