Does the order of LINQ functions matter?
Basically, as the question states... does the order of LINQ functions matter in terms of performance? Obviously the results would have to be identical still...
Example:
myCollection.OrderBy(item => item.CreatedDate).Where(item => item.Code > 3);
myCollection.Where(item => item.Code > 3).OrderBy(item => item.CreatedDate);
Both return me the same results, but are in a different LINQ order. I realize that reordering some i开发者_开发问答tems will result in different results, and I'm not concerned about those. What my main concern is in knowing if, in getting the same results, ordering can impact performance. And, not just on the 2 LINQ calls I made (OrderBy, Where), but on any LINQ calls.
It will depend on the LINQ provider in use. For LINQ to Objects, that could certainly make a huge difference. Assume we've actually got:
var query = myCollection.OrderBy(item => item.CreatedDate)
.Where(item => item.Code > 3);
var result = query.Last();
That requires the whole collection to be sorted and then filtered. If we had a million items, only one of which had a code greater than 3, we'd be wasting a lot of time ordering results which would be thrown away.
Compare that with the reversed operation, filtering first:
var query = myCollection.Where(item => item.Code > 3)
.OrderBy(item => item.CreatedDate);
var result = query.Last();
This time we're only ordering the filtered results, which in the sample case of "just a single item matching the filter" will be a lot more efficient - both in time and space.
It also could make a difference in whether the query executes correctly or not. Consider:
var query = myCollection.Where(item => item.Code != 0)
.OrderBy(item => 10 / item.Code);
var result = query.Last();
That's fine - we know we'll never be dividing by 0. But if we perform the ordering before the filtering, the query will throw an exception.
Yes.
But exactly what that performance difference is depends on how the underlying expression tree is evaluated by the LINQ provider.
For instance, your query may well execute faster the second time (with the WHERE clause first) for LINQ-to-XML, but faster the first time for LINQ-to-SQL.
To find out precisely what the performance difference is, you'll most likely want to profile your application. As ever with such things, though, premature optimisation is not usually worth the effort -- you may well find issues other than LINQ performance are more important.
In your particular example it can make a difference to the performance.
First query: Your OrderBy
call needs to iterate through the entire source sequence, including those items where Code
is 3 or less. The Where
clause then also needs to iterate the entire ordered sequence.
Second query: The Where
call limits the sequence to only those items where Code
is greater than 3. The OrderBy
call then only needs to traverse the reduced sequence returned by the Where
call.
In Linq-To-Objects:
Sorting is rather slow and uses O(n)
memory. Where
on the other hand is relatively fast and uses constant memory. So doing Where
first will be faster, and for large collections significantly faster.
The reduced memory pressure can be significant too, since allocations on the large object heap(together with their collection) are relatively expensive in my experience.
Obviously the results would have to be identical still...
Note that this is not actually true - in particular, the following two lines will give different results (for most providers/datasets):
myCollection.OrderBy(o => o).Distinct();
myCollection.Distinct().OrderBy(o => o);
It's worth noting that you should be careful when considering how to optimize a LINQ query. For example, if you use the declarative version of LINQ to do the following:
public class Record
{
public string Name { get; set; }
public double Score1 { get; set; }
public double Score2 { get; set; }
}
var query = from record in Records
order by ((record.Score1 + record.Score2) / 2) descending
select new
{
Name = record.Name,
Average = ((record.Score1 + record.Score2) / 2)
};
If, for whatever reason, you decided to "optimize" the query by storing the average into a variable first, you wouldn't get the desired results:
// The following two queries actually takes up more space and are slower
var query = from record in Records
let average = ((record.Score1 + record.Score2) / 2)
order by average descending
select new
{
Name = record.Name,
Average = average
};
var query = from record in Records
let average = ((record.Score1 + record.Score2) / 2)
select new
{
Name = record.Name,
Average = average
}
order by average descending;
I know not many people use declarative LINQ for objects, but it is some good food for thought.
It depends on the relevancy. Suppose if you have very few items with Code=3, then the next order will work on small set of collection to get the order by date.
Whereas if you have many items with the same CreatedDate, then the next order will work on larger set of collection to get the order by date.
So, in both case there will be a difference in performance
精彩评论