Performance concern when using LINQ "everywhere"?
After upgrading to ReSharper 5 it gives me even more useful tips on code improvements. One I see everywhere now is a tip to replace foreach statements with LINQ queries. Take this example:
private Ninja FindNinjaById(int ninjaId)
{
foreach (var ninja in Ninjas)
{
if (ninja.Id == ninjaId)
return ninja;
}
return null;
}
ReSharper suggests replacing this with the following LINQ query:
private Ninja FindNinjaById(int ninjaId)
{
return Ninjas.FirstOrDefault(ninja => ninja.Id == ninjaId);
}
This looks all fine, and I'm sure it's no problem regarding performance to replace this one foreach. But is it something I should do in general? Or might I run into performance problems with all these LINQ queries everywhere?
You need to understand what the LINQ query does "under the hood" and compare that to running your own code before you can know whether you should change it. You don't need to know the exact code that will be generated, but you do need a basic idea of how it would go about performing the operation. In your example, I would surmise that LINQ works essentially the same way as your code, and because the LINQ statement is more compact and descriptive, I would prefer it. There are times when LINQ may not be the ideal choice, though probably not many. Generally, just about any looping construct is replaceable by an equivalent LINQ construct.
Let me start by saying that I love LINQ for its expressiveness and use it all the time without any problem.
There are however some differences in performance. Normally they are small enough to ignore, but in the critical path of your application, there might be times you want to optimize them away.
Here is the set of differences you should be aware of that could matter for performance:
- LINQ uses delegate calls extensively, and delegate invocations are (a very tiny bit) slower than method invocations, and of course slower than inlined code.
- A delegate is a method pointer inside an object. That object needs to be created.
- LINQ operators usually return a new object (an iterator) that allows looping through the collection. Chained LINQ operators thus create multiple new objects.
- When your inner loop uses variables from outside (closures), they have to be wrapped in objects as well (which need to be created).
- Many LINQ operators call the GetEnumerator method on a collection to iterate it. Calling GetEnumerator usually means the creation of yet another object.
- Iterating the collection is done through the IEnumerator interface. Interface calls are a bit slower than normal method calls.
- IEnumerator objects often need to be disposed, or at least, Dispose has to be called.
When performance is a concern, also try using for over foreach.
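As a sketch of that last point, the foreach from the question could be rewritten as an indexed for-loop (assuming Ninjas is a List&lt;Ninja&gt;; the helper class below is illustrative, not from the original post):

```csharp
using System.Collections.Generic;

// Illustrative only: iterating a List<T> by index sidesteps the
// enumerator entirely - no GetEnumerator call, no interface calls,
// and no Dispose. 'Ninja' mirrors the type from the question.
class Ninja { public int Id; }

static class NinjaFinder
{
    public static Ninja FindNinjaById(List<Ninja> ninjas, int ninjaId)
    {
        for (int i = 0; i < ninjas.Count; i++)   // no IEnumerator involved
        {
            if (ninjas[i].Id == ninjaId)
                return ninjas[i];
        }
        return null;                             // same contract as the original
    }
}
```

Whether this is worth the loss in readability is exactly the kind of thing only a profiler can tell you.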
Again, I love LINQ, and I can't remember ever deciding not to use a LINQ (to Objects) query because of performance. So don't do any premature optimizations. Start with the most readable solution first, then optimize when needed. So profile, profile, and profile.
One thing we identified as performance-problematic is creating lots of lambdas while iterating over small collections. What happens in the converted sample?
Ninjas.FirstOrDefault(ninja => ninja.Id == ninjaId)
First, a new instance of the (compiler-generated) closure type is created: a new object on the managed heap, some work for the GC. Second, a new delegate instance is created from a method on that closure. Then FirstOrDefault is called. What does it do? It iterates the collection (same as your original code) and calls the delegate.
So basically, you have 4 things added here: 1. Create closure 2. Create delegate 3. Call through delegate 4. Collect closure and delegate
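Conceptually, the compiler lowers the lambda into something like the following (a sketch of C# closure lowering; the real generated class has a compiler-mangled name):

```csharp
using System;

class Ninja { public int Id; }

// Sketch of the compiler-generated closure class for
// ninja => ninja.Id == ninjaId (actual generated names differ).
sealed class DisplayClass
{
    public int ninjaId;                 // the captured local
    public bool Match(Ninja ninja)      // the lambda body
        => ninja.Id == ninjaId;
}

// The call site then becomes roughly:
//   var closure = new DisplayClass { ninjaId = ninjaId };  // 1. create closure
//   Func<Ninja, bool> predicate = closure.Match;           // 2. create delegate
//   return Ninjas.FirstOrDefault(predicate);               // 3. call through delegate
//   // 4. closure and delegate become garbage afterwards
```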
If you call FindNinjaById many times, this can add up to a noticeable performance hit. Of course, measure it.
If you replace it with (equivalent)
Ninjas.Where(ninja => ninja.Id == ninjaId).FirstOrDefault()
it adds a fifth: creating the state machine for the iterator (Where is a yielding function).
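A simplified hand-written stand-in for Where shows where that state machine comes from (illustrative only, not the real Enumerable.Where implementation):

```csharp
using System.Collections.Generic;

class Ninja { public int Id; }

static class Sketch
{
    // A yield-based filter: the compiler turns this method into a
    // state-machine class, which is the extra allocation referred to above.
    public static IEnumerable<Ninja> WhereIdEquals(IEnumerable<Ninja> source, int id)
    {
        foreach (var ninja in source)
        {
            if (ninja.Id == id)
                yield return ninja;   // each MoveNext() resumes here
        }
    }
}
```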
Profile
The only way to know for sure is to profile. Yes, certain queries can be slower. But when you look at what ReSharper has replaced here, it's essentially the same thing, done in a different manner. The ninjas are looped, each Id is checked. If anything, you could argue this refactoring comes down to readability. Which of the two do you find easier to read?
Larger data sets will have a bigger impact sure, but as I've said, profile. It's the only way to be sure if such enhancements have a negative effect.
We've built massive apps, with LINQ sprinkled liberally throughout. It's never, ever slowed us down.
It's perfectly possible to write LINQ queries that will be very slow, but it's easier to fix simple LINQ statements than enormous for/if/for/return algorithms.
Take resharper's advice :)
An anecdote: when I was just getting to know C# 3.0 and LINQ, I was still in my "when you have a hammer, everything looks like a nail" phase. As a school assignment, I was supposed to write a connect-four (four-in-a-row) game as an exercise in adversarial search algorithms. I used LINQ throughout the program. In one particular case, I needed to find the row a game piece would land on if I dropped it in a particular column. Perfect use case for a LINQ query! This turned out to be really slow. However, LINQ wasn't the problem; the problem was that I was searching in the first place. I optimized this by just keeping a look-up table: an integer array containing the row number for every column of the game board, updated whenever a game piece was inserted. Needless to say, this was much, much faster.
Lesson learned: optimize your algorithm first, and high level constructs like LINQ might actually make that easier.
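The look-up-table idea from the anecdote can be sketched like this (names and board size are illustrative):

```csharp
// Sketch of the look-up table: nextFreeRow[c] holds the row a piece
// dropped into column c will land on, kept in sync on every insert.
class Board
{
    private readonly int[] nextFreeRow = new int[7];   // 7 columns, all empty

    public int Drop(int column)
    {
        int row = nextFreeRow[column];   // O(1), no searching the column
        nextFreeRow[column]++;           // update the table on insert
        return row;
    }
}
```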
That said, there is a definite cost to creating all those delegates. On the other hand, there can also be a performance benefit from LINQ's lazy nature: if you manually loop over a collection, you're often forced to create intermediate List&lt;T&gt;s, whereas with LINQ you basically stream the results.
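As an illustration of that lazy streaming (Clan and Name are hypothetical properties, not from the question):

```csharp
using System.Collections.Generic;
using System.Linq;

class Ninja { public int Id; public string Name; public string Clan; }

static class StreamingDemo
{
    public static List<string> FirstIgaNames(IEnumerable<Ninja> ninjas)
    {
        // Each element flows through Where and Select one at a time;
        // no intermediate list is built between the operators, and
        // Take(10) stops pulling from the source after ten matches.
        return ninjas
            .Where(n => n.Clan == "Iga")
            .Select(n => n.Name)
            .Take(10)
            .ToList();
    }
}
```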
As long as you use your LINQ queries correctly, you will not suffer from performance issues; given the care that went into implementing LINQ, a correctly used query may well be faster than what you would write by hand.
The only reasons to roll your own loops are if you want full control, LINQ does not offer what you need, or you want easier debugging.
The cool thing about LINQ queries is that it makes it dead simple to convert to a parallel query. Depending on what you're doing, it may or may not be faster (as always, profile), but it's pretty neat, nonetheless.
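Converting a LINQ-to-Objects query to PLINQ is usually just an AsParallel() call. In the sketch below, ExpensiveCheck is a placeholder for whatever per-element work you have; and, as the answer says, profile:

```csharp
using System.Collections.Generic;
using System.Linq;

class Ninja { public int Id; }

static class ParallelDemo
{
    // Placeholder predicate standing in for expensive per-element work.
    static bool ExpensiveCheck(Ninja n) => n.Id % 2 == 0;

    public static List<Ninja> FindMatches(IEnumerable<Ninja> ninjas)
    {
        return ninjas
            .AsParallel()                  // partitions the work across cores
            .Where(ExpensiveCheck)
            .ToList();                     // note: result order is not guaranteed
    }
}
```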
To add my own experience of using LINQ where performance really does matter - with Monotouch - the difference there is still insignificant.
You're 'handicapped' on the 3GS iPhone to around 46 MB of RAM and a 620 MHz ARM processor. Admittedly the code is AOT-compiled, but even on the simulator, where it is JIT-compiled and going through a long series of indirection, the difference is tenths of a millisecond for sets of thousands of objects.
Along with Windows Mobile, this is where you have to worry about performance costs - not in huge ASP.NET applications running on quad-core servers with 8 GB of RAM, or on desktops with dual cores. One exception would be large object sets, although arguably you would lazy-load anyway, and the initial query task would be performed on the database server.
It's a bit of a cliché on Stack Overflow, but use the shorter, more readable code until hundreds of milliseconds really do matter.