How to optimize this LINQ query for Visual Studio?
I have this one gigantic complex LINQ to SQL query that I need to optimize somehow, because the background C# compiler completely hogs the CPU and I can't type or edit my .cs
file normally in Visual Studio 2010 (every letter, especially if IntelliSense wants to pop up, lags horribly).
The culprit is this:
var custFVC =
(from cfvc in customer.CustomerFrameVariationCategories
let lastValue = cfvc.CustomerFrameVariationCategoryValueChanges.Where(cfvcvc => cfvcvc.ChangeDateTime <= this.SelectedDateTime).OrderByDescending(cfvcvc2 => cfvcvc2.ChangeDateTime).FirstOrDefault() ?? new CustomerFrameVariationCategoryValueChange()
let lastValue2 = cfvc.FrameVariationCategory.FrameVariation.Frame.FrameValueChanges.Where(fvc => fvc.ChangeDateTime <= this.SelectedDateTime).OrderByDescending(fvc2 => fvc2.ChangeDateTime).FirstOrDefault() ?? new FrameValueChange()
let lastValue3 = cfvc.FrameVariationCategory.FrameVariationCategoryValueChanges.Where(fvcvc => fvcvc.ChangeDateTime <= this.SelectedDateTime).OrderByDescending(fvcvc2 => fvcvc2.ChangeDateTime).FirstOrDefault() ?? new FrameVariationCategoryValueChange()
let lastValue4 = cfvc.FrameVariationCategory.FrameVariation.FrameVariationModules.Any(fvm => (fvm.FrameModule.FrameModuleValueChanges.Where(fmvc => fmvc.ChangeDateTime <= this.SelectedDateTime).OrderByDescending(fmvc2 => fmvc2.ChangeDateTime).FirstOrDefault() ?? new FrameModuleValueChange()).IsActive == false)
where lastValue.IsActive == true
orderby cfvc.FrameVariationCategory.FrameVariation.Frame.Name, cfvc.FrameVariationCategory.Category.Name, cfvc.FrameVariationCategory.FrameVariation.Name
select new
{
cfvc.Id,
cfvc.FrameVariationCategory,
lastValue.CoverCoefficient,
lastValue.NeiserNet,
PlywoodName = lastValue2.Plywood.Name,
FrameIsActive = lastValue2.IsActive,
OwnCost = cfvc.FrameVariationCategory.FrameVariation.FrameVariationModules.Sum(fvm => // sum all frame variation modules
(lastValue4 ? 0 : fvm.FrameModule.FrameModuleValueChanges.Where(fmvc => fmvc.ChangeDateTime <= this.SelectedDateTime) // if module not active then 0
.OrderByDescending(fmvc2 => fmvc2.ChangeDateTime).FirstOrDefault().Porolone) + // otherwise get Porolone
fvm.FrameModule.FrameModuleComponents.Sum(fmc => // add to Porolone sum of all module components
(fmc.Article.ArticleDetails.Any() ? fmc.Article.ArticleDetails.Sum(ad => // if any article details then use A*L*W*T instead of Amount
WindowExcel.MultiplyArticleDetailValues(ad.ArticleDetailValueChanges.Where(advc => advc.ChangeDateTime <= this.SelectedDateTime)
.OrderByDescending(advc2 => advc2.ChangeDateTime).FirstOrDefault() ?? new ArticleDetailValueChange())) :
WindowExcel.GetModuleComponentAmount(fmc.FrameModuleComponentValueChanges.Where(fmcvc => fmcvc.ChangeDateTime <= this.SelectedDateTim开发者_开发百科e) // no details = get amount
.OrderByDescending(fmcvc2 => fmcvc2.ChangeDateTime).FirstOrDefault() ?? new FrameModuleComponentValueChange())) * // times article values
WindowExcel.MultiplyArticleValues(fmc.Article.ArticleValueChanges.Where(avc => avc.ChangeDateTime <= this.SelectedDateTime)
.OrderByDescending(avc2 => avc2.ChangeDateTime).FirstOrDefault() ?? new ArticleValueChange()))),
Cubes = cfvc.FrameVariationCategory.FrameVariation.FrameVariationModules.Sum(fvm => (fvm.FrameModule.FrameModuleValueChanges.Where(fmvc => fmvc.ChangeDateTime <= this.SelectedDateTime && fmvc.IsActive == true).OrderByDescending(fmvc2 => fmvc2.ChangeDateTime).FirstOrDefault() ?? new FrameModuleValueChange()).Cubes),
lastValue3.CoverNet,
lastValue3.CoverGarbage,
lastValue3.CoverGross,
lastValue3.CoverPrice,
lastValue3.BackgroundNet,
lastValue3.BackgroundGarbage,
lastValue3.BackgroundGross,
lastValue3.BackgroundPrice,
FVCIsActive = lastValue3.IsActive,
FrameModuleAnyNonActive = lastValue4
}).ToList();
The biggest problem here is OwnCost
, everything up to and after that Visual Studio can handle. I don't want to turn off background compiling (a feature that checks for compile time errors before actually compiling), I don't want to create a stored procedure. I can't off this code into a separate class/method, because the LINQ DataContext can't be passed around (as far as I know - also take into consideration that the context variable is inside a using
statement).
The only vague idea that I have, is some sort of an extension method, or a method that returns a LINQ query or something like that. Because I don't know what exactly it is that I can do here to rectify the problem, I don't know how to formulate the wording, thus I can't google it...
How can I move (or optimize) OwnCost
or the entire query out of the current .cs
file, or perhaps split it into a method within the same file (might help the background compiler), or "something"...?
My first instinct as that you're trying to make LINQ to SQL do the work of a stored procedure. But that may be incorrect; it's pretty difficult to tell if it would even be possible for a stored procedure to do this.
My second instinct is that it should be possible to split the OwnCost
calculation into a function, so that this query just contains
OwnCost = cfvc.Select(CalculateOwnCost)
My third instinct, on seeing that the calculation includes a WindowExcel
object, is to flee screaming, but I'm going to take a couple of deep breaths and ask, are you in fact interoperating with Excel in the context of this query, and might that possibly be a source of problems?
Edit
To break out the OwnCost
calculation into its own function, do something like this:
public decimal CalculateOwnCost(CustomerFrameVariationCategory cvfc)
{
return cfvc.FrameVariationCategory.FrameVariation.FrameVariationModules.Sum(fvm => // sum all frame variation modules
(lastValue4 ? 0 : fvm.FrameModule.FrameModuleValueChanges.Where(fmvc => fmvc.ChangeDateTime <= this.SelectedDateTime) // if module not active then 0
.OrderByDescending(fmvc2 => fmvc2.ChangeDateTime).FirstOrDefault().Porolone) + // otherwise get Porolone
fvm.FrameModule.FrameModuleComponents.Sum(fmc => // add to Porolone sum of all module components
(fmc.Article.ArticleDetails.Any() ? fmc.Article.ArticleDetails.Sum(ad => // if any article details then use A*L*W*T instead of Amount
WindowExcel.MultiplyArticleDetailValues(ad.ArticleDetailValueChanges.Where(advc => advc.ChangeDateTime <= this.SelectedDateTime)
.OrderByDescending(advc2 => advc2.ChangeDateTime).FirstOrDefault() ?? new ArticleDetailValueChange())) :
WindowExcel.GetModuleComponentAmount(fmc.FrameModuleComponentValueChanges.Where(fmcvc => fmcvc.ChangeDateTime <= this.SelectedDateTime) // no details = get amount
.OrderByDescending(fmcvc2 => fmcvc2.ChangeDateTime).FirstOrDefault() ?? new FrameModuleComponentValueChange())) * // times article values
WindowExcel.MultiplyArticleValues(fmc.Article.ArticleValueChanges.Where(avc => avc.ChangeDateTime <= this.SelectedDateTime)
.OrderByDescending(avc2 => avc2.ChangeDateTime).FirstOrDefault() ?? new ArticleValueChange()))),
Cubes = cfvc.FrameVariationCategory.FrameVariation.FrameVariationModules.Sum(fvm => (fvm.FrameModule.FrameModuleValueChanges.Where(fmvc => fmvc.ChangeDateTime <= this.SelectedDateTime && fmvc.IsActive == true).OrderByDescending(fmvc2 => fmvc2.ChangeDateTime).FirstOrDefault() ?? new FrameModuleValueChange()).Cubes)
}
That assumes that CustomerFrameVariationCategories
is a colleciton of CustomerFrameVariationCategory
objects, and that OwnCost
is a decimal
.
Once you do this, your original query can just include the Select
that I showed above - you can also write it as
OwnCost = cfvc.Select(x => CalculateOwnCost(x))
if it makes you more comfortable (me, I've gotten scolded by Resharper enough on this point that I've come to accept it, but it's a matter of taste).
There's no reason you can't further decompose some of the intermediate expressions in that query into their own functions. A lambda function is just a function, after all.
I don't have any inherent insight into this problem in terms of what is expensive in the C# compiler. However the two things that jump out when I look at your query are the following
- The number and complexity of the
let
bindings - The complexity of the initializer of the
OwnCost
member inside the select clause
The best advice I can give is to try and break up the query in order to get these into separate statements and hopefully that will ease the pressure on the compiler.
Split it up. That is one massive expression tree VS is trying to deal with. You could break it up so that some of the SELECT clause transformation happens in LINQ-to-object. This would be a lot easier for the background compiler to deal with. Just get:
var custFVC = (from cfvc in customer.CustomerFrameVariationCategories
let lastValue = cfvc.CustomerFrameVariationCategoryValueChanges.Where(cfvcvc => cfvcvc.ChangeDateTime <= this.SelectedDateTime).OrderByDescending(cfvcvc2 => cfvcvc2.ChangeDateTime).FirstOrDefault() ?? new CustomerFrameVariationCategoryValueChange()
let lastValue2 = cfvc.FrameVariationCategory.FrameVariation.Frame.FrameValueChanges.Where(fvc => fvc.ChangeDateTime <= this.SelectedDateTime).OrderByDescending(fvc2 => fvc2.ChangeDateTime).FirstOrDefault() ?? new FrameValueChange()
let lastValue3 = cfvc.FrameVariationCategory.FrameVariationCategoryValueChanges.Where(fvcvc => fvcvc.ChangeDateTime <= this.SelectedDateTime).OrderByDescending(fvcvc2 => fvcvc2.ChangeDateTime).FirstOrDefault() ?? new FrameVariationCategoryValueChange()
let lastValue4 = cfvc.FrameVariationCategory.FrameVariation.FrameVariationModules.Any(fvm => (fvm.FrameModule.FrameModuleValueChanges.Where(fmvc => fmvc.ChangeDateTime <= this.SelectedDateTime).OrderByDescending(fmvc2 => fmvc2.ChangeDateTime).FirstOrDefault() ?? new FrameModuleValueChange()).IsActive == false)
where lastValue.IsActive == true
orderby cfvc.FrameVariationCategory.FrameVariation.Frame.Name, cfvc.FrameVariationCategory.Category.Name, cfvc.FrameVariationCategory.FrameVariation.Name
select new
{ cfvc, lastValue, lastValue1, lastValue2, lastValue3}).ToList();
And then do the rest of your manipulation from there. If the result set is small, this might be more efficient anyway, and certainly easier on your db. If the result set is small, doing this will have very little performance cost.
If you have a bored, underworked db and there's a large result set and the machine where this code is running is strained, then you might need to keep the workload on the db.
Keep in mind that just breaking up into several steps the building of one massive expression tree to be run against IQuerable will not do you any good. That last variable will be as complicated (under the hood) as yours is now, and the compiler will still choke. The bottom line is you need to run .ToList() earlier in the life of this manipulation. A series of LINQ-to-object queries against IEnumerable won't be difficult for the background compiler to handle.
VS is probably choking on this because it's such a large, complex single statement.
Since the only linkages between the OwnCost and the LINQ context are the reference to cfvc and lastValue4, it seems to me that you could calculate OwnCost in a separate step after the initial LINQ query statement. Store the lastValue4 in the anonymous type constructed by the LINQ statement, remove OwnCost, and remove .ToList() from the end. You don't need to store the cfvc value since the only thing you're using it for is to access .FrameVariationCategory, which you've already captured in the 2nd field of the anonymous type.
In a separate statement, select from the CustFVC resultset to construct the OwnCost for each item to produce a new result set that contains all the data bits you're looking for. Call .ToList() on that second result set. This should produce equivalent results to the monster statement in similar time.
If this is a large result set, be careful about iterating over the data multiple times. If you use a foreach to calculate OwnCost for each item in the original result set, you will be running through the data twice - twice as much work as the single monster LINQ query.
If you use a LINQ query for the second operation, it shouldn't cause any additional passes over the data beyond what you already have - LINQ is lazy evaluated, so the next row is not actually retrieved until asked for. ToList() forces all rows to be retrieved. A foreach loop forces all rows to be retrieved. LINQ query using a LINQ query as an input does not iterate any rows of the input result set, it just piles on more conditions to be evaluated when somebody eventually asks for the next row of the 2nd result set.
I may be wrong here and it's just a guess, but have you tried splitting your class (and essentially your file) with the partial keyword?
Hah, I found a solution myself :)
Roberts instinct about a LINQ function got me googling. The results were not relevant to the matter at hand, but the little code I did stumble upon, got me thinking about a brute force attack method. Using redoced's idea of a partial class I finally wrote this piece of code in a seperate .cs file:
public partial class WindowExcel
{
private static decimal GetOwnCost(CustomerFrameVariationCategory cfvc, bool frameModuleAnyNonActive, DateTime selectedDateTime)
{
return cfvc.FrameVariationCategory.FrameVariation.FrameVariationModules.Sum(fvm => // sum all frame variation modules
(frameModuleAnyNonActive ? 0 : fvm.FrameModule.FrameModuleValueChanges.Where(fmvc => fmvc.ChangeDateTime <= selectedDateTime) // if module not active then 0
.OrderByDescending(fmvc2 => fmvc2.ChangeDateTime).FirstOrDefault().Porolone) + // otherwise get Porolone
fvm.FrameModule.FrameModuleComponents.Sum(fmc => // add to Porolone sum of all module components
(fmc.Article.ArticleDetails.Any() ? fmc.Article.ArticleDetails.Sum(ad => // if any article details then use A*L*W*T instead of Amount
WindowExcel.MultiplyArticleDetailValues(ad.ArticleDetailValueChanges.Where(advc => advc.ChangeDateTime <= selectedDateTime)
.OrderByDescending(advc2 => advc2.ChangeDateTime).FirstOrDefault() ?? new ArticleDetailValueChange())) :
WindowExcel.GetModuleComponentAmount(fmc.FrameModuleComponentValueChanges.Where(fmcvc => fmcvc.ChangeDateTime <= selectedDateTime) // no details = get amount
.OrderByDescending(fmcvc2 => fmcvc2.ChangeDateTime).FirstOrDefault() ?? new FrameModuleComponentValueChange())) * // times article values
WindowExcel.MultiplyArticleValues(fmc.Article.ArticleValueChanges.Where(avc => avc.ChangeDateTime <= selectedDateTime)
.OrderByDescending(avc2 => avc2.ChangeDateTime).FirstOrDefault() ?? new ArticleValueChange())));
}
}
And in my gigantic LINQ query I rewrote OwnCost as such:
OwnCost = WindowExcel.GetOwnCost(cfvc, lastValue4, this.SelectedDateTime)
Editing the GetOwnCost method is still painfully slow, as was excepted, but at least the rest of my project is now usable. I'm not sure what this brute force seperation does to performance. The fact that I can't ref the CustomerFrameVariationCategory and that the OwnCost expression tree is inside of a method not in a LINQ query itself, raises questions. Guess I'll have to profile it at some point, but that's a hole other issue.
Now to the delicate issue of what to mark as the answer. Though I do appreciate all the input, none of the answers so far were correct (no concrete solution), thus I'll have to mark my own post as the answer. But I will vote for redoced's and Robert's answers for pointing me in the right direction.
I would appreciate, if anyone can comment about possible code execution performance impacts for my solution vs the original code.
PS! Writing this in Internet Explorer 8 is again painfully slow because of the constant CPU hogging (has something to do with coloring the code). So it's not only a VS issue....
Edit:
It seems Robert has managed to post the exact same solution I came up with. Would have probably got my answer posted earlier if not for the constant CPU hogging...
In all fairness I marked Robert's post as the answer :)
Instead of writing LINQ to SQL you can write a stored procedure for this to do all these things.
精彩评论