Need help optimizing loop with Linq
Disclaimer: I have little experience with linq.
One of my tasks at my work is to maintain an e commerce web site. Yesterday, one of our customers started complaining of a timeout that would occur when they tried to create a feed file for goo开发者_StackOverflow中文版gle. Turns out, if the user has more than 9,000 items to put in their feed file, our code takes at least one minute to execute.
I couldn't find the source of the problem by running the debugger, so I fired up a profiler (ANTS) and let it do its thing. It found the source of our problem, a foreach loop that contains a bit of linq code. Here is the code:
var productMappings = GoogleProductMappingAccess.GetGoogleProductMappingsByID(context, productID);
List<google_Category> retCats = new List<google_Category>(numCategories);
int added = 0;
//this line was flagged by the profiler as taking 48.5% of total run time
foreach (google_ProductMapping pm in (from pm in productMappings orderby pm.MappingType descending select pm))
{
if (pm.GoogleCategoryId.HasValue && pm.GoogleCategoryId > 0)
{
//this line was flagged as 36% of the total time
retCats.Add(pm.google_Category);
}
else if (pm.GoogleCategoryMappingId.HasValue && pm.GoogleCategoryMappingId > 0)
{
retCats.Add(pm.google_CategoryMapping.google_Category);
}
else
{
continue;
}
if (++added >= numCategories)
{
break;
}
}
Do any of you more experienced devs have any ideas? I was toying with trying to replace all the linq with sql, but I am unsure if that is the best course of action here (if it was written with linq, there must be a reason for it).
If you can filter out the results you don't want anyway your query should be faster - you are using orderby
hence all these results use up processing in your query since they all have to be evaluated:
productMappings.Where( pm => (pm.GoogleCategoryMappingId.HasValue
&& pm.GoogleCategoryMappingId > 0)
||(pm.GoogleCategoryMappingId.HasValue &&
pm.GoogleCategoryMappingId > 0)
)
.OrderBy(...)
Also you should limit the number of results returned by the query since you only use up to numCategories
. So add a
.Take(numCategories)
to your query, instead of checking within the foreach loop.
The reason retCats.Add(pm.google_Category);
takes so long is because you are referecing a lazily loaded object which does another round trip to the server.
If you can refactor that so you only take a local copy of the Id instead of the whole object it will speed that part up.
If you do need to take the whole object, then investigate how you can pull it down in a single query when getting the productMappings. How to do this will depend on what LINQ wrapper you are using on your SQL.
Not knowing your database schema it's really hard to tell. A couple of ideas:
1) Run the query through the Database Engine Tuning Advisor. Maybe the query needs some indexes?
2) pre-processing this information and putting it in another table or file. That way when google requests it it won't timeout.
This should probably work:
var productMappings = GoogleProductMappingAccess.GetGoogleProductMappingsByID(context, productID);
var categories = from pm in productMappings
where pm.GoogleCategoryId > 0 ||
pm.GoogleCategoryMappingId > 0
orderby pm.MappingType descending
select pm.google_Category ??
pm.google_CategoryMapping.google_Category;
return categories.Take(numCategories);
It would work best if GetGoogleProductMappingsByID would return an IQueryable
(if applicable). If so, LINQ will convert the entire statement into a T-SQL command and that would be far faster than in memory LINQ.
Feel free to add a .ToList() to the last statement to get it into the same return type as in your code (and to force execution of the LINQ statement).
Checking for both .HasValue and > 0 is useless. Checking for Id > 0 is enough.
For more info: http://msdn.microsoft.com/en-us/library/2cf62fcy.aspx (operators)
精彩评论