How to optimize this linq query?
I have the following linq query:
var allnews = from a in db.News
where !(from c in db.NewsViews
where c.User开发者_Go百科Guid == thisUser.UserGuid
select c.NewsGuid).Contains(a.NewsGuid)
orderby a.Date descending
select a;
I'm wondering what is the best way to optimize it? Or would the query profiler do that for me?
Edit: The idea is to get all the news items that the user has not seen yet. So once a user has seen an item, I stored that item in NewsViews. And the news themselves are in News.
The sub-query seems not to use a
, so
//untested
var allnews = from a in db.News
let excluders = from c in db.NewsViews
where c.UserGuid == thisUser.UserGuid
select c.NewsGuid
where !excluders.Contains(a.NewsGuid)
orderby a.Date descending
select a;
But be advised, you are now doing SQL optimization through LINQ (btw, is this L2S or EF ?).
And normal SQL optimization is difficult enough. You'll have to measure and analyze with realistic data. It is quite possible that @Joachim's approach with multi inner-join subqueries is better.
instead of using contains, you could add to the where-statement of your inner query:
... and c.newsguid == a.newsguid
and .Any() of your inner query
var allnews = from a in db.News
where !(from c in db.NewsViews
where c.UserGuid == thisUser.UserGuid
and c.NewsGuid == a.NewsGuid).Any()
orderby a.Date descending
select a;
I'm presuming that the goal is to retrieve the NewsViews in descending date order:
db.News.OrderByDescending(a => a.Date).NewsViews;
This, of course, assumes that you have already set up an association in your model between the News and NewsViews entities. By setting up the association ahead of time, the subquery becomes unnecessary.
UPDATE:
I've been using LINQ-to-SQL for about 18 months, and I've been using the same construct as the one you illustrated for my NOT IN
queries. As I stated earlier, you may get a little performance bump if you set up your association in the model ahead of time and use indexes in the database itself, but from a LINQ point of view, I believe you're as optimized as you're going to get without resorting to an unnecessarily cryptic query statement.
Here's an alternative formulation:
from newsitem in db.News
join viewing in (
from viewing in db.NewsViews
where viewing.UserGuid == thisUser.UserGuid
select viewing
) on newsitem.NewsGuid equals viewing.NewsGuid into usersviewings
where !usersviewings.Any()
orderby newsitem.Date descending
select newsitem;
But as to whether this is any faster - well that's anyone's guess; try it. Fundamentally, you're doing a left join with the left part is filtered and must not return any results - that doesn't index well, AFAIK. The execution engine will need to scan all rows in the news set, and if you're backed by SQL, then table-scans are not your friend. Having said that, unless you actually expect this to be a huge table, it may not matter much, particularly if you only report the top N hits...
Maybe this is my lack to linq knowledge but perhaps a left join where a column in NewsViews is null? That seems like it would be better than making a subquery and comparing the two.
The best optimization move you could make here, would be to allow for a navigation from NewsViews to News... Since one doesn't exist, I had to get a little bit hacky with the optimization.
db.News.Join(db.News.Select(n => n.NewsGuid)
.Except(db.NewsViews
.Where(c => c.UserGuid == thisUser.UserGuid)
.Select(c => c.NewsGuid)
), n1 => n1.NewsGuid, n2 => n2, (n1, n2) => new { n1 = n1, n2 = n2 })
.Select(anon => anon.n1);
An Except is going to produce the best performing SQL when you are attempting to do a query where a list does not contain another list. Since no navigation from NewsView to News though, we have to cheat with an Inner Join to return News.
Another way that this could be done would be my pal the GroupJoin.
db.News
.GroupJoin(db.NewsViews, n => n.NewsGuid, nv => nv.NewsGuid, (n, nv) => new { News = n, NewsViewList = nv })
.Where(anon => anon.NewsViewList != null) // I don't remember the best test here, either it's not null, or the count > 0 :-)
.OrderByDescending(anon => anon.News.Date)
.Select(anon => anon.News);
This is how I would have done it at least.
精彩评论