T-SQL - Help with MAX Operation Over Many-to-Many
This is closely related to a previous question i asked.
I have a many-to-many relationship between Post and Location.
The join table is called PostLocations, and has nothing but the FK's. (LocationId, PostId)
I'm trying to pull back the top post for each location.
This is the query i have (which was given in the answer to my previous question):
SELECT pl.LocationId, p.postid, p.UniqueUri, p.Content, MAX(s.BaseScore) as topscore
from dbo.PostLocations pl
inner join dbo.posts p on pl.PostId = p.PostId
inner join dbo.reviews r on p.postid = r.postid
inner join dbo.scores s on r.scoreid = s.scoreid
group by pl.locationid, p.postid, p.UniqueUri, p.Content
But the problem is, because PostLocations could have entries like this:
LocationId PostId
1 213213
2 498324
1 230943
So my above query is returning LocationId 1 twice, because it has two records in the join table. I only want 1 record per location - the top post per locationid.
I've also tried this:
SELECT l.LocationId, p.postid, p.UniqueUri, p.Content, MAX(s.BaseScore) as topscore
from dbo.PostLocations 开发者_JAVA技巧pl
inner join dbo.Locations l on pl.LocationId = l.LocationId
inner join dbo.posts p on pl.PostId = p.PostId
inner join dbo.reviews r on p.postid = r.postid
inner join dbo.scores s on r.scoreid = s.scoreid
group by l.locationid, p.postid, p.UniqueUri, p.Content
Same result - this is what comes back:
LocationId PostId UniqueUri Content TopScore
1 213213 some-post pew pew 2.00
2 498324 anot-post blah bl 4.50
1 230943 sadjsa-as asijd a 3.5
This is what should come back:
LocationId PostId UniqueUri Content TopScore
1 230943 sadjsa-as asijd a 3.5
2 498324 anot-post blah bl 4.50
Because LocationId 1 has 2 posts, but PostId 230943 has the highest score so that is the one returned.
Any ideas on what i'm missing?
If you are using SQL Server 2005 or later, you can do something like:
With RankedLocations As
(
Select PL.LocationId
, S.BaseScore
, P.PostID
, P.UniqueUri
, P.Content
, Row_Number() Over( Partition By PL.LocationId Order By S.BaseScore Desc ) As ScoreRank
From dbo.PostLocations As PL
Join dbo.Posts As P
On P.PostId = PL.PostId
Join dbo.Reviews As R
On R.PostId = P.PostId
Join dbo.Scores As S
On S.ScoreId = R.ScoreId
)
Select LocationId, BaseScore, PostID, UniqueUri, Content
From RankedLocations
Where ScoreRank = 1
As you're grouping on postID which is unique every single post falls into its own group of one.
I'm not sure if there is a better way to do this however what I've done in the past runs along these lines
Select l.LocationId, p.postid, p.UniqueUri, p.Content, s.basescore as topscore
from
dbo.Locations l inner join
(select
pl.locationid, max(s.BaseScore) as topscore
from
dbo.postlocations pl
inner join dbo.posts p on pl.PostId = p.PostId
inner join dbo.reviews r on p.postid = r.postid
inner join dbo.scores s on r.scoreid = s.scoreid
group by
pl.locationid) as topPost on l.locationid = topPost.locationid
inner join dbo.postlocations pl on pl.locationid = l.locationid
inner join dbo.posts p on pl.PostId = p.PostId
inner join dbo.reviews r on p.postid = r.postid
inner join dbo.scores s on r.scoreid = s.scoreid and s.basescore = toppost.topscore
We make a subquery to find the top score for a given location then do our joins as before and on the last join ensure that the basescore is the topscore we found earlier.
This means that if we have two equal top scores for a given location we will return both rows, however in all other cases we will return a single row per location, it can be modified to pick an arbitrary post given two equal top scores but I haven't done so.
I'm interested to see if there are any other solutions to this problem as with the number of extra joins this is quite a computationally expensive solution to the problem.
Edit - in response to your comment as postid is the primary key we can rely on it to be greatest for the latest post.
Select l.LocationId, p.postid, p.UniqueUri, p.Content, bar.basescore as topscore
from
dbo.Locations l inner join
(select
l.LocationId, max(p.postid) as postid ,max(s.basescore) as basescore
from
(select
pl.locationid, max(s.BaseScore) as topscore
from
dbo.postlocations pl
inner join dbo.posts p on pl.PostId = p.PostId
inner join dbo.reviews r on p.postid = r.postid
inner join dbo.scores s on r.scoreid = s.scoreid
group by
pl.locationid) as topPost on l.locationid = topPost.locationid
inner join dbo.postlocations pl on pl.locationid = l.locationid
inner join dbo.posts p on pl.PostId = p.PostId
inner join dbo.reviews r on p.postid = r.postid
inner join dbo.scores s on r.scoreid = s.scoreid and s.basescore = toppost.topscore
group by l.locationid) as bar on l.locationid = bar.locationid
inner join posts p on bar.postid = p.postid
精彩评论