Select items that are the top N results for a related table
Say I have a game where a question is asked, people post responses which are scored, and the top 10 responses win. I have a SQL database that stores all of this information, so I migh开发者_运维技巧t have tables such as Users, Questions, and Responses. The Responses table has foreign_keys user_id and question_id, and attribute total_score.
Obviously, for a particular Question I can retrieve the top 10 Responses with an order and limit:
SELECT * FROM Responses WHERE question_id=? ORDER BY total_score DESC LIMIT 10;
What I'm looking for is a way I can determine, for a particular User, a list of all their Responses that are winners (in the top 10 for their particular Question). It is simple programmatically to step through each Response and see if it is included in the top 10 for its Question, but I would like to optimize this so I am not doing N+1 queries where N is the number of Responses the User has submitted.
If you use Oracle, Microsoft SQL Server, DB2, or PostgreSQL, these databases support windowing functions. Join the user's responses to other responses to the same question. Then partition by question and order by score descending. Use the row number within each partition to restrict the set to those in the top 10. Also pass along the user_id of the given user so you can pick them out of the top 10, since you're only interested in the given user's responses.
SELECT *
FROM (
SELECT r1.user_id AS given_user, r2.*,
ROW_NUMBER() OVER (PARTITION BY r2.question_id ORDER BY r2.total_score DESC) AS rownum
FROM Responses r1 JOIN Responses r2 ON r1.question_id = r2.question_id
WHERE r1.user_id = ?
) t
WHERE rownum <= 10 AND user_id = given_user;
However, if you use MySQL or SQLite or other databases that don't support windowing functions, you can use this different solution:
Query for the user's responses, and use a join to match other responses to the respective questions with greater score (or earlier PK in the case of ties). Group by question, and count the number of responses that have higher score. If the count is fewer than 10, then the user's response is among the top 10 per question.
SELECT r1.*
FROM Responses r1
LEFT OUTER JOIN Responses r2 ON r1.question_id = r2.question_id
AND (r1.total_score < r2.total_score
OR r1.total_score = r2.total_score AND r1.response_id > r2.response_id)
WHERE r1.user_id = ?
GROUP BY r1.question_id
HAVING COUNT(*) < 10;
Try an embedded select statement. I don't have access to a DB tool today so I can't confirm the syntax/output. Just make the appropriate changes to capture all the columns you need. You can also add questions to the main query and join off of responses.
select *
from users
, responses
where users.user_id=responses.user_id
and responses.response_id in (SELECT z.response_id
FROM Responses z
WHERE z.user_id = users.user_id
ORDER BY total_score DESC
LIMIT 10)
Or you can really optimize it by adding another field like "IsTopPost". You would have to update the top posts when someone votes, but your query would be simple:
SELECT * FROM Responses WHERE user_id=? and IsTopPost = 1
I think something like this should do the trick:
SELECT
user_id, question_id, response_id
FROM
Responses AS r1
WHERE
user_id = ?
AND
response_id IN (SELECT response_id
FROM Responses AS r2
WHERE r2.question_id = r1.question_id
ORDER BY total_score DESC LIMIT 10)
Effectively, for each question_id, a subquery is performed which determines the top 10 responses for that question_id.
You may want to consider adding a column which marks certain Responses as 'winners'. That way, you can simply select those rows and save the database from having to calculate the top 10's over and over again.
精彩评论