Single SQL query that returns list of "most commented" relative to past 7 days, but the list always has to contain something
What first seemed easy has now become a bit of a brain teaser.
I'm building one of those "Most Commented" or "Most Emailed" lists that you see in the right sidebar of news/blog websites.
The list has to be r开发者_StackOverflowelative to a recent date range (say the last 7 days) because you want the list to always to look fresh and contain newer content. Otherwise, the few articles that got tons of comments will always sit atop the list.
For instance, in my case, the website has a few articles that have several hundred comments each but all the rest have less than 20. So if the "Most Commented" list was pulled like this:
select
ArticleId,
count(CommentId) as Comments
from
Comment
group by
ArticleId
order by
count(CommentId) desc
limit 10;
Then the few articles with several hundred comments would always be at the top. The list would never change.
So I rewrote the select to include just the last 7 days:
select
ArticleId,
count(CommentId) as Comments
from
Comment
where
Created >= '2011-06-14'
group by
ArticleId
order by
count(CommentId) desc
limit 10;
That looks better, but it still won't work because it doesn't account for situations in which there is no activity in the last 7 days.
If nothing happened (or only a few comments happened) in the last 7 days, the list should contain activity from before that. Bottom line - the list always needs to show 10 articles.
I could always call multiple SQL statements, going back in 7 days increments until I fill the list, but I don't want to do that. I'm caching the results in in-process memory, but I still want to only make one SQL call if at all possible.
Thoughts? I feel like this is easily doable and I'm just overlooking the obvious somewhere.
I recommend computing a weight for each result and sort by that. For example, your weight could be the number of actions (comments, emails, etc) / age in days. That way, the older the article, the lower its weight unless it has an extremely high action rate.
Something like:
select
ArticleId,
count(CommentId) / (1.0 * DATEDIFF(CURRENT_DATE, Created)) AS weight
from
Comment
group by
ArticleId
order by
weight desc
limit 10;
You'll want to play with your weight formula until you get the right mix of newer content with slightly older but active content.
Here is another approach that answers your question as written (my previous answer suggests a different approach).
This one will return the top for this week and more from previous weeks if needed to fill the top 10. It works by computing a column that is the number of weeks old the article is and sorts by that and then the number of comments.
The advantage here is that will will always return 10 results (assuming you have more than 10 articles) and will return them sorted by age and comments with the most recent weeks articles at the top and the articles that are one week old next, etc.
select
ArticleId,
count(CommentId) as number_of_comments,
DATEDIFF(CURRENT_DATE, Created) DIV 7 AS weeks_old
from
Comment
group by
ArticleId,
weeks_old
order by
weeks_old asc,
number_of_comments desc
limit 10;
I assume you also want to limit to a number of records. Try something like the following: (NOTE: I use MSSQL, so some details/syntax might be incorrect for mysql...sorry!)
select top 10
ArticleId,
case when Created>='2011-06-14' then 1 else 0 end as [isNew],
count(CommentId) as Comments
from
Comment
group by
ArticleId,
case when Created>='2011-06-14' then 1 else 0 end
order by
isNew desc,
Comments desc
Assuming that you want to keep filling it in by 7 day increments:
SELECT
C.article_id,
COUNT(C.comment_id)
FROM
Comments C
INNER JOIN Numbers N ON
N.number >= 0 AND
N.number <=4 AND
C.created <= CURRENT_DATE - INTERVAL (N.number * 7) DAY AND
C.created >= CURRENT_DATE - INTERVAL ((N.number + 1) * 7) DAY
GROUP BY
C.article_id
ORDER BY
N.number
COUNT(C.comment_id)
LIMIT 10
I don't work with MySQL, so you may need to play with the syntax, especially with the date functions.
Also, this uses a Numbers table, which is just a table of numeric integers from 1 to whatever. Maybe MySQL has a built-in function for this, otherwise you would need to create or generate the table. It only goes back up to four weeks in this case, so you could even simply use a SELECT 1 UNION ALL SELECT 2...
Finally, since it only goes back four weeks, you could still end up with less than 10 articles. You can increase the number of weeks to something that seems reasonable in your business case.
I may be missing something but couldn't you Union what you already have?
I don't know the mySQL syntax so you will have to convert.
select top 10 *
from
(
select ArticleId, count(CommentId) as Comments
from Comment
where Created >= '2011-06-14'
group by ArticleId
union
select ArticleId, count(CommentId) as Comments
from Comment
group by ArticleId
)
order by comments desc
This will give you the recently commented articles sorted based on the total comments received by an article.
SELECT ArticleId, COUNT(CommentId) CommentsCount FROM Comment WHERE ArticleID IN
(
select DISTINCT ArticleId FROM Comment ORDER BY Created limit 10
) ORDER BY CommentsCount DESC
Hope that is closer to what you want. If you need the accurate query i.e. include only the last weeks comment in the CommentsCount field, please let me know :-)
精彩评论