Mixing different categories results, ordered by score in MySQL
In my PHP application, I have a mysql table of articles which has the following columns:
article_id articletext category_id score
Each article has a score which is calculated based on how popular it is, and belongs to a specific category (there are around 10 categories available)
My question is: how can I perform a query that returns the highest scored articles while alternating them by categories so that if possible, no same-category articles are returned consecutively. For example if the highest scored article has score: 100 开发者_开发问答the returning set would be something like:
article_id articletext category_id score
-----------------------------------------------------
142 <.....> 5 100
153 <.....> 3 97
119 <.....> 5 99
169 <.....> 2 93
121 <.....> 7 89
197 <.....> 2 92
.
.
.
The first (naive) solution that comes in mind is performing 10 selects (1 for each category), ordering them by score descending, and then, in PHP level, alternating between each returned dataset, picking one result at a time and combining them together in a new array.
Is there any more efficient way to achieve this? If possible at the MySQL level
Go get the top 20. If they don't satisfy the requirements, do an additional query to get the missing pieces. You should be able to come up with some balance between number of queries and number of rows each returns.
I you got the top 100 it might satisfy the requirements 90% of the time and would be cheaper and faster than 10 separate queries.
If it was SQL Server I could help more...
Actually, I have another idea. Run a process every 5 minutes that calculates the list and caches it in a table. Make DML against related tables invalidate the cache so it is not used until repopulated (perhaps an article was deleted). If the cache is invalid, you would fall back to calculating it on the fly... And could use that to repopulate the cache anyway.
It might be possible to strategically update the cached list rather than recalculate it. But that could be a real challenge.
This should help both with query speed and reducing load on your database. It shouldn't matter much if your article list is 5 minutes out of date. Heck, even 1 minute might work.
Just for learning purpose. I made a test with 3 categories. I have no idea how this query could run on a large recordset.
select * from (
(select @r:=@r+1 as rownum,article_id,articletext,category_id,score
from articles,(select @r:=0) as r
where category_id = 1
order by score desc limit 100000000)
union all
(select @r1:=@r1+1,article_id,articletext,category_id,score
from articles,(select @r1:=0) as r
where category_id = 2
order by score desc limit 100000000)
union all
(select @r2:=@r2+1,article_id,articletext,category_id,score
from articles,(select @r2:=0) as r
where category_id = 3
order by score desc limit 100000000)
) as t
order by rownum,score desc
Your naive solution is exactly what I would do.
精彩评论