How would I implement a ranking algorithm in my website to sort database data?
I want to implement a ranking system on a website I've been working on and have decided to go with the Hacker News algorithm. My reasoning for choosing this algorithm is simply because it's been described here.
I was looking at this Python code (the language I'm using to build my site) and couldn't figure out how I would implement it.
def calculate_score(votes, item_hour_age, gravity=1.8):
return (votes - 1) / pow((item_hour_age+2), gravity)
Given the tables:
posts:
id | title | time_submitted
votes:
id | postid | userid | score
How would I pull the data from the database? The ideal solution (most efficient) would be to construct a MySQL query to retrieve the top 10 posts ranked using the algorithm. But given that Hacker News has it implemented in Arc, it makes me think they pull out all the posts then run them through the algorithm to rank them.
Reddit also comes to mind for this... They use a n开发者_如何学运维on-relational database schema so I would assume they, like Hacker News, perform the rankings in their code - not the database.
How would you implement this?
EDIT: one post can have many votes as I would like to log which user votes on which post.
You can use the data you need in the ORDER BY
clause.
SELECT p.id, p.title, p.time_submitted, SUM(v.score) as num_votes
FROM posts p, votes v
WHERE v.postid = p.id
GROUP BY p.id
ORDER BY
(SUM(v.score) - 1) / POW(TIMESTAMPDIFF(HOUR,p.time_submitted,NOW()) + INTERVAL 2 HOUR, 1.8) DESC
LIMIT 100
In your case, the number of votes would be returned by:
SELECT count(*) FROM votes WHERE postid=<THE POST'S ID>;
If you want to consider score, you could include that in the query but the formula you provided is not equipped to handle it.
The item hour age is simply the current time subtracted from the time submitted:
SELECT HOUR(TIMEDIFF(NOW(), time_submitted)) FROM posts WHERE id=<THE POST'S ID>;
This can also be done entirely in SQL:
SELECT id FROM posts ORDER BY (((SELECT count(*) FROM votes WHERE postid=posts.id) - 1) / MOD(HOUR(TIMEDIFF(NOW(), time_submitted) + INTERVAL 2 HOURS), <GRAVITY>)) LIMIT 10;
精彩评论