Making more recent items more likely to be drawn

2023-01-01 18:15 问答作者：

There are a few hundred of book records in the datab开发者_JS百科ase and each record has a publish time. In the homepage of the website, I am required to write some codes to randomly pick 10 books and put them there. The requirement is that newer books need to have higher chances of getting displayed.

Since the time is an integer, I am thinking like this to calculate the probability for each book:

Probability of a book to be drawn = (current time - publish time of the book) / ((current time - publish time of the book1) +  (current time - publish time of the book1) + ... (current time - publish time of the bookn))

After a book is drawn, the next round of the loop will minus the (current time - publish time of the book) from the denominator and recalculate the probability for each of the remaining books, the loop continues until 10 books have been drawn.

Is this algorithm a correct one?

By the way, the website is written in PHP.

Feel free to suggest some PHP codes if you have a better algorithm in your mind.

Many thanks to you all.

Here's a very similar question that may help: Random weighted choice The solution is in C# but the code is very readable and close to PHP syntax so it should be easy to adapt.

For example, here's how one could do this in MySQL:

First calculate the total age of all books and store it in a MySQL user variable:

SELECT SUM(TO_DAYS(CURDATE())-TO_DAYS(publish_date)) FROM books INTO @total;

Then choose books randomly, weighted by their age:

SELECT book_id FROM (
  SELECT book_id, TO_DAYS(CURDATE())-TO_DAYS(publish_date) AS age FROM books
) b
WHERE book_id NOT IN (...list of book_ids chosen so far...)
  AND RAND()*@total < b.age AND (@total:=@total-b.age)
ORDER BY b.publish_date DESC
LIMIT 10;

Note that the @total decreases only if a book has passed the random-selection test, because of short-circuiting of AND expressions.

This is not guaranteed to choose 10 books in one pass -- it's not even guaranteed to choose any books on a given pass. So you have to re-run the second step until you've found 10 books. The @total variable retains its decreased value so you don't have to recalculate it.

First off I think your formula will guarantee that earlier books get picked. Try to set your initial probabilities based on:

Age - days since publication

Max(Age) - oldest book in the sample

Book Age(i) - age of book i

... Prob (i) = [Max (age) + e - Book Age (i)] / sum over all i [ Max (age) + e - Book age(i) ]

The value e ensures that the oldest book has some probability of being selected. Now that that is done, you can always recalc the prob of any sample.

Now you have to find an UNBIASED way of picking books. Probably the best way would be to calculate the cumulative distribution using the above then pick a uniform (0,1) r.v. Find where that r.v. is in the cumulative distribution and pick the book nearest to it.

Can't help you on the coding. Make sense?

继续阅读：algorithm php random sql

Making more recent items more likely to be drawn

更多精彩内容

精彩评论

最新问答

央视是哪个频道？

请问买过的朋友，舒提啦旅行箱实际使用体验如何？？

检查不孕不育需要的费用？

海信ULED电视画质有什么不同的地方?？

钉子可以挂的住画框幕布吗？

问答排行榜

河神2九牛入海钓河妖是第几集河妖什么来历可活吞牛？

性激素六项检查的最佳时间是多久？多少钱？？

Easiest way to get words of one line from istream into a vector?

《梦在燃烧 (《三国演义》动画片主题曲)》MP3歌词-汤子星？

抽烟只抽炫赫门？