Programmatically determine the relative "popularities" of a list of items (books, songs, movies, etc)

2023-01-01 20:49 Q&A author:

Given a list of (say) songs, what's the best way to determine their relative "popularity"?

My first thought is to use Google Trends. This list of songs:

Subterranean Homesick Blues
Empire State of Mind
California Gurls

produces the following Google Trends report (to find out what's popular now, I restricted the report to the last 30 days):

http://s3.amazonaws.com/instagal/original/image001.png?1275516612

Empire State of Mind is marginally more popular than California Gurls, and Subterranean Homesick Blues is far less popular than either.

So this works pretty well, but what happens when your list is 100 or 1000 songs long? Google Trends only allows you to compare 5 terms at once, so absent a huge round-robin, what's the right approach?

Another option is to just do a Google Search for each song and see which has the most results, but this doesn't really measure the same thing.

Excellent question - one song by Britney Spears might be phenomenally popular for 2 months and then (thankfully) forgotten, while another song by Elvis might have sustained popularity for 30 years. How do you quantitatively distinguish the two? We would like sustained popularity to count for more than a "flash in the pan", but how do we get this result?

First, I would normalize around the release date - Subterranean Homesick Blues might be unpopular now (not in my house, though), but normalizing back to 1965 might yield a different result.

Since most songs climb in popularity, level off, then decline, let's choose the period when they level off. One might assume that during that period the two series are stationary, uncorrelated, and normally distributed. Now you can apply a two-sample test (a t-test, for example) to determine whether the means are different.
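As a minimal sketch of that comparison, here is one possible choice of test: Welch's t-test, which compares two means without assuming equal variances. The answer above does not name a specific test, so this is an assumption, as are the function names and the sample plateau values. The p-value uses a normal approximation to stay within the standard library, so it is only rough for short series.

```python
from math import sqrt
from statistics import NormalDist, mean, variance
from typing import Sequence, Tuple


def welch_t(a: Sequence[float], b: Sequence[float]) -> Tuple[float, float]:
    """Return Welch's t statistic and its approximate degrees of freedom."""
    # Squared standard error of each sample mean.
    va = variance(a) / len(a)
    vb = variance(b) / len(b)
    t = (mean(a) - mean(b)) / sqrt(va + vb)
    # Welch-Satterthwaite approximation for the degrees of freedom.
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df


def approx_two_sided_p(t: float) -> float:
    """Two-sided p-value from a normal approximation (rough for small samples)."""
    return 2.0 * (1.0 - NormalDist().cdf(abs(t)))


# Hypothetical plateau-period readings for two songs (made-up numbers).
song_a = [50, 52, 48, 51, 49, 50]
song_b = [40, 41, 39, 42, 38, 40]
t_stat, dof = welch_t(song_a, song_b)
p_value = approx_two_sided_p(t_stat)
```

A small p-value suggests the plateau means differ. With real data you would want to check the stationarity and independence assumptions first, since trend data is often autocorrelated.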

There are probably less restrictive tests to determine the magnitude of difference between two time series, but I haven't run across them yet.

Anyone?

You could search for the item on Twitter and see how many times it is mentioned. Or look it up on Amazon to see how many people have reviewed it and what rating they gave it. Both Twitter and Amazon have APIs.

There is an unofficial Google Trends API. See http://zoastertech.com/projects/googletrends/index.php?page=Getting+Started. I have not used it, but perhaps it is of some help.

I would certainly treat Google's API as "restricted".

In general, comparison functions used for sorting algorithms are very "binary":

input: 2 elements
output: true/false

Here you have:

input: 5 elements
output: relative weights of each element

Therefore you will only need a linear number of calls to the API (whereas sorting usually requires O(N log N) calls to comparison functions).

You will need exactly ceil( (N-1)/4 ) calls. You can parallelize them, though do read the user guide closely for the number of requests you are authorized to submit.
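The count can be checked with a little arithmetic, assuming the first call rates 5 new items and every later call re-submits one already-rated item as a reference alongside up to 4 new ones:

```latex
\text{calls}(N) \;=\; 1 + \left\lceil \frac{N-5}{4} \right\rceil
\;=\; \left\lceil \frac{N-1}{4} \right\rceil \quad (N \ge 5),
\qquad
\text{calls}(1000) \;=\; 1 + \left\lceil \frac{995}{4} \right\rceil \;=\; 1 + 249 \;=\; 250 .
```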

Then, once all of them are "rated", you can do a simple sort locally.

Intuitively, in order to gather them properly you would:

Shuffle your list
Pop the first 5 elements
Call the API
Insert them sorted in the result (use insertion sort here)
Pick the median
Pop the first 4 elements (or fewer if fewer are available)
Call the API with the median and those 4 elements
Go back to the insert step until you run out of elements

If your list is 1000 songs long, that's 250 calls to the API, nothing too scary.
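The steps above can be sketched roughly as follows. This is not a real Google Trends client: `compare` is a hypothetical callback that takes up to 5 terms and returns a relative weight for each, and `rank_by_batches` is an illustrative name. One detail is an assumption on my part rather than something stated above: since weights from different calls are on different scales, the sketch rescales each later call by the ratio of the median item's two readings. That only works if the weights within a call are proportional and the reference item's weight is non-zero.

```python
import bisect
import random
from typing import Callable, Dict, List, Optional, Sequence, Tuple

# Hypothetical callback: takes up to 5 terms and returns a relative weight
# for each one (for example, scaled so the largest in the batch is 100).
CompareFn = Callable[[Sequence[str]], Dict[str, float]]


def rank_by_batches(
    items: Sequence[str],
    compare: CompareFn,
    batch_size: int = 5,
    seed: Optional[int] = None,
) -> Tuple[Dict[str, float], int]:
    """Return (score per item on one common scale, number of compare calls)."""
    pending = list(items)
    random.Random(seed).shuffle(pending)

    scores: Dict[str, float] = {}
    ranked: List[Tuple[float, str]] = []  # kept sorted by score
    calls = 0

    # First call: rate the first `batch_size` items directly.
    first, pending = pending[:batch_size], pending[batch_size:]
    if not first:
        return scores, calls
    weights = compare(first)
    calls += 1
    for term in first:
        scores[term] = weights[term]
        bisect.insort(ranked, (weights[term], term))

    # Later calls: re-submit the current median as a shared reference.
    while pending:
        pivot_score, pivot = ranked[len(ranked) // 2]
        batch, pending = pending[: batch_size - 1], pending[batch_size - 1:]
        weights = compare([pivot] + batch)
        calls += 1
        if weights[pivot] <= 0:
            raise ValueError(f"cannot rescale: reference {pivot!r} has zero weight")
        # Assumption: weights are proportional within a call, so the pivot's
        # two readings give the factor that maps this call onto the common scale.
        scale = pivot_score / weights[pivot]
        for term in batch:
            scores[term] = weights[term] * scale
            bisect.insort(ranked, (scores[term], term))

    return scores, calls
```

Using the median as the reference keeps it away from the extremes, which makes a zero reading for the reference less likely. Because each call depends on the median so far, this version runs sequentially; fixing one reference item up front would be one way to make the calls independent.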
