Calculate estimated mathematical expectation of ranking data?
I have to sort a "most popular apps RSS" by app download 开发者_开发知识库count. Here is the problem:
Suppose there are 1,000 apps.
The RSS data lists top 100 apps for each app category.
The RSS data also provides top 100 app list regardless of category.
RSS is sorted by each app's download count, but exact download count number is unknown.
Each app has two known properties: category, and its position on the RSS ranking.
Now I want to sort all of the 1,000 apps by its estimated download count.
The sorting do not need to be very accurate, just statistically speaking most possible would be OK.
How could I implement this sorting algorithm? TIA.
you can process this way : (I assume each app belongs to only one category)
Let say you have the following ranking for each category C1 ..C10
C1 C2 ... C10
app1-1 app2-1
app1-2 &pp2-2
.. ...
app1-100 app2-100 app10-100
and
the overall 100 top apps classment (for example):
C app1-1 app1-2 app2 -1 ... app2-10
Now using these 2 tables, first you need to order C1 to C10 in the same order as app1-1 to app10-1 appear in list C, so you "know" (it's more like a guess) what Category is the more important in term of ranking.
Then use this information to sort the rest .
Now I'm gona use a simpler example to show how to order the rest of the elements.
let's take 3 categories and 12 apps .
C1 C2 C3
app1 app21 app31
app2 app22 app32
app3 app23 app33
app4 app24 app34
and C = app1 app2 app21 app31
1.first mark all elemnt in C in the table :
app1 app21 ->app31
| /
app2 app22 app32
app3 app23 app33
app4 app24 app34
2.second, sort the remaining elements
Since you don't have more information, a good approximation would be to look at each line from left to right (from larger ranked top list to smaller ranked top list) which gives :
app3 app22 app32 app4 app23 app33 app24 app34
then the overall classement would be :
app1 app2 app21 app31 app3 app22 app32 app4 app23 app33 app24 app34
I hope this example makes my ideas clear and that it helps.
I think this approach uses all the information you have in C1 ...C10 and C.
A simple way is to use the overall top-100 to determine which category to get the next app from.
In pesudo-code:
While (not finished)
i++
category = Overall_list(i).getCategory()
Overall_list.add(get next app from list for category)
end while
Any category(s) that have no entry in the overall top-100 will be added last.
Build a directed graph as follows:
- Each app is a node.
- If app X ranks above app Y on any of the lists, put an edge pointing from X to Y. Note: you really only need to add an edge if X is one rank higher than Y on any list.
- It might be possible that some of the #1-ranked items on the category lists don't appear on the total ranking list. In this case, I would add edges pointing from the lowest-ranked item on the total list to each of these items to make the graph connected.
Then, do a topological sort on the constructed graph. The resulting ordering will be guaranteed to be compatible with each individual top 100 list.
This approach will work even if an app appears in more than one category list -- assuming that the category lists are mutually consistent (e.g.: ranked according to total downloads and not, say, category downloads). For example, if you ever have a case where X is ranked above Y on one list but Y is ranked above X on another list, then this won't really work (and I'm not sure what would).
Without more information (e.g.: some kind of probability model), I can't really interpret what "statistically speaking most possible" really means.
精彩评论