开发者

Stumbleupon type query

Wow, makes your head spin!

I am about to start a project, and although my mySql is OK, I can't get my head around what required for this:

I have a table of web addresses.

id,url
1,http://www.url1.com
2,http://www.url2.com
3,http://www.url3.com
4,http://www.url4.com

I have a table of users.

id,name
1,fred bloggs
2,john bloggs
3,amy bloggs

I have a table of categories.

id,name
1,science
2,tech
3,adult
4,stackoverflow

I have a table of categories the user likes as numerical ref relating to the category unique ref. For example:

user,category
1,4
1,6
1,7
1,10
2,3
2,4
3,5
.
.
.

I have a table of scores relating to each website address. When a user visits one of these sites and says they like it, it's stored like so:

url_ref,category
4,2
4,3
4,6
4,2
4,3
5,2
5,3
.
.
.

So based on the above data, URL 4 would score (in it's own right) as follows: 2=2 3=2 6=1

What I was hoping to do was pick out a random URL from over 2,000,000 records based on the current users interests.

So if the logged in user likes categories 1,2,3 then I would like to ORDER BY a score generated based on their interest.

If the logged in user likes categories 2 3 and 6 then the total score would be 5. However, if the current logged in user only like categories 2 and 6, the URL score would be 3. So the order by would be in context of the logged in users interests.

Think of stumbleupon.

I was thinking of using a set of VIEWS to help with sub queries.

I'm guessing t开发者_如何学编程hat all 2,000,000 records will need to be looked at and based on the id of the url it will look to see what scores it has based on each selected category of the current user.

So we need to know the user ID and this gets passed into the query as a constant from the start.

Ain't got a clue!

Chris Denman


What I was hoping to do was pick out a random URL from over 2,000,000 records based on the current users interests.

This screams for predictive modeling, something you probably wouldn't be able to pull off in the database. Basically, you'd want to precalculate your score for a given interest (or more likely set of interests) / URL combination, and then query based on the precalculated values. You'd most likely be best off doing this in application code somewhere.

Since you're trying to guess whether a user will like or dislike a link based on what you know about them, Bayes seems like a good starting point (sorry for the wikipedia link, but without knowing your programming language this is probably the best place to start): Naive Bayes Classifier

edit

The basic idea here is that you continually run your precalculation process, and once you have enough data you can try to distill it to a simple formula that you can use in your query. As you collect more data, you continue to run the precalculation process and use the expanded results to refine your formula. This gets really interesting if you have the means to suggest a link, then find out whether the user liked it or not, as you can use this feedback loop really improve the prediction algorithm (have a read on machine learning, particularly genetic algorithms, for more on this)


I did this in the end:

$dbh = new NewSys::mySqlAccess("xxxxxxxxxx","xxxxxxxxxx","xxxxxxxxx","localhost");

$icat{1}='animals pets';
$icat{2}='gadget addict';
$icat{3}='games online play';
$icat{4}='painting art';
$icat{5}='graphic designer design';
$icat{6}='philosophy';
$icat{7}='strange unusual bizarre';
$icat{8}='health fitness';
$icat{9}='photography photographer';
$icat{10}='reading books';
$icat{11}='humour humor comedy comedian funny';
$icat{12}='psychology psychologist';
$icat{13}='cartoons cartoonist';
$icat{14}='internet technology';
$icat{15}='science scientist';
$icat{16}='clothing fashion';
$icat{17}='movies movie latest';
$icat{18}="\"self improvement\"";
$icat{19}='drawing art';
$icat{20}='latest band member';
$icat{21}='shop prices';
$icat{22}='recipe recipes food';
$icat{23}='mythology';
$icat{24}='holiday resorts destinations';
$icat{25}="(rude words)";
$icat{26}="www website";

$dbh->Sql("DELETE FROM precalc WHERE member = '$fdat{cred_id}'");

$dbh->Sql("SELECT * FROM prefs WHERE member = '$fdat{cred_id}'");

@chos=();
while($dbh->FetchRow()){
    $cat=$dbh->Data('category');
    $cats{$cat}='#';
    }
foreach $cat (keys %cats){
    push @chos,"\'$cat\'";
    push @strings,$icat{$cat};
    }

$sqll=join("\,",@chos);
$words=join(" ",@strings);

$dbh->Sql("select users.id,users.url,IFNULL((select sum(scoretot.scr) from scoretot where scoretot.id = users.id and scoretot.category IN \($sqll\)),0) as score from users WHERE MATCH (description,lasttweet) AGAINST ('$words' IN BOOLEAN MODE) AND IFNULL((SELECT ref FROM visited WHERE member = '$fdat{cred_id}' AND user = users.id LIMIT 1),0) = 0 ORDER BY score DESC limit 30");

    $cnt=0;

    while($dbh->FetchRow()){

        $id=$dbh->Data('id');
        $url=$dbh->Data('url');
        $score=$dbh->Data('score');
        $dbh2->Sql("INSERT INTO precalc (member,user,url,score) VALUES ('$fdat{cred_id}','$id','$url','$score')");

        $cnt++;

        }

I came up with this answer about three months ago, and just cannot read it. So sorry, I can't explain how it finally worked, but it managed to query 2 million websites and choose one based on the history of a users past votes on other sites.

Once I got it working, I moved on to another problem!

http://www.staggerupon.com is where it all happens!

Chris

0

上一篇:

下一篇:

精彩评论

暂无评论...
验证码 换一张
取 消

最新问答

问答排行榜