开发者

algorithm for ranking search results based on previous usage

First and foremost, no I'm not asking please tell me how Google is built in two sentences. What I am asking is slightly different. I have a database filled with textual data that users input. We also give them the functionality to search for this data later. The problem is, we do a simple full text search now and return the results in any order. I'd like to return the results based on a weight, a weight of how often the user types in something. An an example a user might type in the following:

"foo" "bo" "bob" "bob" "bob" "bo" "foo2"

Based on the above d开发者_如何学编程ata, a search on 'b' should return bo and bob, but bob should be listed first. It is the most relevant based on usage.

Curious, what algorithm should I research to build this in an effective fashion? Any books based on common web algorithms (I know this isn't just web specific) out there that will explain this?


there is various search algorithms out there.

Here's a little guidepost to some of them: http://en.wikipedia.org/wiki/Search_algorithm

not an expert myself in this area, so I cannot recommend a specific one.


I don't know how you'd do this in the context of a database, but here's one way to go about it:

Use a trie to store each unique word and the count of how often it was used. When your user starts typing, the trie allows you to efficiently grab all the string with the given prefix, which you can then sort using the words' counts as keys.


We use apache solr for our search. In this technology, I think, this is normally done via boosting. So index your data and every day or so then boost individual documents based on user queries.

0

上一篇:

下一篇:

精彩评论

暂无评论...
验证码 换一张
取 消

最新问答

问答排行榜