PHP MYSQL search engine using keywords

2023-03-05 08:55

I am trying to implement a search engine based on keyword search. Can anyone tell me which is the best (fastest) algorithm for searching by keywords?

What I need is:

My keywords:

search, faster, profitable

Their synonyms:

search: grope, google, identify, search
faster: smart, quick, faster  
profitable: gain, profit

Now I need to search a database for all possible permutations of the above synonyms to identify the best-matching words.

The best solution would be to use an existing search engine, like Lucene or one of its alternatives (see "Which are the best alternatives to Lucene?").

Now, if you want to implement that yourself (it's really a great and exciting problem), you should have a look at the concept of an inverted index. That's what Google and other search engines use. Of course, they have a LOT of additional systems on top of it, but that's the basic idea.

The idea of an inverted index is that for each keyword (and its synonyms), you store the IDs of the documents that contain that keyword. It's then very easy to look up the matching documents for a set of keywords, because you just compute the intersection (or the union, depending on what you want to do) of their lists in the inverted index. Example:

Let's assume this is your inverted index:

smart: [42,35]
gain: [42]
profit: [55]

Now if you have a query "smart, gain", your matching documents are the intersection (or the union) of [42, 35] and [42], that is, [42] for the intersection or [42, 35] for the union.
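Here is a minimal sketch of that lookup. The thread is about PHP and MySQL; Python is used only to keep the illustration short. The function name `lookup` and its `mode` parameter are made up for this example, and the index is the toy one from above.

```python
# Toy inverted index copied from the example above: keyword -> document ids.
inverted_index = {
    "smart": [42, 35],
    "gain": [42],
    "profit": [55],
}


def lookup(index, keywords, mode="and"):
    """Return the sorted document ids matching the keywords.

    mode="and" intersects the posting lists, mode="or" unions them.
    A keyword missing from the index is treated as an empty list.
    """
    postings = [set(index.get(word, [])) for word in keywords]
    if not postings:
        return []
    if mode == "and":
        result = set.intersection(*postings)
    else:
        result = set.union(*postings)
    return sorted(result)


and_matches = lookup(inverted_index, ["smart", "gain"], mode="and")
or_matches = lookup(inverted_index, ["smart", "gain"], mode="or")
print(and_matches)  # [42]
print(or_matches)   # [35, 42]
```

Sets make the intersection and union one-liners; a real engine would more likely keep the lists sorted and merge them, but the result is the same.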

To handle synonyms, you just need to extend your query to include all synonyms for the words in the initial query. Based on your example, your query would become "faster, smart, quick, profitable, gain, profit".
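Query expansion could look roughly like this. The synonym table is the one from the question; the helper names `expand` and `search_with_synonyms` are invented for the sketch. It also assumes one particular way of combining results (OR within a synonym group, AND across the original words), which the answer does not prescribe.

```python
# Synonym table taken from the question; keys are the original keywords.
SYNONYMS = {
    "search": ["grope", "google", "identify", "search"],
    "faster": ["smart", "quick", "faster"],
    "profitable": ["gain", "profit"],
}

# Toy inverted index from the example above.
INDEX = {"smart": [42, 35], "gain": [42], "profit": [55]}


def expand(word):
    """Return the word plus its synonyms, without duplicates, order preserved."""
    return list(dict.fromkeys([word] + SYNONYMS.get(word, [])))


def search_with_synonyms(index, query_words):
    """OR together the synonyms of each word, then AND across the words."""
    result = None
    for word in query_words:
        group = set()
        for term in expand(word):
            group |= set(index.get(term, []))
        result = group if result is None else result & group
    return sorted(result or [])


expanded = [term for word in ["faster", "profitable"] for term in expand(word)]
matches = search_with_synonyms(INDEX, ["faster", "profitable"])
print(expanded)  # ['faster', 'smart', 'quick', 'profitable', 'gain', 'profit']
print(matches)   # [42]
```

Document 42 is the only one that contains a synonym of "faster" (smart) and a synonym of "profitable" (gain). Switching the outer combination to a union would return 35, 42 and 55 instead.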

Once you've implemented that, a nice improvement is to add TF-IDF weighting to your keywords. That's basically a way to weight rare words (such as "programming") more heavily than common ones (such as "the").
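To make the weighting concrete, here is a small sketch using one common TF-IDF variant (term frequency divided by document length, times the natural log of N/df). The corpus and the name `tf_idf_scores` are hypothetical and not from the original post; real engines usually use smoothed variants.

```python
import math
from collections import Counter

# Hypothetical toy corpus: document id -> text. Not from the original post.
DOCS = {
    42: "the smart way to gain the profit",
    35: "the smart phone",
    55: "the profit and the loss",
}


def tf_idf_scores(docs, query_words):
    """Score each document as the sum of tf * idf over the query words."""
    tokens = {doc_id: text.split() for doc_id, text in docs.items()}
    n_docs = len(tokens)
    scores = {}
    for doc_id, words in tokens.items():
        counts = Counter(words)
        score = 0.0
        for word in query_words:
            # Document frequency: how many documents contain the word.
            df = sum(1 for w in tokens.values() if word in w)
            if df == 0:
                continue
            tf = counts[word] / len(words)
            idf = math.log(n_docs / df)
            score += tf * idf
        scores[doc_id] = score
    return scores


scores = tf_idf_scores(DOCS, ["the", "gain"])
ranked = sorted(scores, key=scores.get, reverse=True)
print(ranked[0])  # 42
```

Because "the" occurs in every document, its idf is log(3/3) = 0 and it contributes nothing; only the rare word "gain" moves document 42 to the top.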

The other approach is to just go through all your documents and find the ones that contain your words (or their synonyms). The inverted index will be MUCH faster though, because you don't have to go through all your documents every time. The time-consuming operation is building the index, which only has to be done once (and then updated when documents change).
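The difference between the two approaches can be sketched as follows. The documents are hypothetical, chosen so that the index they produce has the same postings as the example above, and `build_index` and `linear_scan` are illustrative names rather than anything from a library.

```python
from collections import defaultdict

# Hypothetical documents, chosen so the index has the same postings as the example.
DOCS = {
    42: "smart gain",
    35: "smart",
    55: "profit",
}


def build_index(docs):
    """One pass over all documents: map each word to the sorted ids containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in text.lower().split():
            index[word].add(doc_id)
    return {word: sorted(ids) for word, ids in index.items()}


def linear_scan(docs, word):
    """The slow alternative: re-read every document for every query."""
    return sorted(
        doc_id for doc_id, text in docs.items() if word in text.lower().split()
    )


index = build_index(DOCS)                  # expensive step, done up front
indexed_hits = index.get("smart", [])      # one dictionary lookup per query
scanned_hits = linear_scan(DOCS, "smart")  # touches every document per query
print(indexed_hits, scanned_hits)  # [35, 42] [35, 42]
```

Both return the same documents; the index simply moves the cost of reading the documents out of the query path.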
