Google-alike search algorithm

2023-03-10 12:53 问答作者：

I am trying to implement a search algorithm in my simple datastructure. However, this is not a "HOW DO I DO THIS?"-question, but rather a "how could i optimize the algorithm?"

I am trying to keep a index of files, and each file can be associated with any number of tags (which acts like a category)

开发者_如何学运维

This is how my data is structured:

Entries:

 ------------------------------------
|  id  | description | short | score | 
 ------------------------------------

Tags:

 -------------
|  id  | text |
 -------------

EntryTags:

 -------------------
| entry_id | tag_id |
 -------------------

In the search field, the search request, will always be turned into single words split apart with a plus (+).

in the following example i will be searching for "blue+website+simple+layout"

- split searchterm up into array named t
- convert each word in array t into a number using the id from "Tags" table
- for each element in array t, select make new array for each element with "EntryTags" matching the search
- generate array A, where elements that are in all 4 arrays are put into
- generate array B, where elements that are in 3 of the 4 arrays are put into
- generate array C, where elements that are in 2 of the 4 arrays are put into
- generate array D with the last elemenets rest
- sort array A,B,C and D by the score parameter from the table
- output array A, then B, then C, then D

ofcourse this is not optimized or anything, but my lack of experience with more complex SQL is kicking my butt :(

In the end, all of this will be written in PHP and the mysqli library (and i will ofcourse keep the thread updated as i progress further)

You could use a kind of Bloom filter (at least this is part of Google's strategy). First you look up for entries with all of the entered tags. If you find nothing, try out all combinations with one tag missing, then with two tags missing ... until you have enough matches. The lookup in the Bloom filter is very fast, so it's okay to do many lookups.

Woah, let's keep it simple (KISS), that's just too complicated and not flexible.

How about this : With SQL, do one search for each search term, and include a column that puts in a 'point' value for the term's particular relevance. Sum up the searches on this 'point' value and find the results that have the biggest relevance via 'points'.

Check this out : http://www.jarrodgoddard.com/web-development/advanced-web-site-search-with-sql

SELECT title, filename, sum(relevance)
FROM (
SELECT title, filename, 10 AS relevance FROM page WHERE title like ‘%about%’
UNION
SELECT title, filename, 7 AS relevance FROM page WHERE filename like ‘%about%’
UNION
SELECT title, filename, 5 AS relevance FROM page WHERE keywords like ‘%about%’
UNION
SELECT title, filename, 2 AS relevance FROM page WHERE description like ‘%about%’
) results
GROUP BY title, filename
ORDER BY relevance desc;

继续阅读：algorithm php search-engine sql

Google-alike search algorithm

更多精彩内容

精彩评论

最新问答

央视是哪个频道？

请问买过的朋友，舒提啦旅行箱实际使用体验如何？？

检查不孕不育需要的费用？

海信ULED电视画质有什么不同的地方?？

钉子可以挂的住画框幕布吗？

问答排行榜

河神2九牛入海钓河妖是第几集河妖什么来历可活吞牛？

性激素六项检查的最佳时间是多久？多少钱？？

Easiest way to get words of one line from istream into a vector?

《梦在燃烧 (《三国演义》动画片主题曲)》MP3歌词-汤子星？

抽烟只抽炫赫门？