How to improve search speeds in this situation?
I have a search implemented on my site, it runs the following queries:
SELECT COUNT(mov_id) AS total_things
FROM content
WHERE con_status = 1 AND con_incomplete = 0 AND con_type = 1
AND ((con_title) LIKE ('%search keyword%')
OR soundex(con_title) LIKE soundex('search keyword')
OR MATCH (con_title) AGAINST ('search keyword'));
+----+-------------+--------+------+---------------+----------+---------+-------------------+-------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+--------+------+---------------+----------+---------+-------------------+-------+-------------+
| 1 | SIMPLE | movies | ref | con_type | con_type | 12 | const,const,const | 11804 | Using where |
+----+-------------+--------+------+---------------+----------+---------+-------------------+-------+-------------+
64058 Queries
Total time: 200817, Average time: 3.13492459958163
Taking 2 to 25 seconds to complete
Rows analyzed 1882 - 12104
SELECT
con_id,
con_title,
con_desc,
MATCH (con_title) AGAINST ('search keyword') AS relevancy
FROM content
WHERE con_status = 1 AND con_incomplete = 0 AND con_type = 1
AND ((con_title) LIKE ('%search keyword%')
OR soundex(con_title) LIKE soundex('search keyword')
OR MATCH (con_title) AGAINST ('search keyword'))
ORDER BY relevancy DESC
LIMIT 0, 24;
+----+-------------+--------+------+---------------+----------+---------+-------------------+-------+-----------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+--------+------+---------------+----------+---------+-------------------+-------+-----------------------------+
| 1 | SIMPLE | movies | ref | con_type | con_type | 12 | const,const,const | 11803 | Using where; Using filesort |
+----+-------------+--------+------+---------------+----------+---------+-------------------+-------+-----------------------------+
78321 Queries
Total time: 200657, Average time: 2.56198209930925
Taking 2 to 16 seconds to complete
Rows analyzed 0 - 15752
This basically works like a ghetto "fuzzy search" to ignore typos people might make.
Unfortunately, it's very slow (even if I remove soundex() or FULLTEXT searching). How can I improve search speeds in this situation?
The part of the WHERE clause that hurts is the first % after LIKE. To speed it up, you could normalize the keywords, moving them to a separate table:
table moviekeywords: movieid, keyword
table movies: movieid, ...
This allows you to search through the moviekeywords table using an = condition, or at least LIKE 'humphrey%'. Both variants can be made extremely fast with an index.
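For illustration, a minimal sketch of that schema and lookup, assuming MySQL and illustrative table/column names:

-- Hypothetical normalized keyword table (one row per movie/keyword pair).
CREATE TABLE moviekeywords (
    movieid INT NOT NULL,
    keyword VARCHAR(64) NOT NULL,
    PRIMARY KEY (movieid, keyword),
    INDEX idx_keyword (keyword)  -- makes = and prefix-LIKE lookups fast
);

-- An exact match can use idx_keyword directly:
SELECT movieid FROM moviekeywords WHERE keyword = 'humphrey';

-- A prefix LIKE (no leading %) can still use the index:
SELECT movieid FROM moviekeywords WHERE keyword LIKE 'humphrey%';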
As long as you keep using soundex() and LIKE '%nnn%' you will be running a full scan of an intermediate result. To illustrate this: if you omitted your other predicates (on the con_status, con_incomplete and con_type columns) you would always be running a full table scan.
I suggest dropping or scaling back your fuzzy predicates. For example, just running LIKE 'nnn%' will be MUCH faster than '%nnn%' (if that column is indexed), but of course your search results will not be as fuzzy. Perhaps make soundex() an advanced search option that does not always run.
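A minimal sketch of the prefix-only variant, assuming an index on con_title (the index name is illustrative):

-- Assumes: ALTER TABLE content ADD INDEX idx_title (con_title);
-- A leading wildcard ('%keyword%') cannot use this index; a prefix can.
SELECT con_id, con_title
FROM content
WHERE con_status = 1 AND con_incomplete = 0 AND con_type = 1
AND con_title LIKE 'search keyword%';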
If you can't compromise on any of those issues, then at least make sure that your con_status, con_incomplete and con_type columns are all indexed.
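The EXPLAIN output above (key: con_type, ref: const,const,const) suggests such an index may already exist; if not, a composite index covering all three equality predicates would look like this (the index name is illustrative):

ALTER TABLE content
ADD INDEX idx_status_incomplete_type (con_status, con_incomplete, con_type);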
Think about Andomar's solution again - most keyword searches allow you to specify multiple keywords. You can't do that with your current query. And there's no problem with "The Terminator" - for that, you'd just add one keyword, "Terminator".
And with an index on the keyword column, it will be fast.
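A minimal sketch of a multi-keyword lookup against such a table, reusing the hypothetical moviekeywords layout from above:

-- Find movies matching ALL of the supplied keywords:
SELECT movieid
FROM moviekeywords
WHERE keyword IN ('terminator', 'judgment')
GROUP BY movieid
HAVING COUNT(DISTINCT keyword) = 2;  -- must match both keywords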
I made my "fuzzy search" a fallback option if COUNT on the original, stricter query returns no results. My results have been pretty fast so far using SOUNDS LIKE ('blah').
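A minimal sketch of that fallback pattern, using the question's column names (SOUNDS LIKE is MySQL shorthand for comparing SOUNDEX() values):

-- First pass: strict match only.
SELECT COUNT(con_id) AS total_things
FROM content
WHERE con_status = 1 AND con_incomplete = 0 AND con_type = 1
AND con_title LIKE '%search keyword%';

-- Only if the count above is 0, fall back to the fuzzy match:
SELECT con_id, con_title
FROM content
WHERE con_status = 1 AND con_incomplete = 0 AND con_type = 1
AND con_title SOUNDS LIKE 'search keyword';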
So it looks like you only have around 15,000 rows. If you don't expect your table to grow past a hundred thousand entries or so, maybe you should just keep all the titles in memory and avoid hitting the database until you know which entries you want.
That is, at startup and at periodic intervals, just query all the titles out of the database, split each one into words, and keep a mapping of words to row keys. This should take less than 1MB of memory, accessing it should be quite fast, and most importantly you can add whatever fuzzy matching or heuristic scoring mechanisms you like (without modifying your schema).
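The refresh query that would feed such an in-memory index is simple; a minimal sketch (the word-splitting and word-to-key mapping would live in application code):

-- Run at startup and on a timer; build the word map from the result set.
SELECT con_id, con_title
FROM content
WHERE con_status = 1 AND con_incomplete = 0 AND con_type = 1;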
Just a thought.