How to improve search speeds in this situation?
I have a search implemented on my site, it runs the following queries:
SELECT COUNT(mov_id) AS total_things
FROM content
WHERE con_status = 1 AND con_incomplete = 0 AND con_type = 1
AND ((con_title) LIKE ('%search keyword%')
OR soundex(con_title) LIKE soundex('search keyword')
OR MATCH (con_title) AGAINST ('search keyword'));
+----+-------------+--------+------+---------------+----------+---------+-------------------+-------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+--------+------+---------------+----------+---------+-------------------+-------+-------------+
| 1 | SIMPLE | movies | ref | con_type | con_type | 12 | const,const,const | 11804 | Using where |
+----+-------------+--------+------+---------------+----------+---------+-------------------+-------+-------------+
64058 Queries
Total time: 200817, Average time: 3.13492459958163
Taking 2 to 25 seconds to complete
Rows analyzed 1882 - 12104
SELECT
con_id,
con_title,
con_desc,
MATCH (con_title) AGAINST ('search keyword') AS relevancy
FROM content
WHERE con_status = 1 AND con_incomplete = 0 AND con_type = 1
AND ((con_title) LIKE ('%search keyword%')
OR soundex(con_title) LIKE soundex('search keyword')
OR MATCH (con_title) AGAINST ('search keyword'))
ORDER BY relevancy DESC
LIMIT 0, 24;
+----+-------------+--------+------+---------------+----------+---------+-------------------+-------+-----------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+--------+------+---------------+----------+---------+-------------------+-------+-----------------------------+
| 1 | SIMPLE | movies | ref | con_type | con_type | 12 | const,const,const | 11803 | Using where; Using filesort |
+----+-------------+--------+------+---------------+----------+---------+-------------------+-------+-----------------------------+
78321 Queries
Total time: 200657, Average time: 2.56198209930925
Taking 2 to 16 seconds to complete
Rows analyzed 0 - 15752
This basically works like a ghetto "fuzzy search" to ignore typos people might make.
Unfortunately, it's very slow (even if I remove soundex() or FULLTEXT searching). How can I improve search speeds in this situation?
The part of the WHERE clause that hurts is the first % after LIKE. To speed it up, you could normalize the keywords, moving them to a separate table:
table moviekeywords: movieid, keyword
table movies: movieid, ...
This allows you to search through the moviekeywords table using an = condition, or at least LIKE 'humphrey%'. Both variants can be made extremely fast with an index.
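For illustration, a minimal sketch of that schema and lookup, assuming MySQL and illustrative table/column names:

-- Hypothetical normalized keyword table (one row per movie/keyword pair).
CREATE TABLE moviekeywords (
    movieid INT NOT NULL,
    keyword VARCHAR(64) NOT NULL,
    PRIMARY KEY (movieid, keyword),
    INDEX idx_keyword (keyword)  -- makes = and prefix-LIKE lookups fast
);

-- An exact match can use idx_keyword directly:
SELECT movieid FROM moviekeywords WHERE keyword = 'humphrey';

-- A prefix LIKE (no leading %) can still use the index:
SELECT movieid FROM moviekeywords WHERE keyword LIKE 'humphrey%';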
As long as you keep using soundex() and LIKE '%nnn%' you will be running a full scan of an intermediate result. To illustrate this: if you omitted your other predicates (on the con_status, con_incomplete and con_type columns) you would always be running a full table scan.
I suggest dropping or scaling back your fuzzy predicates. For example, just running LIKE 'nnn%' will be MUCH faster than '%nnn%' (if that column is indexed), but of course your search results will not be as fuzzy. Perhaps make soundex() an advanced search option that does not always run.
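A minimal sketch of the prefix-only variant, assuming an index on con_title (the index name is illustrative):

-- Assumes: ALTER TABLE content ADD INDEX idx_title (con_title);
-- A leading wildcard ('%keyword%') cannot use this index; a prefix can.
SELECT con_id, con_title
FROM content
WHERE con_status = 1 AND con_incomplete = 0 AND con_type = 1
AND con_title LIKE 'search keyword%';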
If you can't compromise on any of those issues, then at least make sure that your con_status, con_incomplete and con_type columns are all indexed.
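The EXPLAIN output above (key: con_type, ref: const,const,const) suggests such an index may already exist; if not, a composite index covering all three equality predicates would look like this (the index name is illustrative):

ALTER TABLE content
ADD INDEX idx_status_incomplete_type (con_status, con_incomplete, con_type);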
Think about Andomar's solution again - most keyword searches allow you to specify multiple keywords. You can't do that with your current query. And there's no problem with "The Terminator" - for that, you'd just add one keyword, "Terminator".
And with an index on the keyword column, it will be fast.
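A minimal sketch of a multi-keyword lookup against such a table, reusing the hypothetical moviekeywords layout from above:

-- Find movies matching ALL of the supplied keywords:
SELECT movieid
FROM moviekeywords
WHERE keyword IN ('terminator', 'judgment')
GROUP BY movieid
HAVING COUNT(DISTINCT keyword) = 2;  -- must match both keywords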
I made my "fuzzy search" a fallback option if COUNT on the original, stricter query returns no results. My results have been pretty fast so far using SOUNDS LIKE ('blah').
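A minimal sketch of that fallback pattern, using the question's column names (SOUNDS LIKE is MySQL shorthand for comparing SOUNDEX() values):

-- First pass: strict match only.
SELECT COUNT(con_id) AS total_things
FROM content
WHERE con_status = 1 AND con_incomplete = 0 AND con_type = 1
AND con_title LIKE '%search keyword%';

-- Only if the count above is 0, fall back to the fuzzy match:
SELECT con_id, con_title
FROM content
WHERE con_status = 1 AND con_incomplete = 0 AND con_type = 1
AND con_title SOUNDS LIKE 'search keyword';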
So it looks like you only have around 15,000 rows. If you don't expect your table to grow past a hundred thousand entries or so, maybe you should just keep all the titles in memory and avoid hitting the database until you know which entries you want.
That is, at startup and at periodic intervals, just query all the titles out of the database, split each one into words, and keep a mapping of words to row keys. This should take less than 1MB of memory, accessing it should be quite fast, and most importantly you can add whatever fuzzy matching or heuristic scoring mechanisms you like (without modifying your schema).
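The refresh query that would feed such an in-memory index is simple; a minimal sketch (the word-splitting and word-to-key mapping would live in application code):

-- Run at startup and on a timer; build the word map from the result set.
SELECT con_id, con_title
FROM content
WHERE con_status = 1 AND con_incomplete = 0 AND con_type = 1;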
Just a thought.