开发者

In MySQL, if the movies table has a description field, how to do search on this description?

Say, using MySQL, if the movies table has 20,000 records, and each record ha开发者_如何学Pythons a field that is the description of the movie, up to 2k byte long. How can we search for movies with the word "nature" in its description? If possible, it is to be fast, instead of going through all the 20,000 records. (if in some other situations, like for books, where n can be 200,000 or more).


I wouldn't process the description column directly - per-row functions on selects rarely scale well. One of the guidelines I subscribe to is to never have to process things inside columns (like descriptions in your case, or parts of comma-separated-variable columns or even names (first/last) and address (street/town/state) components). If you're doing that, there's usually a more efficient way.

What I would do is to have insert, update and delete triggers on the table. For the insert/update triggers, I would populate another table along the lines of DescLookup below:

Movies:
    Id primary key
    Title
    Description
DescLookup:
    Word
    MovieId foreign key Movies(Id)
    Count
    primary key (Word,MovieId)
    index (MovieId)

Basically, for every non-noise word in the description (i.e., discount things like and, or, by, punctuation, single-letter words and so on), you get an entry in this table (with the lower-cased word).

Make sure that the trigger deletes all current rows for that MovieId before re-populating lest you be left with incorrect information.

Then you use that table to run your queries. This moves the "cost" of finding the words to the insert/update rather than every single select, amortising that cost considerably. This works well because the vast majority of databases are read far more often than written so moving the cost to the write part is a good idea.

Keep in mind there's extra storage required for this but, if you examine the large number of questions people ask about databases, "How can I do this fast?" far outweighs "How can I use less disk space?".

And the delete trigger will simply remove all entries in the DescLookup table with the relevant MovieId.

Because the Word column is indexed (and also, as you requested, you will not be searching every single description field), searches on it will be blindingly fast. That's because:

select MovieId from DescLookup where Word = 'nature';

will blow:

select Id from Movies where lower(Description) like '%nature%';

out of the water.


You want to use a full-text search index in this case. Be aware that there are some catches though, such as minimum word, length, stop-words etc.

The syntax for FTS looks like this:

WHERE MATCH (field) AGAINST ('text');
0

上一篇:

下一篇:

精彩评论

暂无评论...
验证码 换一张
取 消

最新问答

问答排行榜