
Searching a single MySQL text column with fuzzy matching

I have a MySQL InnoDB table with a 'name' column (VARCHAR(255)) which I want users to be able to search against, returning all the matching rows. However, I can't just use a LIKE query because the search needs to allow for users typing in names which are similar to the available names (e.g. prefixing with 'The', or not knowing that the correct name includes an apostrophe).

Two examples are:

Name in DB: 'Rose and Crown'

Example possible searches which should match: 'Rose & Crown', 'Rose and Crown', 'rose and crown', 'The Rose and Crown'

Name in DB: 'Diver's Inn'

Example possible searches which should match: 'Divers' Inn', 'The Diver's Inn', 'Divers Inn'

I also want to be able to rank the results by a 'closest match' relevance, although I'm not sure how that would be done (edit distance perhaps?).

It's unlikely that the table will ever grow beyond a few thousand rows, so a method which doesn't scale to millions of rows is fine. Once entered, the name value for a given row will not change, so if an expensive indexing operation is required that's not a problem.

Is there an existing tool which will perform this task? I've looked at Zend_Search_Lucene but that seems to focus on documents, whereas I'm only interested in searching a single column.

Edit: On SOUNDEX searching, this doesn't produce the results I want. For example:

SELECT soundex( 'the rose & crown' ) AS soundex1, soundex( 'rose and crown' ) AS soundex2;
soundex1    soundex2
T6265   R253265

Solution: In the end I've used Zend_Search_Lucene and just pretended that every name is in fact a document, which seems to achieve the result I want. I guess it's full text search in a way, even though each string is at most 3-4 words.


Full Text Search (FTS) is the terminology for the database functionality you desire. There's:

  • Native MySQL support (requires that the table be MyISAM before MySQL 5.6; InnoDB supports FULLTEXT indexes from 5.6 onward)

    WHERE MATCH(name) 
            AGAINST('Rose Crown') 
    
  • Sphinx (3rd party)

  • Lucene/SOLR (3rd party)


Here is a SO question that comes very close to what you want. While the answer is for PHP and MySQL, the general principle still applies:

How do I do a fuzzy match of company names in MYSQL with PHP for auto-complete?

Basically you would use SOUNDEX to get you what you want. If you need more power, longer strings, etc. you might want to look into Double Metaphone, which is an improvement over Metaphone and SOUNDEX:

http://aspell.net/metaphone/

http://www.atomodo.com/code/double-metaphone
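
The SOUNDEX output in the question differs mainly because of the leading 'The' and the '&', so whichever matcher is used, it may help to normalize both the stored names and the search term first. Below is a minimal Python sketch of that idea, not a definitive implementation. The helper names `normalize` and `search`, the normalization rules, and the 0.8 cutoff are all assumptions chosen for illustration. Ranking uses `difflib.SequenceMatcher` from the standard library, which gives a similarity ratio rather than a true edit distance.

```python
import re
from difflib import SequenceMatcher


def normalize(name):
    """Reduce a name to a canonical form (illustrative rules only)."""
    s = name.lower()
    s = s.replace("&", " and ")          # treat '&' as the word 'and'
    s = re.sub(r"['’]", "", s)           # drop apostrophes entirely
    s = re.sub(r"[^a-z0-9 ]", " ", s)    # any other punctuation becomes a space
    s = re.sub(r"^\s*the\s+", "", s)     # ignore a leading 'the'
    return " ".join(s.split())           # collapse repeated whitespace


def search(query, names, cutoff=0.8):
    """Return (name, score) pairs at or above cutoff, best match first."""
    q = normalize(query)
    scored = [(SequenceMatcher(None, q, normalize(n)).ratio(), n) for n in names]
    return [(n, round(score, 2))
            for score, n in sorted(scored, reverse=True)
            if score >= cutoff]


if __name__ == "__main__":
    names = ["Rose and Crown", "Diver's Inn"]
    for term in ["The Rose & Crown", "Divers' Inn", "Rose and Crwn"]:
        print(term, "->", search(term, names))
```

With only a few thousand rows, comparing the search term against every name in application code like this is usually cheap enough. The normalized form could also be stored in an extra column, since the names never change once entered.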
