
Searching with words one character long (MySQL)

I have a table Books in my MySQL database which has the columns Title (varchar(255)) and Edition (varchar(20)). Example values for these are "Introduction to Microeconomics" and "4".

I want to let users search for Books based on Title and Edition. So, for example, they could enter "Microeconomics 4" and it would get the proper result. My question is how I should set this up on the database side.

I've been told that FULLTEXT search is generally a good way to do things like this. However, because the edition is sometimes just a single character ("4"), full-text search would have to be set up to index words as short as one character (ft_min_word_len = 1), which I've heard is very inefficient.

So, how should I set up searches of this database?

UPDATE: I'm aware that CONCAT/LIKE could be used here. My question is whether it would be too slow: my Books table has hundreds of thousands of rows, and a lot of users are going to be searching it.
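Roughly, the query I have in mind is something like this (the search string is just an example):

-- Match the whole search string against Title and Edition concatenated together.
-- The leading '%' wildcard means MySQL can't use an ordinary index for this,
-- so every search scans the whole table.
SELECT * FROM Books
WHERE CONCAT(Title, ' ', Edition) LIKE '%Microeconomics 4%';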


Here are the steps for a solution:

1) Read the search string from the user.

2) Split the string into parts on the spaces (" ") between words.

3) Use the following query to get the result:

SELECT * FROM books WHERE Title LIKE '%part[0]%' AND Edition LIKE '%part[1]%';

Here part[0] and part[1] are the words split out of the search string.

The PHP code for the above could be:

<?php
     // $string is the value we are searching for; $mysqli is assumed to be an open mysqli connection
     $string_array = explode(" ", $string);
     // A prepared statement keeps the user's input safely escaped
     $stmt = $mysqli->prepare("SELECT * FROM books WHERE Title LIKE CONCAT('%', ?, '%') AND Edition LIKE CONCAT('%', ?, '%')");
     $stmt->bind_param("ss", $string_array[0], $string_array[1]);
     $stmt->execute();
     $result = $stmt->get_result()->fetch_all(MYSQLI_ASSOC); // all matching rows
?>

For multi-word titles such as "Introduction to Microeconomics 4", this could be extended so that the title is built from every part except the last one (for example, array_pop() the edition off the end and implode() the rest), with only the final part used as the Edition.


For your application, where you're interested in just title and edition, I suspect that using a FULLTEXT index with MATCH/AGAINST and reducing ft_min_word_len to 1 would not have that much impact performance-wise (if your data were more verbose or user-written content, then I might hesitate).
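As a rough sketch, that setup would look something like this (the index name is just illustrative, and the column list in MATCH() has to match the columns of the FULLTEXT index):

ALTER TABLE Books ADD FULLTEXT INDEX ft_title_edition (Title, Edition);

SELECT * FROM Books
WHERE MATCH (Title, Edition) AGAINST ('Microeconomics 4' IN BOOLEAN MODE);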

The easiest way to check is to change the value, REPAIR the table to account for the new ft_min_word_len and rebuild the index, and do some simple benchmarking.
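Something along these lines (ft_min_word_len is read at server startup, so it has to be set in the configuration file and the server restarted; it applies to MyISAM full-text indexes, which are also what REPAIR TABLE works on):

-- In my.cnf, under [mysqld], then restart mysqld:
--   ft_min_word_len = 1

-- Rebuild the FULLTEXT index so it picks up the new minimum word length:
REPAIR TABLE Books QUICK;

-- Then benchmark the MATCH ... AGAINST query above on your real data.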

Having said that, for your application, I might consider looking into Sphinx. It's definitely going to be orders of magnitude faster, and your content is relatively static, so the delay between re-indexing runs (Sphinx's main drawback, IMO) isn't an issue. Plus, with careful use of wordforms and exceptions, you could map things like 4/four/fourth/IV all to the same token for improved searching.
