Need suggestion of alternative to Fulltext search
I am in need of a lightweight fast search solution.
Today I use Fulltext in boolean mode, where every searchword is mandatory in the results.
The function is fast, working and meets the requirements.
BUT some of the fulltext limitations, http://dev.mysql.com/doc/refman/5.0/en/fulltext-search.html, have appeared to be a problem. The site is on a hosted server and Im not allowed to change the mysql settings (e.g. minimum lenght)
E.g.
t开发者_JAVA技巧he search must be able to find red
, 11
and ab.cd
which todays full text solution can't.
http://sphinxsearch.com/ is what you're looking for
though you have to understand that smaller words you find the bigger indexes you use.
Use Lucene, it's very often implemented with MySQL and it'll be both faster and more featureful.
Using the built-in FTS engine is relatively bad practice, especially since it doesn't work with the slightly more reliable InnoDB engine.
The only thing that would come to mind, would to be basing your search off the number of occurrences you can find. Your actual index method could vary, depending on what the DB offers.
Assuming DB size isn't an issue, a (very) basic approach would be to break the search blobs (say, a post on stackoverflow) into each word, normalize it (remove plurals, strip 'logic' words such as and, etc.) then insert each word as a new record, together with the ID that identifies your indexed resource.
Count the instances of the ID, order by count, higher number = more relevant.
Not exactly my field though, so tred carefully! =]
I'd recommend you try distance searching: Levenshtein
Or search for "N-gram fulltext indexing".
I haven't mucked around with it, but I read the theory of full text searching (with mysql at least) a little while back.
If memory serves me correctly you can use full text search for what you want, but you need to configure (and I think a recompile) to get it to work on smaller number of search characters. I think it is set to a default number of 4 characters. You'll want to change it to 2 characters in length with a few other options thrown in and test the results you get.
Someone correct me if this is incorrect. I would rather not throw him on a red herring.
精彩评论