Fuzzy Indexes in Hibernate Search
I understand fuzzy searches well enough, but in my application they are very slow with lots of terms (~500 ms). I ran across a suggested solution to slow fuzzy searches: instead of doing fuzzy searches at query time, index the terms using the Levenshtein algorithm, so that a regular keyword search yields fuzzy results.
Is there any way of doing this with Hibernate Search, preferably using annotations?
I am not quite sure what you want to do here. Do you want to insert words within a given Levenshtein distance into the index at indexing time, similar to synonym search, where synonym tokens are inserted into the index? If so, you could write your own token filter (and filter factory) and then use the @AnalyzerDef framework to build your custom analyzer. Look at the source code to see how this is done. Mind you, I see several issues with this approach: indexing becomes expensive and the index will become very large. Of course, I don't know much about your use case.
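To get a feel for why the index would grow so much, here is a rough sketch in plain Java (not Lucene or Hibernate Search API) that enumerates every string within Levenshtein distance 1 of a term. The class name `EditVariants`, the method `distanceOne`, and the restriction to a lowercase a-z alphabet are all assumptions made for illustration; a real token filter would emit these variants as extra tokens at the same position.

```java
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical helper, for illustration only: not part of Lucene or Hibernate Search.
class EditVariants {
    // Assumption: terms are lowercase ASCII; a real index would need a wider alphabet.
    private static final char[] ALPHABET = "abcdefghijklmnopqrstuvwxyz".toCharArray();

    // Returns all strings reachable from term by one insertion, deletion, or substitution.
    static Set<String> distanceOne(String term) {
        Set<String> variants = new LinkedHashSet<>();
        for (int i = 0; i <= term.length(); i++) {
            String head = term.substring(0, i);
            String tail = term.substring(i);
            // Insertion of one character at position i.
            for (char c : ALPHABET) {
                variants.add(head + c + tail);
            }
            if (i < term.length()) {
                String rest = tail.substring(1);
                // Deletion of the character at position i.
                variants.add(head + rest);
                // Substitution of the character at position i.
                for (char c : ALPHABET) {
                    variants.add(head + c + rest);
                }
            }
        }
        // Substituting a character with itself reproduces the original term.
        variants.remove(term);
        return variants;
    }

    public static void main(String[] args) {
        Set<String> variants = distanceOne("cat");
        System.out.println(variants.size() + " variants for a three-letter term");
    }
}
```

Even a three-letter term produces well over a hundred variants at distance 1, and the count grows with term length and alphabet size, which is the cost the answer above warns about.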
I would try the following options, in order:
- Are you just trying to correct spelling errors in user queries? Maybe you should use a spellchecker/autosuggest up-front for this, rather than using slower fuzzy queries with hard-to-tune relevance.
- Is this not really a full-text search, but instead some type of 'matching' procedure? In this case, an alternative could be to index character n-grams instead, e.g. with Lucene's ngram TokenFilters, so that you are doing a boolean query on the n-gram field instead of a slow fuzzy query. This is actually how Lucene's spellchecker works behind the scenes anyway!
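The n-gram idea from the second option can be sketched in plain Java as follows. This is only an illustration of the principle, not Lucene code: the class `NGramMatch` and its methods are hypothetical names, and in a real setup the n-grams would be produced by an n-gram token filter inside the analyzer and matched with a boolean query.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper, for illustration only: not part of Lucene or Hibernate Search.
class NGramMatch {
    // Splits a term into overlapping character n-grams, e.g. "lucene" -> luc, uce, cen, ene.
    static List<String> ngrams(String term, int n) {
        List<String> grams = new ArrayList<>();
        for (int i = 0; i + n <= term.length(); i++) {
            grams.add(term.substring(i, i + n));
        }
        return grams;
    }

    // Counts distinct n-grams shared by an indexed term and a (possibly misspelled) query term.
    // A boolean OR query over n-gram tokens would rank documents by a similar overlap.
    static int sharedGrams(String indexed, String query, int n) {
        Set<String> indexedGrams = new HashSet<>(ngrams(indexed, n));
        Set<String> queryGrams = new HashSet<>(ngrams(query, n));
        queryGrams.retainAll(indexedGrams);
        return queryGrams.size();
    }

    public static void main(String[] args) {
        System.out.println(ngrams("lucene", 3));
        System.out.println(sharedGrams("hibernate", "hibrenate", 3));
    }
}
```

A misspelled query still shares several n-grams with the intended term, while an unrelated term shares few or none, so exact matching on n-gram tokens gives fuzzy-like behaviour without a fuzzy query.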
If the above don't apply, and you decide you really need fuzzy search and there is no alternative, you could try using a nightly build of Lucene's trunk instead. It uses a totally different algorithm that makes these queries much faster [1]. However, I don't think you will be able to easily integrate an unreleased Lucene trunk with Hibernate Search.
[1]: http://blog.mikemccandless.com/2011/03/lucenes-fuzzyquery-is-100-times-faster.html Blog about fuzzy improvements.