How do I set up Lucene so that I can search ignoring whitespace characters?
For example, a list of part numbers includes:
JRB-1000
JRB 1000
JRB1000
JRB100-0
-JRB1000
If a user searches on 'JRB1000' or 'JRB 1000', I would like to return a match for all the part numbers above.
Write a custom Analyzer that either splits these into several tokens (JRB, 1000; relatively easy and forgiving to users) or concatenates them into a single token (JRB1000; hard but precise). Implementing your own Analyzer amounts to overriding the tokenStream method in an existing one and perhaps writing a custom TokenFilter class.
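As a rough illustration of the single-token option, the normalization such an analyzer would apply can be sketched in plain Java. This is not Lucene code: PartNumberNormalizer and normalize are hypothetical names, and the sketch assumes that every non-alphanumeric character (spaces, hyphens) can be dropped and that case should be ignored.

```java
public class PartNumberNormalizer {

    // Hypothetical helper: keeps only letters and digits and uppercases them,
    // so "JRB-1000", "JRB 1000" and "-JRB1000" all collapse to one token.
    public static String normalize(String raw) {
        StringBuilder sb = new StringBuilder(raw.length());
        for (int i = 0; i < raw.length(); i++) {
            char c = raw.charAt(i);
            if (Character.isLetterOrDigit(c)) {
                sb.append(Character.toUpperCase(c));
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String[] parts = {"JRB-1000", "JRB 1000", "JRB1000", "JRB100-0", "-JRB1000"};
        for (String p : parts) {
            // Each of the part numbers from the question maps to JRB1000.
            System.out.println("'" + p + "' -> " + normalize(p));
        }
    }
}
```

In Lucene, logic like this would sit in a custom TokenFilter (or a similar component) wired into the Analyzer; which classes to extend depends on the Lucene version in use.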
Apply your new Analyzer to both the documents being indexed and the queries.
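To see why the same analysis has to run on both sides, here is a toy in-memory lookup, again in plain Java rather than Lucene. PartIndexDemo, add and search are made-up names for illustration, and a map keyed by the normalized string stands in for the real index.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

public class PartIndexDemo {

    // Normalized token -> original part numbers that produced it.
    private final Map<String, List<String>> index = new HashMap<>();

    // Assumed normalization: drop everything except ASCII letters and digits.
    public static String normalize(String raw) {
        return raw.replaceAll("[^\\p{Alnum}]", "").toUpperCase(Locale.ROOT);
    }

    // Index time: store the document under its normalized form.
    public void add(String partNumber) {
        index.computeIfAbsent(normalize(partNumber), k -> new ArrayList<>()).add(partNumber);
    }

    // Query time: normalize the query the same way before looking it up.
    public List<String> search(String query) {
        return index.getOrDefault(normalize(query), new ArrayList<>());
    }

    public static void main(String[] args) {
        PartIndexDemo demo = new PartIndexDemo();
        for (String p : new String[] {"JRB-1000", "JRB 1000", "JRB1000", "JRB100-0", "-JRB1000"}) {
            demo.add(p);
        }
        System.out.println(demo.search("JRB 1000"));
        System.out.println(demo.search("JRB1000"));
    }
}
```

If the query skipped the normalization step, 'JRB 1000' would be looked up with its space intact and find nothing, which is the mismatch that using one Analyzer for both indexing and querying avoids.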
(Links are for the Java version, but .NET should be similar.)