
How do I set up Lucene so that I can search while ignoring whitespace characters?

For example, a list of part numbers includes:

JRB-1000

JRB 1000

JRB1000

JRB100-0

-JRB1000

If a user searches for 'JRB1000' or 'JRB 1000', I would like to return a match for all of the part numbers above.


Write a custom Analyzer that either splits these into several tokens (JRB, 1000; relatively easy and forgiving to users) or concatenates them into a single token (JRB1000; harder to implement but more precise). Implementing your own Analyzer amounts to overriding the tokenStream method of an existing one and perhaps writing a custom TokenFilter class.

Apply your new Analyzer to both the documents being indexed and the queries, so that both sides are normalized the same way.
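As a rough illustration of the single-token approach, here is a minimal sketch of the normalization step in plain Java, with no Lucene dependency. The class name PartNumberKey and the method normalize are hypothetical names chosen for this example. In a real setup the same logic would presumably live inside a custom TokenFilter used by your Analyzer. It assumes that every character that is not a letter or digit (spaces, hyphens and so on) can be dropped.

```java
// Hypothetical helper: collapses a part number into a single search key.
final class PartNumberKey {
    private PartNumberKey() {}

    // Keeps letters and digits, drops everything else (spaces, hyphens, ...).
    static String normalize(String raw) {
        StringBuilder key = new StringBuilder(raw.length());
        for (int i = 0; i < raw.length(); i++) {
            char c = raw.charAt(i);
            if (Character.isLetterOrDigit(c)) {
                key.append(c);
            }
        }
        return key.toString();
    }
}
```

With this rule, all five part numbers in the question and both example queries reduce to the same key, JRB1000, so an exact match on the key finds them all. Note that the splitting approach is less uniform: splitting JRB100-0 at the hyphen yields JRB100 and 0 rather than JRB and 1000.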

(Links are for the Java version, but .NET should be similar.)
