
How to configure the indexer so that "word1.word2" is considered as two words

Suppose a file 'test.txt' is being indexed, and its content is:

word1.word2

What should I do to make Lucene treat "word1.word2" as two words, "word1" and "word2", rather than as the single word "word1.word2"?


When Lucene indexes with an analyzer, the analyzer converts the text of each field into a stream of tokens, which are stored as terms. (A document is made up of fields; the analyzer works on the field text.)

Basically, you can either:

1) Create a StopAnalyzer and pass it a HashSet containing "." (period) as a stop word. This can have adverse effects, since you must use the same analyzer for both indexing and searching.

2) Replace each "." with a space before indexing the text.
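Option 2 can be sketched in plain Java, without touching the analyzer at all. This is only an illustration: `replaceDots` and `whitespaceTokens` are hypothetical helpers, and the whitespace split merely stands in for whatever whitespace-based analyzer you actually index with. Any query text would presumably need the same replacement so that indexing and searching stay consistent.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

public class DotPreprocessor {
    // Hypothetical helper: replace every '.' with a space before the
    // text is handed to the indexer.
    static String replaceDots(String text) {
        return text.replace('.', ' ');
    }

    // Stand-in for a whitespace-based analyzer (an assumption, not
    // Lucene code): split the text on runs of whitespace.
    static List<String> whitespaceTokens(String text) {
        String trimmed = text.trim();
        if (trimmed.isEmpty()) {
            return Collections.emptyList();
        }
        return Arrays.asList(trimmed.split("\\s+"));
    }

    public static void main(String[] args) {
        // Prints [word1, word2]
        System.out.println(whitespaceTokens(replaceDots("word1.word2")));
    }
}
```

Keep in mind that a blanket replacement also breaks up things like decimal numbers and file names, which may or may not be what you want.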


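That depends on which Analyzer you are using. The short generic answer is to use a SimpleAnalyzer, which uses a LetterTokenizer. The LetterTokenizer splits at any non-letter character, which includes the dot. Note that digits are non-letters too, so for the literal text "word1.word2" it would emit "word" and "word", dropping the digits. If you have more specific tokenization requirements (keeping digits, for example), you must code a custom Analyzer class whose tokenStream method (createComponents in Lucene 4 and later) returns a custom TokenStream or Tokenizer object.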
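To see how the choice of splitting rule plays out on the example text, here is a plain-Java sketch that mimics character-based tokenization. It is not Lucene code: `tokenize` is a hypothetical helper that cuts the text wherever a character fails the given predicate. With `Character.isLetter` it imitates the letters-only rule described above; with `Character.isLetterOrDigit` it shows the kind of rule a custom Tokenizer might apply instead.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntPredicate;

public class TokenRuleSketch {
    // Hypothetical helper: emit maximal runs of characters accepted by
    // isTokenChar; every other character acts as a separator.
    static List<String> tokenize(String text, IntPredicate isTokenChar) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < text.length(); ) {
            int cp = text.codePointAt(i);
            if (isTokenChar.test(cp)) {
                current.appendCodePoint(cp);
            } else if (current.length() > 0) {
                tokens.add(current.toString());
                current.setLength(0);
            }
            i += Character.charCount(cp);
        }
        if (current.length() > 0) {
            tokens.add(current.toString());
        }
        return tokens;
    }

    public static void main(String[] args) {
        // Letters only: digits and the dot are both separators.
        System.out.println(tokenize("word1.word2", Character::isLetter));
        // Letters or digits: only the dot is a separator.
        System.out.println(tokenize("word1.word2", Character::isLetterOrDigit));
    }
}
```

The first call prints `[word, word]` and the second prints `[word1, word2]`, so a letters-or-digits rule is the one that matches what the question asks for.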
