
Custom Analyzer using ASCIIFoldingFilter not replacing diacritics

We have an issue with a custom Lucene.NET Analyzer which uses ASCIIFoldingFilter and LowerCaseFilter.

While indexing our content, the LowerCaseFilter works and lowercases all terms, but the ASCIIFoldingFilter leaves diacritics untouched: there are no errors, yet characters like őŏő are not replaced with o and end up in the index unchanged. I would have expected this to either work or fail, not silently do nothing.

The relevant code is like this:

// Inside our custom Analyzer subclass
public override TokenStream TokenStream(String fieldName, TextReader reader) {
  Tokenizer tokenizer = new StandardTokenizer(reader);
  TokenStream stream = new StandardFilter(tokenizer);
  stream = new ASCIIFoldingFilter(stream); // expected to fold ő, ŏ, etc. down to plain ASCII o
  return new LowerCaseFilter(stream);
}
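
In case it helps, this is roughly how we inspect what the analyzer emits (a minimal sketch using the older Token/Next() API that matches the code above; CustomFoldingAnalyzer is a placeholder name for our analyzer class):

// requires: using System; using System.IO; using Lucene.Net.Analysis;
Analyzer analyzer = new CustomFoldingAnalyzer(); // placeholder for our custom Analyzer
TokenStream stream = analyzer.TokenStream("body", new StringReader("Őŏő test"));
Token token;
while ((token = stream.Next()) != null) {
  // with folding + lowercasing we would expect "ooo" and "test" here
  Console.WriteLine(token.TermText());
}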

Are there any additional steps required to use ASCIIFoldingFilter?

Is there a working Java example that I could adapt to Lucene.NET?

Thank you!

EDIT: I managed to fix this. It was a misconfiguration issue: the custom analyzer was not being used; another analyzer that only lowercased terms was used instead. The custom filter now works correctly. Sorry!
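
For anyone else hitting this: the root cause was that the IndexWriter had been constructed with a different analyzer, so our filter chain was never applied at indexing time. A minimal sketch of the wiring that fixes it (CustomFoldingAnalyzer and directory are placeholder names for our own analyzer class and Lucene.Net.Store.Directory instance):

// requires: using Lucene.Net.Analysis; using Lucene.Net.Index;
// the custom analyzer must be the one passed to the IndexWriter
Analyzer analyzer = new CustomFoldingAnalyzer(); // placeholder
IndexWriter writer = new IndexWriter(directory, analyzer, IndexWriter.MaxFieldLength.UNLIMITED);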
