Lucene gotchas with punctuation
Whilst building some unit tests for my Lucene queries I noticed some strange behavior related to punctuation, in particular around parentheses.
What are some of th开发者_运维百科e best ways to deal with search fields that contain significant amounts of punctuation?
If you haven't customized the query parser, Lucene should behave according to the default query parser syntax. Are you getting something different than that? Do you want punctuation to have a special meaning or just to remove the punctuation from searches? The other usual suspect here is the Analyzer, which determines how your field is indexed and how the query is broken into pieces for searching. Can you post specific examples of bad behavior?
It is not not just parentheses, other punctuations such as the colon, hyphen etc. will cause issues. Here is a way to deal with them.
精彩评论