How to make Lucene case-insensitive

By default, "Word" and "word" are not treated as the same term. How can I make Lucene case-insensitive?


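The easiest approach is to lowercase all searchable content as well as the queries; see the LowerCaseFilter documentation. Wildcard queries need extra care: they bypass the Analyzer, so a wildcard term is only case-insensitive if it is lowercased before it reaches the index (in older Lucene versions the classic QueryParser does this by default through setLowercaseExpandedTerms).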
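A minimal plain-Java sketch of that idea, not Lucene code: a toy inverted index (the class `LowercasingIndex` and its methods are hypothetical) that runs the same whitespace split and `toLowerCase(Locale.ROOT)` over documents at index time and over queries at search time, so "WORD", "Word" and "word" all resolve to the same term. In Lucene itself this normalization is the job of the Analyzer's LowerCaseFilter.

```java
import java.util.*;

// Hypothetical toy, not Lucene's API: maps each normalized term to the ids of the documents containing it.
class LowercasingIndex {
    private final Map<String, Set<Integer>> postings = new HashMap<>();

    // The same normalization runs at index time and at query time,
    // which is what makes "Word" and "word" the same term.
    static List<String> analyze(String text) {
        List<String> terms = new ArrayList<>();
        for (String token : text.split("\\s+")) {
            if (!token.isEmpty()) {
                // Locale.ROOT avoids locale-specific surprises such as the Turkish dotless i.
                terms.add(token.toLowerCase(Locale.ROOT));
            }
        }
        return terms;
    }

    void add(int docId, String text) {
        for (String term : analyze(text)) {
            postings.computeIfAbsent(term, k -> new TreeSet<>()).add(docId);
        }
    }

    // Single-term lookup: the query is lowercased exactly like the content.
    Set<Integer> search(String queryTerm) {
        List<String> terms = analyze(queryTerm);
        if (terms.isEmpty()) {
            return Collections.emptySet();
        }
        return postings.getOrDefault(terms.get(0), Collections.emptySet());
    }

    public static void main(String[] args) {
        LowercasingIndex index = new LowercasingIndex();
        index.add(1, "A Word of caution");
        index.add(2, "one more word");
        System.out.println(index.search("WORD")); // [1, 2]
    }
}
```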

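If you also need case-sensitive matching, you can index the same content into separate fields, for example one that keeps the original case and one that is lowercased, and choose the field at query time.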
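To illustrate the multi-field idea without tying it to a particular Lucene version, here is a hypothetical plain-Java sketch: every token is posted twice, once as-is under a `body_exact` field and once lowercased under `body_lower` (both field names are made up for the example), and the caller picks the field per query. In a real Lucene index the two fields would instead be given different analyzers, for example via PerFieldAnalyzerWrapper.

```java
import java.util.*;

// Hypothetical two-field toy index, not Lucene's API:
// "body_exact" keeps the original case, "body_lower" stores lowercased terms.
class TwoFieldIndex {
    private final Map<String, Map<String, Set<Integer>>> fields = new HashMap<>();

    private void post(String field, String term, int docId) {
        fields.computeIfAbsent(field, f -> new HashMap<>())
              .computeIfAbsent(term, t -> new TreeSet<>())
              .add(docId);
    }

    void add(int docId, String text) {
        for (String token : text.trim().split("\\s+")) {
            if (token.isEmpty()) {
                continue;
            }
            post("body_exact", token, docId);                          // case preserved
            post("body_lower", token.toLowerCase(Locale.ROOT), docId); // case folded
        }
    }

    // Case-sensitive lookups hit the exact field; case-insensitive ones lowercase the query first.
    Set<Integer> search(String term, boolean caseSensitive) {
        String field = caseSensitive ? "body_exact" : "body_lower";
        String key = caseSensitive ? term : term.toLowerCase(Locale.ROOT);
        return fields.getOrDefault(field, Collections.emptyMap())
                     .getOrDefault(key, Collections.emptySet());
    }

    public static void main(String[] args) {
        TwoFieldIndex index = new TwoFieldIndex();
        index.add(1, "Word");
        index.add(2, "word");
        System.out.println(index.search("Word", true));  // [1]
        System.out.println(index.search("Word", false)); // [1, 2]
    }
}
```

The cost of this design is that every token is stored twice, which is the usual trade-off for supporting both matching modes.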


StandardAnalyzer includes a LowerCaseFilter, which makes "Word" and "word" the same term. Simply pass the same analyzer to both your IndexWriter and your QueryParser. For example, with Lucene 3.0:

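// Lucene 3.0: the same analyzer lowercases terms at index time and at query time
Analyzer analyzer = new StandardAnalyzer(Version.LUCENE_30);
IndexWriter writer = new IndexWriter(dir, analyzer, true, IndexWriter.MaxFieldLength.UNLIMITED);
QueryParser parser = new QueryParser(Version.LUCENE_30, field, analyzer);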


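In addition to using StandardAnalyzer, which includes LowerCaseFilter and, depending on the Lucene version, a stop filter for common English words such as "the", make sure you build your documents with TextField rather than StringField. A StringField value is indexed as a single token without any analysis, so it is meant for exact matches and is never lowercased.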
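The difference between the two field types can be shown without Lucene. In the hypothetical sketch below, `keywordTerms` mimics how a StringField value becomes one unanalyzed term, while `analyzedTerms` approximates a TextField run through a lowercasing analyzer, using a simple ASCII `\W+` split in place of StandardAnalyzer's real tokenizer. Only the analyzed form contains a `world` term that a lowercased query could hit.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Locale;

// Hypothetical plain-Java contrast, not Lucene code: two ways a field value can become index terms.
class FieldTermsDemo {
    // StringField-like: the whole value is one term, with no tokenizing and no lowercasing.
    static List<String> keywordTerms(String value) {
        return Collections.singletonList(value);
    }

    // TextField-like: split into words (ASCII-only \W+ split for brevity), then lowercase each word.
    static List<String> analyzedTerms(String value) {
        List<String> terms = new ArrayList<>();
        for (String token : value.split("\\W+")) {
            if (!token.isEmpty()) {
                terms.add(token.toLowerCase(Locale.ROOT));
            }
        }
        return terms;
    }

    public static void main(String[] args) {
        String title = "Hello World";
        System.out.println(keywordTerms(title));  // [Hello World]
        System.out.println(analyzedTerms(title)); // [hello, world]
    }
}
```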


If you are using Solr, add LowerCaseFilterFactory to the fieldType of that field in schema.xml, in both the index analyzer and the query analyzer. For example:

<fieldType name="text" class="solr.TextField" positionIncrementGap="100">
    <analyzer type="index">
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="0" splitOnCaseChange="1" preserveOriginal="1" />
        <filter class="solr.LowerCaseFilterFactory"/>
    </analyzer>

    <analyzer type="query">
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="0" catenateNumbers="0" catenateAll="0" splitOnCaseChange="1"/>
        <filter class="solr.LowerCaseFilterFactory"/>
    </analyzer>
</fieldType>
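Each analyzer block in this fieldType is a chain: a tokenizer followed by filters applied in order. The plain-Java sketch below is not Solr code (`AnalysisChain` and its methods are hypothetical, and the WordDelimiterFilterFactory step is left out for brevity); it models such a chain to show why LowerCaseFilterFactory belongs in both the index and the query analyzer. With lowercasing on the index side only, the query "WORD" no longer matches a document containing "Word". It uses `List.of`, so it assumes Java 9 or later.

```java
import java.util.*;
import java.util.function.UnaryOperator;

// Hypothetical sketch, not Solr's classes: a whitespace tokenizer followed by token filters applied in order.
class AnalysisChain {
    private final List<UnaryOperator<List<String>>> filters;

    AnalysisChain(List<UnaryOperator<List<String>>> filters) {
        this.filters = filters;
    }

    // Rough stand-in for WhitespaceTokenizerFactory: split on whitespace only.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.trim().split("\\s+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    // Stand-in for LowerCaseFilterFactory in this sketch.
    static List<String> lowercase(List<String> tokens) {
        List<String> out = new ArrayList<>();
        for (String t : tokens) {
            out.add(t.toLowerCase(Locale.ROOT));
        }
        return out;
    }

    List<String> analyze(String text) {
        List<String> tokens = tokenize(text);
        for (UnaryOperator<List<String>> filter : filters) {
            tokens = filter.apply(tokens);
        }
        return tokens;
    }

    // A query matches when every analyzed query term appears among the analyzed document terms.
    static boolean matches(AnalysisChain index, AnalysisChain query, String doc, String q) {
        return index.analyze(doc).containsAll(query.analyze(q));
    }

    public static void main(String[] args) {
        AnalysisChain lowercasing = new AnalysisChain(List.of(AnalysisChain::lowercase));
        AnalysisChain noLowercase = new AnalysisChain(List.of());
        System.out.println(matches(lowercasing, lowercasing, "Word count", "WORD")); // true
        System.out.println(matches(lowercasing, noLowercase, "Word count", "WORD")); // false
    }
}
```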