Solr SnowballPorterFilterFactory filter provides incorrect suggestions
I use SnowballPorterFilterFactory in both the index and the query analyzers. When I search for the word "apple", Solr finds the relevant articles, but it also reports that the word is misspelled and suggests "appl". Searching for "apples" works correctly: no suggestion is given, and articles containing "apple" are found.
schema.xml:
<fieldType name="text_en" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords_en.txt" enablePositionIncrements="true"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.SnowballPorterFilterFactory" language="English" protected="protwords_en.txt"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms_en.txt" ignoreCase="true" expand="true"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords_en.txt" enablePositionIncrements="true"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="0" catenateNumbers="0" catenateAll="0" splitOnCaseChange="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.SnowballPorterFilterFactory" language="English" protected="protwords_en.txt"/>
  </analyzer>
</fieldType>
Any ideas on how to get rid of these incorrect suggestions?
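You should not use the same field for searching and for spellchecking. The spellchecker builds its dictionary from the terms indexed in that field, and because your field is stemmed, those terms are stems such as "appl" rather than the original words, which is how "appl" ends up as a suggestion. Add a separate field without stemming and use it for spellchecking.
Example: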
<!-- Basic Text Field for use with Spell Correction -->
<fieldType name="textSpell" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <!-- No stemming filter here, so indexed terms keep full word forms such as "apple" -->
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.ASCIIFoldingFilterFactory"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="0" catenateNumbers="0" catenateAll="0" preserveOriginal="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
</fieldType>
<!-- TextSpell -->
<field name="textSpelling" type="textSpell" indexed="true" stored="false" multiValued="true"/>
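The textSpelling field stays empty unless something copies text into it. A minimal sketch, assuming your searchable text lives in fields named `title` and `content` (hypothetical names; substitute the fields you actually search), is to add copyField rules to schema.xml:

```xml
<!-- Hypothetical source fields: replace "title" and "content" with your own field names -->
<copyField source="title" dest="textSpelling"/>
<copyField source="content" dest="textSpelling"/>
```

Because textSpelling is declared multiValued, several source fields can be copied into it. copyField is applied at index time, so existing documents need to be reindexed before the new field contains anything.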
Then, in your solrconfig.xml:
<searchComponent name="spellcheck" class="solr.SpellCheckComponent">
  <!-- Analyze the spellcheck query with the same non-stemming type as the dictionary field;
       this parameter is set on the component itself, not inside the spellchecker list -->
  <str name="queryAnalyzerFieldType">textSpell</str>
  <lst name="spellchecker">
    <str name="name">default</str>
    <str name="field">textSpelling</str>
    <str name="termSourceField">textSpelling</str>
    <str name="accuracy">0.7</str>
    <str name="spellcheckIndexDir">./spellchecker</str>
    <str name="buildOnOptimize">true</str>
  </lst>
</searchComponent>
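Defining the component is not enough by itself; a search handler also has to run it. A sketch, assuming you attach it to your main SearchHandler (the handler name `/select` is an assumption; use whichever handler serves your queries):

```xml
<requestHandler name="/select" class="solr.SearchHandler">
  <lst name="defaults">
    <!-- Turn spellchecking on and use the "default" dictionary defined in the component -->
    <str name="spellcheck">true</str>
    <str name="spellcheck.dictionary">default</str>
    <str name="spellcheck.count">5</str>
  </lst>
  <!-- Run the spellcheck component after the standard query components -->
  <arr name="last-components">
    <str>spellcheck</str>
  </arr>
</requestHandler>
```

With buildOnOptimize set to true, the spellcheck dictionary is rebuilt when the index is optimized; you can also build it once by sending a query with spellcheck.build=true.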