Solr spellchecker

I have implemented the Solr spellchecker based on the fieldType given here: http://wiki.apache.org/solr/SpellCheckingAnalysis

The spellchecking is for vendor names, where suggestions should be related to the search term entered. I use a copyField to copy the vendorName field into a field of the above type, i.e. textSpell.

I am getting weird collated results for some of my queries, e.g.:

1) maccys does not give me any results, whereas maccy's gives me the desired result, i.e. macy's. I compared the text analysis (admin tool) for maccys and maccy's using both the text and textSpell fieldTypes, and both give macy as the end result. So why does the spellchecker return no result?

2) khols gives me 'shoes' as the collated result, whereas the correct result 'kohls' is only the third suggestion (after 'shoes' and 'shops').

The onlyMorePopular flag is false and accuracy is the default of 0.5.

Thanks in advance for any help. I am slightly lost in terms of debugging any further.


We faced the same problem of the spellchecker producing weird results, even though we had a lot of data available. I can't tell you how to debug it better, but I can tell you what we did:

  1. We index the text field as it is, as a single token: no whitespace or standard tokenizer! If you have little data, you can also add a shingle filter so that more word combinations are indexed (not only "hello rabbit" but also "rabbit hello"), but this will blow up the spellcheck index even more.

     <fieldType name="txtspell" class="solr.TextField" positionIncrementGap="100" omitNorms="true">
         <analyzer>
             <!-- keep the whole field value as a single token -->
             <tokenizer class="solr.KeywordTokenizerFactory"/>
             <filter class="solr.LowerCaseFilterFactory"/>
             <filter class="solr.TrimFilterFactory"/>
             <!-- strip punctuation: - . / ( ) , -->
             <filter class="solr.PatternReplaceFilterFactory"
                     pattern="[\-\.\/\(\),]" replacement="" replace="all"/>
             <filter class="solr.StopFilterFactory" ignoreCase="true" words="spellstopwords.txt"/>
             <!-- we don't want duplicates for one doc -->
             <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
         </analyzer>
     </fieldType>
    
  2. If you really need collation (and without the shingle filter you will), you can use Solr from trunk, where you can specify maxCollationTries=1 to make sure that the returned correction actually produces some hits.

  3. We use spellcheck.accuracy=0.7 (and onlyMorePopular=false).
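
For reference, here is a rough sketch of how the three points above could be wired together in solrconfig.xml. It is not our exact config: the field name `vendorNameSpell`, the handler name `/spell`, the dictionary name `default` and the index directory are all made up for illustration, and it assumes you copy vendorName into a `vendorNameSpell` field of type txtspell via copyField. `spellcheck.maxCollationTries` only works on a Solr build that supports it (trunk, as mentioned in point 2).

```xml
<!-- solrconfig.xml (fragment). Names marked "hypothetical" are assumptions. -->
<searchComponent name="spellcheck" class="solr.SpellCheckComponent">
  <!-- analyze the query the same way as the spellcheck field -->
  <str name="queryAnalyzerFieldType">txtspell</str>
  <lst name="spellchecker">
    <str name="name">default</str>                         <!-- hypothetical dictionary name -->
    <str name="field">vendorNameSpell</str>                <!-- hypothetical copyField target -->
    <str name="spellcheckIndexDir">./spellchecker</str>    <!-- hypothetical directory -->
    <str name="buildOnCommit">true</str>
  </lst>
</searchComponent>

<requestHandler name="/spell" class="solr.SearchHandler">  <!-- hypothetical handler name -->
  <lst name="defaults">
    <str name="spellcheck">true</str>
    <str name="spellcheck.dictionary">default</str>
    <!-- point 3: stricter accuracy than the 0.5 default -->
    <str name="spellcheck.accuracy">0.7</str>
    <str name="spellcheck.onlyMorePopular">false</str>
    <!-- point 2: only return a collation that was tried and produced hits -->
    <str name="spellcheck.collate">true</str>
    <str name="spellcheck.maxCollationTries">1</str>
  </lst>
  <arr name="last-components">
    <str>spellcheck</str>
  </arr>
</requestHandler>
```

After changing the analyzer you have to reindex and rebuild the spellcheck index (e.g. with spellcheck.build=true), otherwise the old terms are still used for suggestions.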