solr spellchecker

2023-01-23 03:50 Q&A author:

I have implemented the Solr spellchecker for vendor names, based on the fieldType given at http://wiki.apache.org/solr/SpellCheckingAnalysis
The suggestions should be related to the search term entered. I use copyField to copy the vendorName field into a field of the above type, i.e. textSpell.

I am getting weird collated results for some of my queries, e.g.:

1) maccys does not give me any results, whereas maccy's gives me the desired result, i.e. macy's. I compared the text analysis (admin tool) for maccys and maccy's using both the text and textSpell field types, and both give macy as the end result. So why does the spellchecker return no result?

2) khols gives me 'shoes' as the collated result, whereas the correct result 'kohls' is only the third suggestion (after shoes and shops).

The onlyMorePopular flag is false and accuracy is the default of 0.5.

Thanks in advance for any help. I am slightly lost in terms of debugging any further.

We have faced the same problem of the spellchecker producing weird results, although we had a lot of data available. I cannot tell you how to debug it better, but I can tell you what we did:

We use the text field as it is, with no whitespace or standard tokenizer. If you have little data, you can also add a shingle filter to index not only "hello rabbit" but also "rabbit hello", but this will blow up the spellcheck index even more.

 <fieldType name="txtspell" class="solr.TextField" positionIncrementGap="100" omitNorms="true">
     <analyzer>
         <tokenizer class="solr.KeywordTokenizerFactory"/>
         <filter class="solr.LowerCaseFilterFactory"/>
         <filter class="solr.TrimFilterFactory"/>
         <filter class="solr.PatternReplaceFilterFactory"
                 pattern="[\-\.\/\(\),]" replacement="" replace="all"/>
         <filter class="solr.StopFilterFactory" ignoreCase="true" words="spellstopwords.txt"/>
         <!-- we don't want duplicates for one doc -->
         <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
     </analyzer>
 </fieldType>
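
To connect this to the setup in the question, a field of this type can be fed from the vendor name field with copyField. This is only a sketch of the schema.xml entries: vendorName comes from the question, while vendorNameSpell is a made-up field name and the indexed/stored settings are assumptions.

```xml
<!-- Hypothetical spellcheck source field using the txtspell type above.
     The name "vendorNameSpell" is an assumption for illustration. -->
<field name="vendorNameSpell" type="txtspell" indexed="true" stored="false"/>

<!-- Copy the raw vendor name into the spellcheck field at index time.
     "vendorName" is the field mentioned in the question. -->
<copyField source="vendorName" dest="vendorNameSpell"/>
```

The spellchecker would then be configured to build its dictionary from vendorNameSpell instead of the normally analyzed search field.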

If you really need collation (and you will if you don't use the shingle filter), you can use Solr from trunk, where you can specify spellcheck.maxCollationTries=1 to make sure that the returned correction would produce some hits.
We use spellcheck.accuracy=0.7 (and onlyMorePopular=false).
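
As a rough illustration, these parameters could be set as request handler defaults in solrconfig.xml instead of being passed on every query. The handler name /spell and the component name spellcheck are assumptions, not taken from our actual config, and spellcheck.maxCollationTries needs a Solr version that supports it.

```xml
<!-- Sketch only: the handler name "/spell" is hypothetical. -->
<requestHandler name="/spell" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="spellcheck">true</str>
    <str name="spellcheck.collate">true</str>
    <!-- test the collation against the index before returning it -->
    <str name="spellcheck.maxCollationTries">1</str>
    <str name="spellcheck.accuracy">0.7</str>
    <str name="spellcheck.onlyMorePopular">false</str>
  </lst>
  <arr name="last-components">
    <!-- assumes a SpellCheckComponent registered under this name -->
    <str>spellcheck</str>
  </arr>
</requestHandler>
```

The same values can also be passed as plain query parameters, which is easier while experimenting with accuracy.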
