Solr: How to make a spellchecker case-insensitive but still return the original word with uppercase letters?

2023-03-27 01:12

I'm working on a Solr project to create a spellchecker.

Why does typing "britne" autocomplete to "britney", while typing "Britne" finds no result? Here is my field type for spellchecking:

<fieldType name="suggestText" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="0" splitOnCaseChange="1" ignoreCase="true"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.KeywordMarkerFilterFactory" protected="protwords.txt" ignoreCase="true"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory" ignoreCase="true"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="0" catenateNumbers="0" catenateAll="0" splitOnCaseChange="1" ignoreCase="true"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.KeywordMarkerFilterFactory" protected="protwords.txt" ignoreCase="true"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory" ignoreCase="true"/>
  </analyzer>
</fieldType>

It has the LowerCaseFilterFactory in both the query analyzer and the index analyzer, so I assumed it would convert my query to lowercase and compare it with the words stored in lowercase, but obviously not.

Moreover, when I type "Britne", "britne" or "BriTnE", I would like to get the result "Britney" (and not "britney"). How can I make my spellchecker case-insensitive while still returning words in their original case?

I'm not sure if it works, but maybe you could use copy fields for that:

Don't use the LowerCaseFilterFactory on your suggestText field, but use the LowerCaseFilterFactory on a second field (let's call it suggestText_lower). Then use a copyField to copy this into the suggestText field.

So "BriTnE" will be matched by typing "britne", without lowercasing the suggestText field.
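A minimal sketch of that idea, under stated assumptions: the field type names suggestTextCased and suggestTextLower are hypothetical, the analyzers are stripped down to a tokenizer plus (optionally) the lowercase filter, and the copy direction follows the wording above. Since copyField copies the raw input value before any analysis runs, each field still applies its own analyzer to the same original text. Like the suggestion itself, this is untested against the spellchecker setup in the question.

```xml
<!-- Hypothetical type: keeps the original case (no LowerCaseFilterFactory) -->
<fieldType name="suggestTextCased" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
  </analyzer>
</fieldType>

<!-- Hypothetical type: same tokenizer, but lowercases every token -->
<fieldType name="suggestTextLower" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

<!-- The cased field is stored so the original spelling ("Britney") can be returned -->
<field name="suggestText" type="suggestTextCased" indexed="true" stored="true"/>
<!-- The lowercased field is only used for matching -->
<field name="suggestText_lower" type="suggestTextLower" indexed="true" stored="false"/>

<!-- Copies the raw (unanalyzed) input of suggestText_lower into suggestText -->
<copyField source="suggestText_lower" dest="suggestText"/>
```

If your documents populate suggestText directly, the source and dest attributes would simply be swapped.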

You are confusing a few things about indexes and storage here.

About storage: when you set stored="true", the value is stored as-is and does not reflect what is in the index. For example:

<field name="FIELDNAME" type="text" indexed="false" stored="true" multiValued="false" required="true" />

To check what has been stored, just run a simple *:* query that displays all fields.

Next, the indexes. Here you are processing (parsing and filtering) your values to make them searchable. For the same value, you may need several indexes to support different kinds of searches. Consider it seriously; it is often the best option. copyField is made for that, and you only have to store the value once.

To inspect your indexed values, use the "Schema Browser": open the admin console, select your instance, open the Schema Browser, select the field you want to inspect, and click "Load term info". There you'll see how the value has been parsed and whether it is really lowercased as you think; I have already had some surprises here. If it is not, you can try the tokenizer <tokenizer class="solr.StandardTokenizerFactory"/> combined with the LowerCaseFilterFactory; this worked for me.
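As a rough illustration of that last suggestion, a field type combining the two could look like the sketch below. The name suggestTextStd is hypothetical, and the other filters from the question are left out on purpose to keep the example small. A single analyzer element without a type attribute is applied at both index and query time, so both sides are lowercased the same way.

```xml
<!-- Hypothetical field type: StandardTokenizerFactory + LowerCaseFilterFactory -->
<fieldType name="suggestTextStd" class="solr.TextField" positionIncrementGap="100">
  <!-- No type attribute: the same chain is used for indexing and querying -->
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```

After reindexing, "Load term info" in the Schema Browser should show whether the terms for a field of this type are actually lowercased.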

Last, your query is important too and is probably the solution to your problem. When you search for Britne, you should build a search with a similarity feature (fuzzy search), or indicate that you want it from the default search. You can try searching for Britne~ (the same as Britne~0.5), Britne~0.8, or another value. You'll have to fine-tune it for your needs and context.
