SolR: How to make a spellchecker case-insensitive while still returning the original word with its upper-case letters?

I'm working on a SolR project to create a spellchecker.

Why if I type "britne" does it autocomplete "britney", but when I type "Britne" it doesn't find any result? Here is my field for spellchecking:

<fieldType name="suggestText" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="0" splitOnCaseChange="1" ignoreCase="true"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.KeywordMarkerFilterFactory" protected="protwords.txt" ignoreCase="true"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory" ignoreCase="true"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="0" catenateNumbers="0" catenateAll="0" splitOnCaseChange="1" ignoreCase="true"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.KeywordMarkerFilterFactory" protected="protwords.txt" ignoreCase="true"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory" ignoreCase="true"/>
  </analyzer>
</fieldType>

It has the LowerCaseFilterFactory in the query part AND in the index part, so I guessed it would convert my query to lowercase and compare it with the words stored in lowercase, but obviously not.

Moreover, when I type "Britne", "britne" or "BriTnE", I would like to get the result "Britney" (and not "britney"). How can I make my spellchecker case-insensitive while still returning the words in their original case?


I'm not sure if it works, but maybe you could use copy fields for that:

Don't use the LowerCaseFilterFactory on your suggestText field, but use the LowerCaseFilterFactory on a 2nd field (let's call it suggestText_lower). Then use a copyField to copy this into the suggestText field.

So the "BriTnE" will be matched by typing "britne" without lowercasing the "suggestText" field.
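
A minimal sketch of what that could look like in schema.xml, not tested against the asker's setup. The type names suggestTextCased and suggestTextLower are hypothetical, and the copy direction follows the wording above. Since copyField copies the raw input value before any analysis runs, each field applies its own analyzer to the same original text.

```xml
<!-- Hypothetical type: keeps the original case of each term -->
<fieldType name="suggestTextCased" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <!-- no LowerCaseFilterFactory here, so "Britney" stays "Britney" -->
  </analyzer>
</fieldType>

<!-- Hypothetical type: lowercases terms for case-insensitive matching -->
<fieldType name="suggestTextLower" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

<field name="suggestText" type="suggestTextCased" indexed="true" stored="true" multiValued="true"/>
<field name="suggestText_lower" type="suggestTextLower" indexed="true" stored="false" multiValued="true"/>

<!-- The raw value sent to suggestText_lower is also indexed into suggestText -->
<copyField source="suggestText_lower" dest="suggestText"/>
```

Which of the two fields the spellchecker should read from, and how to map a lowercase match back to its cased form, is left open here, as in the suggestion itself.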


You are confusing a few things about indexes and storage here.

About storage: when you set stored="true", the value is stored as is and doesn't reflect what's in the index. Example:

<field name="FIELDNAME" type="text" indexed="false" stored="true" multiValued="false" required="true" />

To check what has been stored, just run a simple *:* query displaying all fields.

Next, the indexes. Here you are processing (parsing and filtering) your values to make them searchable. For the same value, you may have to build multiple indexes to support different kinds of searches. Consider it seriously, that's often the best option. "copyField" is made for that, and you only have to store the value once.

To inspect your indexed values, use the "Schema Browser": open the admin console, select your instance, open the schema browser, select the field you want to inspect and click "Load term info". There you'll see how the value has been parsed and whether it is really lowercased as you think: I have already had some surprises here. If it is not, you can try the tokenizer <tokenizer class="solr.StandardTokenizerFactory"/> combined with the LowerCaseFilterFactory; this worked for me.
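
For illustration, the tokenizer and filter combination mentioned above could be declared like this. The type name suggestTextStd is hypothetical, and this is only a sketch of the analyzer chain, not a drop-in replacement for the asker's field type.

```xml
<!-- Hypothetical field type: standard tokenizer followed by lowercasing -->
<fieldType name="suggestTextStd" class="solr.TextField" positionIncrementGap="100">
  <!-- A single analyzer without a type attribute is used for both index and query -->
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```

After reindexing, "Load term info" in the Schema Browser should show whether the terms really are lowercased.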

Last, your query is important too, and is probably the solution to your problem. When you search for Britne, you should build the search with a similarity feature (fuzzy search), or request it through the default search. You can try searching for Britne~ (the same as Britne~0.5), or Britne~0.8, or whatever. You'll have to fine-tune it for your needs and context.
