Sorting a field containing special characters in Solr
I am new to Solr indexing. I want to sort on a location field that holds many different values. Some of them start with special characters, such as 'sAmerica, #'Japan, %India, etc.
When I sort this field, I do not want special characters such as 's, ', #, ! and ~ to be taken into account. I want a sort that ignores these characters and returns 'sAmerica in 1st position, %India in 2nd and #'Japan in 3rd.
How can I make this possible? I am using PatternReplaceFilterFactory, but I don't know much about it.
<analyzer type="query">
<tokenizer class="solr.KeywordTokenizerFactory"/>
<filter class="solr.LowerCaseFilterFactory" />
<filter class="solr.WordDelimiterFilterFactory" catenateWords="1" />
<filter class="solr.PatternReplaceFilterFactory" pattern="'s" replacement="" replace="all" />
</analyzer>
</fieldType>
If you want to ignore the special characters, try the following field type.
It lowercases the value and catenates the word parts, dropping all the special characters.
<fieldType name="string_sort" class="solr.TextField" positionIncrementGap="1">
<analyzer type="index">
<tokenizer class="solr.KeywordTokenizerFactory" />
<filter class="solr.LowerCaseFilterFactory" />
<filter class="solr.WordDelimiterFilterFactory" catenateWords="1" />
</analyzer>
</fieldType>
However, this will not work for 'sAmerica, because s is not a special character: the apostrophe is dropped but the s stays attached to the word.
If this is a fixed pattern, you need to remove it with the following filter, placed before the WordDelimiterFilterFactory in the field type above:
<filter class="solr.PatternReplaceFilterFactory" pattern="'s" replacement="" replace="all" />
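Another option, sketched here under assumptions, avoids WordDelimiterFilterFactory altogether. KeywordTokenizerFactory keeps the whole value as one token, which is what sorting needs, but WordDelimiterFilterFactory may split a multi-word value such as New York into several tokens (new, york, newyork), which does not sort reliably. The type below is modeled on the alphaOnlySort type from Solr's example schema. The name location_alpha_sort is hypothetical, and treating a leading 's as a fixed prefix to drop is an assumption based on the values in the question.

```xml
<!-- Sketch modeled on alphaOnlySort from Solr's example schema.
     "location_alpha_sort" is a hypothetical type name. -->
<fieldType name="location_alpha_sort" class="solr.TextField" sortMissingLast="true" omitNorms="true">
  <analyzer>
    <!-- Keep the whole value as a single token so it can be sorted on -->
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- Assumption: a leading 's is a fixed prefix to drop ('samerica -> america) -->
    <filter class="solr.PatternReplaceFilterFactory" pattern="^'s" replacement="" replace="all"/>
    <!-- Remove everything except a-z, digits and spaces (#'japan -> japan, %india -> india) -->
    <filter class="solr.PatternReplaceFilterFactory" pattern="[^a-z0-9 ]" replacement="" replace="all"/>
    <!-- Trim spaces left behind by the removals (e.g. "# japan" -> "japan") -->
    <filter class="solr.TrimFilterFactory"/>
  </analyzer>
</fieldType>
```

With this chain, 'sAmerica, %India and #'Japan should index as america, india and japan, so an ascending sort returns them in that order. Note that the character class also strips accented letters (Zürich becomes zrich); adding solr.ASCIIFoldingFilterFactory before the pattern filters would fold them to ASCII instead.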
Edit: are you using this config?
<fieldType name="string_sort" class="solr.TextField" positionIncrementGap="1">
<analyzer type="index">
<tokenizer class="solr.KeywordTokenizerFactory" />
<filter class="solr.LowerCaseFilterFactory" />
<filter class="solr.PatternReplaceFilterFactory" pattern="'s" replacement="" replace="all" />
<filter class="solr.WordDelimiterFilterFactory" catenateWords="1" />
</analyzer>
</fieldType>
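I tested this config through the analysis page with the value 'sAlgarve, and it produces the following tokens:
KeywordTokenizer (KT): 'sAlgarve
LowerCaseFilter (LCF): 'salgarve
PatternReplaceFilter (PRF): algarve
WordDelimiterFilter (WDF): algarve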
Can you check it through the analysis page?
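A field type on its own does not change how an existing field sorts; the sort has to run on a field of that type. A minimal wiring sketch, assuming the original field is called location and is a single-valued string field; location_sort is a hypothetical name for a copy used only for sorting.

```xml
<!-- Assumed definition of the original field (single-valued) -->
<field name="location" type="string" indexed="true" stored="true"/>
<!-- Hypothetical sort-only field: indexed, not stored, and not multiValued -->
<field name="location_sort" type="string_sort" indexed="true" stored="false"/>
<!-- Copy the raw value at index time; the string_sort analyzer then normalizes it -->
<copyField source="location" dest="location_sort"/>
```

After changing the schema, reindex the documents, then sort with a request parameter such as sort=location_sort asc.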