
Sorting a field containing special characters in Solr

I am new to Solr indexing. I want to sort a location field that holds a variety of values. Some of them start with special characters, for example 'sAmerica, #'Japan, %India, etc.

When I sort this field, I do not want special characters such as 's, #, !, ~ etc. to be taken into account. I want a sort that ignores these characters and returns America in 1st position, %India in 2nd and #'Japan in 3rd.

How can I make this possible? I am using PatternReplaceFilterFactory, but I am not sure how to use it. This is my analyzer:

    <analyzer type="query">
        <tokenizer class="solr.KeywordTokenizerFactory"/>
        <filter class="solr.LowerCaseFilterFactory" />
        <filter class="solr.WordDelimiterFilterFactory" catenateWords="1"  />
        <filter class="solr.PatternReplaceFilterFactory" pattern="'s" replacement="" replace="all" />
    </analyzer>
    </fieldType>


If you want to ignore the special characters, try the following field type.
It lowercases the value and concatenates the word parts, dropping all special characters.

    <fieldType name="string_sort" class="solr.TextField" positionIncrementGap="1">
        <analyzer type="index">
            <tokenizer class="solr.KeywordTokenizerFactory" />
            <filter class="solr.LowerCaseFilterFactory" />
            <filter class="solr.WordDelimiterFilterFactory" catenateWords="1" />
        </analyzer>
    </fieldType>

However, this will not work for 'sAmerica, because s is a letter rather than a special character, so it is kept.

If 's is a fixed prefix, remove it with the following filter, placed after the LowerCaseFilterFactory and before the WordDelimiterFilterFactory in the field type above:

    <filter class="solr.PatternReplaceFilterFactory" pattern="'s" replacement="" replace="all" />
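With replace="all", the pattern 's is removed wherever it occurs in the value, not only at the start. If the unwanted 's only ever appears as a leading prefix (an assumption about your data), a sketch of a narrower variant anchors the regex to the start of the token and replaces only the first match:

```xml
<!-- Assumes 's only appears as a prefix; "^" anchors the match to the start of the lowercased token -->
<filter class="solr.PatternReplaceFilterFactory" pattern="^'s" replacement="" replace="first" />
```

It goes in the same position, after LowerCaseFilterFactory, so an uppercase 'S prefix is also matched.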

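Edit: are you using this config? Note that sorting uses the terms produced by the index-time analyzer, so the filters must be in the analyzer with type="index" (the snippet in the question shows only a query analyzer).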

    <fieldType name="string_sort" class="solr.TextField" positionIncrementGap="1">
        <analyzer type="index">
            <tokenizer class="solr.KeywordTokenizerFactory" />
            <filter class="solr.LowerCaseFilterFactory" />
            <filter class="solr.PatternReplaceFilterFactory" pattern="'s" replacement="" replace="all" />
            <filter class="solr.WordDelimiterFilterFactory" catenateWords="1" />
        </analyzer>
    </fieldType>

I have tested this on the Analysis page, and it produces the following tokens:

    KeywordTokenizerFactory:      'sAlgarve
    LowerCaseFilterFactory:       'salgarve
    PatternReplaceFilterFactory:  algarve
    WordDelimiterFilterFactory:   algarve

Can you check your field on the Analysis page as well?
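One further consideration: WordDelimiterFilterFactory can emit more than one token for a value with spaces or punctuation between words (for example New York becomes new, york and newyork), while Solr expects a sort field to hold a single indexed term per document. A minimal alternative sketch, modeled on the alphaOnlySort field type from Solr's example schema, keeps the whole value as one token and strips everything that is not a lowercase letter or digit with a regex. The names alpha_sort and location_sort are hypothetical, location is assumed to be your existing field, and each element goes in its usual place in schema.xml:

```xml
<!-- Hypothetical sort-only type, modeled on Solr's example "alphaOnlySort" -->
<fieldType name="alpha_sort" class="solr.TextField" sortMissingLast="true" omitNorms="true">
    <analyzer>
        <!-- Keep the entire value as a single token so it can be sorted on -->
        <tokenizer class="solr.KeywordTokenizerFactory" />
        <filter class="solr.LowerCaseFilterFactory" />
        <filter class="solr.TrimFilterFactory" />
        <!-- Remove a leading 's first, while the apostrophe is still there to match -->
        <filter class="solr.PatternReplaceFilterFactory" pattern="^'s" replacement="" replace="first" />
        <!-- Strip every remaining character outside a-z and 0-9 (this also drops accented letters) -->
        <filter class="solr.PatternReplaceFilterFactory" pattern="[^a-z0-9]" replacement="" replace="all" />
    </analyzer>
</fieldType>

<!-- Hypothetical sort-only copy of the location field; not stored, so results still show the original value -->
<field name="location_sort" type="alpha_sort" indexed="true" stored="false" />
<copyField source="location" dest="location_sort" />
```

Queries would then sort with sort=location_sort asc. With these filters, 'sAmerica, %India and #'Japan are indexed as america, india and japan, which gives the order asked for in the question.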
