Avoid slow highlighting on Solr because of stemming

Posted 2023-03-24 07:14

I am quite new to Solr and would like to ask for your help. I am developing an application that should highlight the results of a query. For this I am using the regex fragmenter:

<highlighting>
<fragmenter name="regex" class="org.apache.solr.highlight.RegexFragmenter">
<lst name="defaults">
  <int name="hl.fragsize">500</int>
  <float name="hl.regex.slop">0.5</float>
  <str name="hl.pre"><![CDATA[<b>]]></str>
  <str name="hl.post"><![CDATA[</b>]]></str>
  <str name="hl.useFastVectorHighlighter">true</str>
  <str name="hl.regex.pattern">[-\w ,/\n\"']{20,300}[.?!]</str>
  <str name="hl.fl">dokumentum_syn_query</str>
</lst>
</fragmenter>
</highlighting>


The field is indexed with term vectors and offsets:

<field name="dokumentum_syn_query" type="huntext_syn" indexed="true" stored="true" multiValued="true" termVectors="on" termPositions="on" termOffsets="on"/>
<fieldType name="huntext_syn" class="solr.TextField" stored="true" indexed="true" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="com.morphologic.solr.huntoken.HunTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords_query.txt" enablePositionIncrements="true"/>
    <filter class="com.morphologic.solr.hunstem.HumorStemFilterFactory"
            lex="/home/oroszgy/workspace/morpho/solrplugins/data/lex"
            cache="alma"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords_query.txt" enablePositionIncrements="true"/>
    <filter class="com.morphologic.solr.hunstem.HumorStemFilterFactory"
            lex="/home/oroszgy/workspace/morpho/solrplugins/data/lex"
            cache="alma"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms_query.txt" ignoreCase="true" expand="true"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

The highlighting works well, except that it is really slow. I realized that this is because the highlighter/fragmenter stems all the result documents again.

Could you please help me understand why this happens and how I can avoid it? (I thought that using the FastVectorHighlighter would solve my problem, but it didn't.)

The problem was that I used the value "on" instead of "true". So the correct line in the schema is:

    <field name="dokumentum_syn_query" type="huntext_syn" indexed="true" stored="true"   multiValued="true" termVectors="true" termPositions="true"  termOffsets="true"/>
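A mistake like this is easy to miss by eye, so it may help to scan the schema for flag values other than "true" or "false". The following is only a minimal Python sketch of that idea: the attribute list and the `<schema>` wrapper are assumptions made for illustration, and it does not claim to mirror how Solr itself parses these values.

```python
import xml.etree.ElementTree as ET

# Attributes this sketch treats as boolean flags (assumption: taken from the
# field definition in the question, not from an authoritative Solr list).
BOOLEAN_ATTRS = ("indexed", "stored", "multiValued",
                 "termVectors", "termPositions", "termOffsets")


def suspicious_flags(schema_xml):
    """Return (field name, attribute, value) for flags that are not 'true'/'false'."""
    root = ET.fromstring(schema_xml)
    problems = []
    for field in root.iter("field"):
        for attr in BOOLEAN_ATTRS:
            value = field.get(attr)
            if value is not None and value not in ("true", "false"):
                problems.append((field.get("name"), attr, value))
    return problems


# Hypothetical minimal wrapper around the original (faulty) field definition.
SCHEMA = """
<schema>
  <field name="dokumentum_syn_query" type="huntext_syn" indexed="true" stored="true"
         multiValued="true" termVectors="on" termPositions="on" termOffsets="on"/>
</schema>
"""

for name, attr, value in suspicious_flags(SCHEMA):
    print(f'{name}: {attr}="{value}" should be "true" or "false"')
```

Run against the field definition from the question, this reports the three term vector attributes that were set to "on".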

To avoid slow Solr responses caused by highlighting, I decided not to use Solr's highlighting at all and implemented the highlighting on the client side. That works for me, but it is a bit tricky: you have to process the search phrase on the client the same way Solr does on the server, so that you also find the tokenized and stemmed terms and can mark what Solr searched for and found. In other words, you have to implement the stemming on the client side.
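To make the idea concrete, here is a rough Python sketch of client-side highlighting. The `toy_stem` function is a made-up placeholder that strips a few English suffixes; it is not the Humor stemmer from the question, and a real client would have to reproduce the server-side analysis chain (tokenizer, stop words, stemmer, synonyms) for the marks to line up with what Solr matched.

```python
import re


def toy_stem(word):
    """Placeholder stemmer: lowercases and strips a few English suffixes.

    NOT the Humor stemmer; a real client must mirror the server's analysis.
    """
    word = word.lower()
    for suffix in ("ing", "ed", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def highlight(text, query, pre="<b>", post="</b>"):
    """Wrap every word of `text` whose stem matches the stem of a query word."""
    query_stems = {toy_stem(w) for w in re.findall(r"\w+", query)}

    def mark(match):
        word = match.group(0)
        return f"{pre}{word}{post}" if toy_stem(word) in query_stems else word

    return re.sub(r"\w+", mark, text)


print(highlight("Solr highlights the stemmed terms", "highlighting term"))
# prints: Solr <b>highlights</b> the stemmed <b>terms</b>
```

The tricky part mentioned above is exactly the stand-in function: any difference between the client's stemmer and the server's shows up as missing or wrong marks.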

Alternative:

I think the term vectors returned with the result set give you the positions of the terms you need to highlight. You could use that information to highlight the terms on the client side without implementing a stemmer there. But I don't think this is really an alternative in the end: Solr still has to compute the positions of the words, so you will not save any time on the server side.
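If offsets are available, applying them on the client is straightforward. The sketch below assumes the character offsets have already been extracted from the term vector data as `(start, end)` pairs with an exclusive end; how they are pulled out of the Solr response is not shown, and the sample offsets are made up for illustration.

```python
def highlight_offsets(text, offsets, pre="<b>", post="</b>"):
    """Wrap the character ranges in `offsets` (start, end-exclusive) with tags."""
    out = []
    cursor = 0
    for start, end in sorted(offsets):
        if start < cursor:
            # Skip ranges that overlap one that was already marked.
            continue
        out.append(text[cursor:start])
        out.append(pre + text[start:end] + post)
        cursor = end
    out.append(text[cursor:])
    return "".join(out)


text = "Solr highlights the stemmed terms"
# Hypothetical offsets for "highlights" and "terms", given in arbitrary order.
offsets = [(28, 33), (5, 15)]
print(highlight_offsets(text, offsets))
# prints: Solr <b>highlights</b> the stemmed <b>terms</b>
```

No stemmer is needed on the client in this variant, since the offsets already point at the matched surface forms.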
