ShingleFilter search with more terms than indexed phrase fails

I am using Solr 1.4.1 (Lucene 2.9.3) on Windows and am trying to understand ShingleFilter. With the configuration below, I find that if the query contains more words than the phrase indexed in a field, the search on that field fails, i.e. debugQuery=true shows no score contributed from that field.

Here is an example I created to reproduce the problem, with the field names and the indexed document:

Id: 1

title_1: Nina Simone

title_2: I put a spell on you

Issue the following Queries (dismax):

- “Nina Simone I put” <- Fails to have a score from title_1 search (using debugQuery)

- “Nina Simone” <- SUCCESS

To analyze this disparity, I used Solr’s Field Analysis page with the ‘shingle’ field type (given below) and tried “Nina Simone I put”, and it matches. So it is only at query time that no score is produced. I also checked ‘parsedquery’, and it shows a DisjunctionMaxQuery issuing the string “Nina_Simone Simone_I I_put” to the title_1 field.

The title_1 and title_2 fields are of type ‘shingle’, defined as:

<fieldType name="shingle" class="solr.TextField" positionIncrementGap="100" indexed="true" stored="true">
  <analyzer type="index">
      <tokenizer class="solr.StandardTokenizerFactory"/>
      <filter class="solr.LowerCaseFilterFactory"/>
      <filter class="solr.ShingleFilterFactory" maxShingleSize="2" outputUnigrams="false"/>
  </analyzer>
  <analyzer type="query">
      <tokenizer class="solr.StandardTokenizerFactory"/>
      <filter class="solr.LowerCaseFilterFactory"/>
      <filter class="solr.ShingleFilterFactory" maxShingleSize="2" outputUnigrams="false"/>
  </analyzer>
</fieldType>
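As a rough illustration of what this analyzer chain emits, here is a minimal Python sketch, not Solr code. It assumes a plain whitespace split in place of StandardTokenizer and uses "_" as the separator only to mirror the parsedquery output quoted above. The `shingles` and `missing` helpers are hypothetical names introduced for this example.

```python
def shingles(text, size=2, sep="_"):
    """Lowercase, split on whitespace, and emit word n-grams of exactly `size`.

    Approximates maxShingleSize="2" with outputUnigrams="false":
    only two-word shingles are produced, never single words.
    """
    tokens = text.lower().split()
    return [sep.join(tokens[i:i + size]) for i in range(len(tokens) - size + 1)]


def missing(query_shingles, indexed_shingles):
    """Return the query shingles that do not occur in the indexed field."""
    return [s for s in query_shingles if s not in indexed_shingles]


indexed_title_1 = shingles("Nina Simone")
short_query = shingles("Nina Simone")
long_query = shingles("Nina Simone I put")

print(indexed_title_1)                       # shingles stored for title_1
print(long_query)                            # shingles produced for the longer query
print(missing(short_query, indexed_title_1)) # nothing missing
print(missing(long_query, indexed_title_1))  # shingles title_1 never indexed
```

If the pf clause is evaluated as a phrase that requires every query shingle to be present in sequence (an assumption about how the parsed query is matched), the longer query asks title_1 for shingles that were never indexed there, which would be consistent with the missing score.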

Note that I also have a catchall field which is text. I have qf set to: 'id^2 catchall^0.8' and pf set to: 'title_1^1.5 title_2^1.2'

Is there something I am missing or doing wrong?


In a dismax query, the score is the maximum of the subquery scores, not the sum (the other subqueries only add to it if the tie parameter is set above 0). I don't really know much about how it parses shingle queries, but if it does something like "(title1:(shingle1 shingle2...)) (title2:(shingle1 shingle2...))" then you should expect to see only one field contribute to the score.
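The max-versus-sum behaviour can be sketched in a few lines of Python. This is an illustrative model of how a disjunction-max score combines per-field scores, not Solr's implementation. The `dismax_score` function and the sample scores are hypothetical, and the `tie` argument stands in for the dismax tie-breaker parameter.

```python
def dismax_score(field_scores, tie=0.0):
    """Combine per-field scores the way a disjunction-max query does.

    The best field score wins; the remaining fields contribute only
    through the tie-breaker multiplier (0.0 means pure max).
    """
    if not field_scores:
        return 0.0
    best = max(field_scores)
    return best + tie * (sum(field_scores) - best)


# Hypothetical scores for two fields matching the same query.
scores = [2.0, 0.5]

print(dismax_score(scores))           # pure max: only the best field counts
print(dismax_score(scores, tie=0.5))  # other field adds half of its score
print(dismax_score(scores, tie=1.0))  # behaves like a plain sum
```

With `tie=0.0`, a field that scores lower than the best one adds nothing to the total, so a single contributing field in the debug output is expected for that case.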
