How to configure SOLR so users can do prefix searches by default?

Posted 2023-04-06 03:08

I am using SOLR 3.2. My application issues search queries against a SOLR instance on a field of the text field type. How can I make SOLR return results like "book", "bookshelf", "bookasd" and so on when the user searches for "book"? Should I append "*" characters to the query string manually, or is there a setting in SOLR that makes it do prefix searches on the field by default?

This is the schema.xml section for the text field type:

<fieldType name="text" class="solr.TextField" positionIncrementGap="100">
      <analyzer type="index">
        <charFilter class="solr.HTMLStripCharFilterFactory"/>
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <filter class="solr.StopFilterFactory"
                ignoreCase="true"
                words="stopwords.txt"
                enablePositionIncrements="true"
                />
        <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="0" catenateNumbers="0" catenateAll="1" splitOnCaseChange="0"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.SnowballPorterFilterFactory" language="English" protected="protwords.txt"/>
      </analyzer>
      <analyzer type="query">
        <charFilter class="solr.HTMLStripCharFilterFactory"/>
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
        <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="0" catenateNumbers="0" catenateAll="1" splitOnCaseChange="0"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.SnowballPorterFilterFactory" language="English" protected="protwords.txt"/>
      </analyzer>
    </fieldType>

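There are several ways to do this, but performance-wise you might want to use EdgeNGramFilterFactory. It generates the prefixes of each token at index time, so at query time a prefix match is a plain term lookup instead of a wildcard expansion, at the cost of a larger index.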
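A sketch of the EdgeNGramFilterFactory approach for Solr 3.x, assuming a hypothetical `text_prefix` type and a hypothetical `title_prefix` field filled from a hypothetical `title` field. To keep it short it leaves out the stop-word, word-delimiter and stemming filters of the text type above. Prefixes are generated only at index time, so a plain query such as title_prefix:book also matches documents containing "bookshelf" or "bookasd", with no wildcard.

```xml
<!-- schema.xml sketch (Solr 3.x). text_prefix, title_prefix and title are hypothetical names.
     fieldType goes inside <types>, field inside <fields>, copyField directly under <schema>. -->
<fieldType name="text_prefix" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- Index the leading substrings of each token, 1 to 20 characters long:
         "bookshelf" becomes "b", "bo", "boo", "book", ..., "bookshelf" -->
    <filter class="solr.EdgeNGramFilterFactory" minGramSize="1" maxGramSize="20" side="front"/>
  </analyzer>
  <analyzer type="query">
    <!-- No n-grams at query time: the whole query term must equal one indexed prefix -->
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

<field name="title_prefix" type="text_prefix" indexed="true" stored="false"/>
<copyField source="title" dest="title_prefix"/>
```

Tokens longer than maxGramSize are only indexed up to that length, so a query term longer than 20 characters will not match; raising maxGramSize fixes that but grows the index further.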

I had the same requirement on a project: I had to implement suggestions. What I did was define this suggester fieldType:

<fieldType class="solr.TextField" name="suggester">
    <analyzer  type="index">
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        
        <filter class="solr.ShingleFilterFactory" minShingleSize="2" maxShingleSize="3" outputUnigrams="true" outputUnigramsIfNoShingles="false" />
        <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords_en.txt" enablePositionIncrements="true" />
    </analyzer>
    <analyzer  type="query">
        <tokenizer class="solr.KeywordTokenizerFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
    </analyzer>
</fieldType>
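The answer doesn't show the field that uses this type. A minimal schema.xml sketch, assuming the suggestion text is copied from a hypothetical `title` field into the Suggester field named in the faceting parameters of this answer:

```xml
<!-- schema.xml sketch: "title" is a hypothetical source field.
     stored="false" is enough because facet values come from the indexed terms. -->
<field name="Suggester" type="suggester" indexed="true" stored="false" multiValued="true"/>
<copyField source="title" dest="Suggester"/>
```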

I used ShingleFilterFactory because I needed suggestions composed of one or more words.

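Then I used faceting queries to get suggestions, with these request parameters:

facet=true

facet.limit=10

facet.prefix=book (the text the user typed; it is matched against the indexed terms without analysis, so lowercase it first)

facet.field=Suggester (this is the field with fieldType="suggester" in which I saved the data)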
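Instead of sending all of these parameters on every request, they could be set as defaults on a request handler. A sketch for solrconfig.xml, with `/suggest` as a hypothetical handler name; the client then only sends the prefix, e.g. /suggest?facet.prefix=book.

```xml
<!-- solrconfig.xml sketch: "/suggest" is a hypothetical handler name -->
<requestHandler name="/suggest" class="solr.SearchHandler">
  <lst name="defaults">
    <!-- Match all documents but return none of them, only the facet counts -->
    <str name="q">*:*</str>
    <str name="rows">0</str>
    <str name="facet">true</str>
    <str name="facet.field">Suggester</str>
    <str name="facet.limit">10</str>
    <!-- Leave out terms that occur in no matching document -->
    <str name="facet.mincount">1</str>
  </lst>
</requestHandler>
```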

I know this relies on facet results rather than regular search results, but maybe it solves your problem.

If my answer or Jayendra Patil's doesn't provide a solution, you can also take a look at EdgeNGramFilterFactory.

One option is to do the handling on the client side, by appending the wildcard character to the end of the search terms.

The impact:

Wildcard queries have a performance impact.
Wildcard queries do not undergo analysis, so the query-time analysis won't be applied to your search terms. With the text field type above, for example, the SynonymFilterFactory and SnowballPorterFilterFactory are not applied to a term like book*.

The other option is to implement a custom query parser with the handling you need.
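A custom parser is wired in through solrconfig.xml. A sketch, assuming a hypothetical Java class `com.example.solr.PrefixQParserPlugin` (not shown) that you would write yourself by extending `org.apache.solr.search.QParserPlugin`, for example to add the trailing wildcard to each term; `/search` is a hypothetical handler name.

```xml
<!-- solrconfig.xml sketch. com.example.solr.PrefixQParserPlugin is hypothetical
     and must be on Solr's classpath (e.g. a jar in the core's lib directory). -->
<queryParser name="prefix" class="com.example.solr.PrefixQParserPlugin"/>

<requestHandler name="/search" class="solr.SearchHandler">
  <lst name="defaults">
    <!-- Use the custom parser unless the request sets defType itself -->
    <str name="defType">prefix</str>
  </lst>
</requestHandler>
```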

I'm sure you figured this out by now, but just so there's an answer here:

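I handled this by taking the last term and ORing it with the same term plus a wildcard: for example, "my favorite book" becomes "my+favorite+(book OR book*)" (URL-encoded, where + is a space), which would also return "my favorite bookshelf". Keeping the plain term in the OR means the complete word still goes through normal query analysis. You probably want to do some processing on the input anyway (escaping, etc.).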

If you are specifically looking for the text typed to match the beginning of the result, then edge n-grams are the way to go, but from reading your question it didn't seem you were really asking for that.
