How to use wildcards and fuzzy search with Solr?

2022-12-17 14:52

I use Solr to search my data, and I have now noticed that some features of the Solr query language do not work for me. These are the capabilities I am missing:

fuzzy search
wildcards * ? - I do not have stemming set up yet, so these would be a useful temporary substitute when searching
field specification - currently I cannot restrict a search to a field, e.g. title:Blabla

As far as I know these things should come by default in Solr, but I obviously don't have them. I use Solr 1.4. Here you can find my schema. Thanks for your help.

I googled for "solr fuzzy search" and found your question here. In fact, version 4.0 of Solr supports fuzzy search with an easy query syntax.

For example, you can search strictly with name:peter, or append the tilde symbol, as in name:peter~, for a fuzzy search. If you want to restrict the fuzziness a little, you can add a number after the tilde, as in name:peter~0.7, which means you want to search for peter with a "sharpness" (minimum similarity) of 70%.
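As a rough sketch of how these query forms might be sent to Solr over HTTP, the Python snippet below builds select URLs for an exact, a fuzzy, a wildcard, and a field-scoped query. The base URL (`http://localhost:8983/solr/select`), the helper name `build_select_url`, and the field names are assumptions made for illustration, so adjust them to your own installation and schema.

```python
from urllib.parse import urlencode

# Assumed location of the Solr select handler; change it for your setup.
SOLR_SELECT_URL = "http://localhost:8983/solr/select"


def build_select_url(query, rows=10):
    """Return a select URL with the query URL-encoded (hypothetical helper)."""
    params = {"q": query, "rows": rows, "wt": "json"}
    return SOLR_SELECT_URL + "?" + urlencode(params)


# Query forms discussed above; the field names are placeholders.
QUERIES = {
    "exact": "name:peter",
    "fuzzy": "name:peter~",
    "fuzzy_restricted": "name:peter~0.7",
    "wildcard_many": "title:Bla*",     # * matches zero or more characters
    "wildcard_single": "title:Bl?bla",  # ? matches exactly one character
}

if __name__ == "__main__":
    for label, query in QUERIES.items():
        print(label, build_select_url(query))
```

Encoding the query matters because characters such as `:`, `*` and `?` have their own meaning inside a URL.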

Your fieldType name="text" is missing a lot of filters. For reference, here's the text fieldType from the default schema.xml:

<!-- A text field that uses WordDelimiterFilter to enable splitting and matching of
    words on case-change, alpha numeric boundaries, and non-alphanumeric chars,
    so that a query of "wifi" or "wi fi" could match a document containing "Wi-Fi".
    Synonyms and stopwords are customized by external files, and stemming is enabled.
    -->
<fieldType name="text" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <!-- in this example, we will only use synonyms at query time
    <filter class="solr.SynonymFilterFactory" synonyms="index_synonyms.txt" ignoreCase="true" expand="false"/>
    -->
    <!-- Case insensitive stop word removal.
      add enablePositionIncrements=true in both the index and query
      analyzers to leave a 'gap' for more accurate phrase queries.
    -->
    <filter class="solr.StopFilterFactory"
            ignoreCase="true"
            words="stopwords.txt"
            enablePositionIncrements="true"
            />
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.SnowballPorterFilterFactory" language="English" protected="protwords.txt"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
    <filter class="solr.StopFilterFactory"
            ignoreCase="true"
            words="stopwords.txt"
            enablePositionIncrements="true"
            />
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="0" catenateNumbers="0" catenateAll="0" splitOnCaseChange="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.SnowballPorterFilterFactory" language="English" protected="protwords.txt"/>
  </analyzer>
</fieldType>

For example, the SnowballPorterFilterFactory is the one that enables stemming.
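To illustrate why the filter chain matters, here is a toy Python sketch of an analysis chain: whitespace tokenizing, lowercasing, and a deliberately naive suffix-stripping step standing in for a real stemmer. It is not Solr's code and not the Snowball/Porter algorithm; the function names `toy_stem`, `analyze` and `matches` are hypothetical and only show the idea that the same analysis is applied at index time and at query time.

```python
def toy_stem(token):
    """Strip a few common English suffixes. NOT the Snowball/Porter algorithm."""
    for suffix in ("ing", "es", "s"):
        # Keep at least three characters so short words are left alone.
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def analyze(text, stem=True):
    """Mimic a tokenizer followed by lowercase and (optional) stemming filters."""
    tokens = [t.lower() for t in text.split()]
    if stem:
        tokens = [toy_stem(t) for t in tokens]
    return tokens


def matches(query, document, stem=True):
    """True if every analyzed query term appears among the analyzed document terms."""
    indexed = set(analyze(document, stem))
    return all(term in indexed for term in analyze(query, stem))


if __name__ == "__main__":
    doc = "Searching documents"
    print(matches("search", doc, stem=True))   # stemming on both sides
    print(matches("search", doc, stem=False))  # no stemming
```

With stemming on both sides, "search" and "Searching" reduce to the same term; without it, only an exact term or a wildcard such as search* would find the document.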

I recommend building your schema based on the default schema.xml, tweaking and modifying as necessary (as opposed to starting from scratch).

Here's the reference for analyzers, tokenizers and filters.
