How to generate non-prefix autocomplete suggestions?

I would like to add autocomplete to my tagging functionality.

A couple of questions:

  1. How do I generate a list of autocomplete suggestions that includes matches at both the start and the middle of a string? For example, if the user types "auto", the autocomplete suggestions should include terms such as "autocomplete" and "build automation".

  2. I would like to allow multi-word tags and use a comma (",") as the separator for tags. So when the user hits the space bar, he is still typing out the same tag, but when he hits the comma key, he's starting a new tag. How do I do that?

I am using Django, jQuery, MySQL, and Solr. What is the best way to implement these two features?


I've implemented exactly what you're asking about and it works really well. There are a few issues to be aware of:

  • Highlighting in the results list summaries doesn't work, and the suggested workaround also doesn't work in this particular case.
  • If your documents have long titles and you truncate them when displaying, there's a chance you'll match on the prefix of a word that isn't being displayed. There are several ways to handle this, of course.
  • In a future version, I'd like to give words towards the start of the title a bit more weight than words at the end. This would be one way to mitigate the previous item.

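Like the other answer, I'd start with the same article it links to, but you DO want the Edge NGram analyzer. The thing you'll add is to ALSO do whitespace tokenization, so that edge n-grams are generated for every word in the title rather than only for the start of the whole string. That is what lets "auto" match "build automation".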

Then make these changes to your schema.xml file. This example assumes you already have a field called "title" defined, and that it's also what you'd like to display. I create a second field, which is used ONLY for autocomplete prefix matching.

Step 1: Define Edge NGram Text field type

<types>
  <!-- ... other types ... -->

  <!-- Assuming you already have this -->
  <fieldType name="text" class="solr.TextField" positionIncrementGap="100">
    <!-- ... normal text definition ... -->
  </fieldType>

  <!-- Adding this -->
  <fieldType name="prefix_edge_text" class="solr.TextField" positionIncrementGap="100">
    <analyzer type="index">
      <tokenizer class="solr.WhitespaceTokenizerFactory"/>
      <filter class="solr.LowerCaseFilterFactory"/>
      <!-- not using enablePositionIncrements="true" for now -->
      <filter class="solr.StopFilterFactory" words="stopwords.txt" />
      <filter class="solr.EdgeNGramFilterFactory" minGramSize="1" maxGramSize="25" />
    </analyzer>
    <analyzer type="query">
      <tokenizer class="solr.WhitespaceTokenizerFactory"/>
      <filter class="solr.LowerCaseFilterFactory"/>
      <!-- No need to create Edges here -->
      <!-- Don't want stopwords here -->
    </analyzer>
  </fieldType>

</types>
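
To see why this field type matches "auto" against "build automation", here is a rough Python sketch that imitates the two analyzers above. It is not Solr code: the function names are made up for illustration, and treating a multi-word query as "every typed word must match" is an assumption about how you run the query.

```python
def edge_ngrams(token, min_gram=1, max_gram=25):
    """Leading substrings of a token, similar to EdgeNGramFilterFactory output."""
    upper = min(len(token), max_gram)
    return [token[:n] for n in range(min_gram, upper + 1)]


def index_terms(title, stopwords=frozenset()):
    """Imitate the index analyzer: whitespace split, lowercase, stopwords, edge n-grams."""
    terms = set()
    for token in title.split():
        token = token.lower()
        if token in stopwords:
            continue
        terms.update(edge_ngrams(token))
    return terms


def query_terms(text):
    """Imitate the query analyzer: whitespace split and lowercase only."""
    return [token.lower() for token in text.split()]


def matches(title, typed, stopwords=frozenset()):
    """Assumption: a title matches when every typed word is one of its indexed terms."""
    indexed = index_terms(title, stopwords)
    return all(term in indexed for term in query_terms(typed))


print(matches("autocomplete", "auto"))      # prefix of the only word
print(matches("build automation", "auto"))  # prefix of the second word
print(matches("semiautomatic", "auto"))     # inside a word, so no match here
```

Note the last case: this setup matches the start of any word, not arbitrary substrings. If you need matches inside a word as well, that is where the NGram approach from the other answer comes in.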

Step 2: Define the New Field

<fields>
  <!-- ... other fields ... -->

  <!-- Assuming you already have this -->
  <field name="title" type="text" indexed="true" stored="true" multiValued="true"/>

  <!-- Adding this -->
  <field name="prefix_title" type="prefix_edge_text" indexed="true" stored="true" multiValued="true" />

</fields>

Step 3: Copy the Title's content over to the prefix field during indexing

<!-- Adding this -->
<copyField source="title" dest="prefix_title" />

That's pretty much it for the schema. Just remember:

  • When you do a regular search, you still search against the regular title field.
  • When you're doing an autocomplete search, search against the prefix_title.
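
As a sketch of what the autocomplete request might look like from the Django side, the snippet below only builds the query URL using Solr's standard q, df, fl, rows and wt parameters. The host, port and path in SOLR_SELECT_URL are assumptions about your setup, and the helper names are hypothetical.

```python
import re
from urllib.parse import urlencode

# Assumed location of your Solr select handler; adjust to your installation.
SOLR_SELECT_URL = "http://localhost:8983/solr/select"

# Characters with special meaning in the Lucene query syntax.
_SPECIAL = re.compile(r'([+\-&|!(){}\[\]^"~*?:\\/])')


def escape_query(text):
    """Backslash-escape query syntax characters in user-typed text."""
    return _SPECIAL.sub(r"\\\1", text)


def autocomplete_url(typed, rows=10):
    """Build a select URL that searches prefix_title and returns the stored title."""
    params = {
        "q": escape_query(typed),
        "df": "prefix_title",  # search the edge n-gram field by default
        "fl": "title",         # display the regular title field
        "rows": rows,
        "wt": "json",
    }
    return SOLR_SELECT_URL + "?" + urlencode(params)


print(autocomplete_url("build auto"))
```

A regular search would use the same kind of request with df set to title instead.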


  1. Use the NGramTokenizerFactory. Use the analysis console to see how it works. Also see this article (but you would use NGram instead of EdgeNGram).
  2. Not sure what you mean by "tags" but I guess you have a multivalued field "tags", so your code would parse the input (splitting by ",") before sending the data to Solr.
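
For point 2, the parsing could look roughly like this in Python on the Django side. This is a minimal sketch with hypothetical helper names: split_tags cleans up the full input before it goes to Solr, and current_fragment picks out the tag still being typed, which is the text you would send to the autocomplete query.

```python
def split_tags(raw):
    """Split comma-separated input into tags; spaces inside a tag are kept."""
    tags = []
    for piece in raw.split(","):
        tag = " ".join(piece.split())  # trim and collapse runs of whitespace
        if tag and tag not in tags:    # skip empty pieces and duplicates
            tags.append(tag)
    return tags


def current_fragment(raw):
    """Return the text after the last comma, i.e. the tag being typed right now."""
    return raw.rsplit(",", 1)[-1].strip()


print(split_tags("build automation, autocomplete ,, solr"))
print(current_fragment("solr, build auto"))
```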