
Solr - Writing result of an Analyzer to different fields

I have read a couple of tutorials and browsed the Solr documentation. But one thing isn't clear to me. Let me explain:

Let's assume that the following document is to be indexed:

<doc>
  <field name="id">R12345</field>
  <field name="title">My title</field>
  <field name="content">My Content</field>
</doc>

In addition to the fields in this document, the index should contain one extra field called "docType". This extra field should be filled using a "completion rule". The idea behind this:

If id starts with character "R" then write the String "Resolve" into field docType in the index. If id starts with character "C" then write the String "Contribute" into field docType in the index.

The above document should be available in the index with the following fields:

id=R12345
title=My title
content=My Content
docType=Resolve

My idea is to use an Analyzer for this. The result of the Analyzer would be written into the field "id" in the index as usual (just a copy of the original text), but the result "Resolve" or "Contribute" should be written into another field.

My basic question is: how can this be achieved in the Analyzer (Java snippet)? To make it more complex, the index field "docType" should be searchable and must be available in the search results. What would the schema look like for the fields id and docType?

Thanks in advance,
Tobias


If you only need the indexed value, then the schema approach is sufficient. Create a new fieldType that performs the necessary processing, create a field of your new type, and set up a copyField to copy the value from id:

<fieldType name="doctypeField" class="solr.TextField">
  <analyzer type="index">
    <!-- keep the whole id as a single token -->
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <!-- reduce the id to its leading C or R; anchored so only the first character counts -->
    <filter class="solr.PatternReplaceFilterFactory" pattern="^([CR]).*" replacement="$1" replace="all" />
    <!-- map the remaining prefix letter to the docType value -->
    <filter class="solr.PatternReplaceFilterFactory" pattern="^C$" replacement="Contribute" replace="all" />
    <filter class="solr.PatternReplaceFilterFactory" pattern="^R$" replacement="Resolve" replace="all" />
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <!-- treat the query as one token and lowercase it, so "Resolve" matches the indexed "resolve" -->
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

<field name="doctype" type="doctypeField" indexed="true" stored="false" required="false" />

<copyField source="id" dest="doctype"/>
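To reason about which term actually lands in the index, here is a rough Java approximation of the index-time chain. It assumes that, because KeywordTokenizer emits the whole id as one token, each PatternReplaceFilter with replace="all" behaves like String.replaceAll on the full value. This is a sketch for checking the regexes, not Lucene's analysis API, and the class and method names are made up. The patterns are anchored with ^ so that only the leading character decides, matching the "starts with" rule in the question.

```java
import java.util.Locale;

// Hypothetical helper: approximates the index analyzer of doctypeField.
public class DocTypeAnalysisSketch {

    static String indexedTerm(String id) {
        String token = id;                                 // KeywordTokenizer: whole id is one token
        token = token.replaceAll("^([CR]).*", "$1");       // keep only a leading C or R
        token = token.replaceAll("^C$", "Contribute");     // C -> Contribute
        token = token.replaceAll("^R$", "Resolve");        // R -> Resolve
        return token.toLowerCase(Locale.ROOT);             // roughly what LowerCaseFilter does
    }

    public static void main(String[] args) {
        for (String id : new String[] {"R12345", "C777"}) {
            System.out.println(id + " -> " + indexedTerm(id));
        }
    }
}
```

For R12345 this yields the single term resolve and for C777 it yields contribute. Since the query analyzer also lowercases, a query such as doctype:Resolve should match.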

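Note that this does not give you a usable stored value. Analyzers only change the indexed terms, and copyField copies the original input, so even with stored="true" the doctype field would store the raw id (e.g. R12345) rather than "Resolve". If you need docType in the search results, work out the value before feeding the document to Solr, for instance by computing it in the SQL query if your content source is SQL.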
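If the documents are fed from Java, the completion rule can be applied on the client side before indexing. The following is a minimal sketch under that assumption: the names DocTypeEnricher, docTypeFor and enrich are hypothetical, and a plain Map stands in for the document you would send to Solr.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class DocTypeEnricher {

    // Completion rule from the question: "R..." -> Resolve, "C..." -> Contribute.
    // Returns null when no rule applies, so the field can simply be left out.
    static String docTypeFor(String id) {
        if (id == null || id.isEmpty()) {
            return null;
        }
        switch (id.charAt(0)) {
            case 'R': return "Resolve";
            case 'C': return "Contribute";
            default:  return null;
        }
    }

    // Copies the source fields and adds docType when the rule matches.
    static Map<String, String> enrich(Map<String, String> fields) {
        Map<String, String> doc = new LinkedHashMap<>(fields);
        String docType = docTypeFor(fields.get("id"));
        if (docType != null) {
            doc.put("docType", docType);
        }
        return doc;
    }

    public static void main(String[] args) {
        Map<String, String> source = new LinkedHashMap<>();
        source.put("id", "R12345");
        source.put("title", "My title");
        source.put("content", "My Content");
        System.out.println(enrich(source));
    }
}
```

Running main prints {id=R12345, title=My title, content=My Content, docType=Resolve}. With SolrJ you would add the same value via SolrInputDocument.addField("docType", docType). Declaring the field with indexed="true" and stored="true" (for example with a string type, assuming your schema defines one as Solr's example schemas do) makes docType both searchable and returned in results.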
