How to use SOLR copyField directive

I have a rather simple SOLR structure that holds three different fields:

id, text and tags

In the schema.xml I set the following:

<uniqueKey>id</uniqueKey>
<defaultSearchField>text</defaultSearchField>
<solrQueryParser defaultOperator="AND"/>
<copyField source="tags" dest="text"/>

However, when I search a word that only appears as a tag, then the document is not found.

My question here is: does copyField happen before any analyzer runs (index and query) as described here or just before the query analyzer?


EDIT

the analyzer def:

<fieldType name="text" class="solr.TextField" positionIncrementGap="100">
    <analyzer type="index">
        <tokenizer class="solr.WhitespaceTokenizerFactory" />
        <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="0" splitOnCaseChange="1" preserveOriginal="1" />
        <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true" />
        <filter class="solr.LowerCaseFilterFactory" />              
        <filter class="solr.SnowballPorterFilterFactory" language="German" />
        <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
    </analyzer>
    <analyzer type="query">
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="0" splitOnCaseChange="1" preserveOriginal="1" />
        <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true" />
        <filter class="solr.LowerCaseFilterFactory" />              
        <filter class="solr.SnowballPorterFilterFactory" language="German" />
        <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
    </analyzer>
</fieldType>

and the field-type definitions (they are pretty much as the default configs):

<fieldType name="string" class="solr.StrField" sortMissingLast="true" omitNorms="true"/>
<fieldType name="int" class="solr.TrieIntField" precisionStep="0" omitNorms="true" positionIncrementGap="0"/>

and last the field definitions:

<fields>
    <field name="id" type="string" indexed="true" stored="true" required="true" />
    <field name="text" type="text" indexed="true" stored="false" multiValued="true" />
    <field name="tags" type="text" indexed="false" stored="false" />
</fields>
<uniqueKey>id</uniqueKey>
<defaultSearchField>text</defaultSearchField>
<solrQueryParser defaultOperator="AND"/>
<copyField source="tags" dest="text"/>


The copyField is applied when a document is indexed, so it happens before the index analyzer runs. It is just as if you had put the same input text into two different fields. After that, everything depends on the analyzers you defined for each field.
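
As a rough illustration of that ordering, consider an update message like the one below. The field values are made up for this sketch; the field names are the ones from the question. With that schema, the raw value of tags is copied into text when the document is added, and only then does the index analyzer of the text field run.

```xml
<!-- Hypothetical update message; the values are invented for illustration. -->
<add>
  <doc>
    <field name="id">1</field>
    <field name="text">some body text</field>
    <!-- The raw string "xyz" is copied into "text" as an additional value,
         before any analysis. This relies on "text" being multiValued. -->
    <field name="tags">xyz</field>
  </doc>
</add>
```

The copied value is analyzed with the analyzer of the destination field's type, not the source's.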


If you search q=tags:xyz then xyz will not be found, because you have set the tags field not to be indexed.
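
If querying the tags field directly is something you want, one option is to index it. A minimal sketch, assuming the rest of the schema stays as posted and the documents are reindexed afterwards:

```xml
<!-- Same definition as in the question, with indexed switched to "true"
     so that queries such as q=tags:xyz have terms to match against. -->
<field name="tags" type="text" indexed="true" stored="false" />
```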

If you do a default search then, yes, it should search the copyField destination. However, according to the Solr wiki:

Any number of copyField declarations can be included in your schema, to instruct Solr that you want it to duplicate any data it sees in the "source" field of documents that are added to the index

I think that not indexing 'tags' would also cause the copied 'tags' content not to be indexed.


I haven't tried using the copyField to append additional text to an existing field. I suppose Solr could concatenate it, or add it as a second value.

But here are a couple of ideas to try:

  1. Experiment with a document where the text field is blank, perhaps not even mentioned as a field in the document. When tags are copied into the main text, does it make a difference whether text starts out completely blank or not?

  2. Declare a second field, call it text2. And then ALSO copy tags into text2 via a second copyField directive. This text2 field won't have anything else in it, presumably not even mentioned in your fields, so for sure it should get the content.

In both cases you'd check results with the schema browser, as before. I'd be very curious to hear how you find out!
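
The second idea could be set up roughly like this. The name text2 comes from the suggestion above; reusing the text type for it and marking it multiValued are assumptions of this sketch, not something the original schema specifies.

```xml
<fields>
    <field name="id" type="string" indexed="true" stored="true" required="true" />
    <field name="text" type="text" indexed="true" stored="false" multiValued="true" />
    <field name="tags" type="text" indexed="false" stored="false" />
    <!-- Hypothetical scratch field that receives nothing but the copied tags. -->
    <field name="text2" type="text" indexed="true" stored="false" multiValued="true" />
</fields>
<copyField source="tags" dest="text"/>
<!-- Second directive: the same source copied into the scratch field. -->
<copyField source="tags" dest="text2"/>
```

After reindexing, a query such as q=text2:xyz, or a look at text2 in the schema browser, should indicate whether the copy itself is working independently of whatever else ends up in text.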
