Keep relational database structure in solr index?
I was able to import data through solr DIH.
In my database I have 5 tables:
threads: id, user_id, country_id
tags: id
thread_tag_map: thread_id, tag_id
countries: id
posts: id, thread_id
I want each document in solr to consist of:
thread_id
tag_id
country_id
post_id
For example:
thread_id: 1
tag_id: 23
tag_id: 34
country_id: 43
post_id: 4
post_id: 23
post_id: 23
How should I map it?
I haven't been able to configure data-config.xml for this. I have followed the DIH tutorial without success.
Here is my schema.xml:
<schema name="example" version="1.2">
<types>
<fieldType name="string" class="solr.StrField" sortMissingLast="true"/>
<fieldType name="int" class="solr.TrieIntField" precisionStep="0" positionIncrementGap="0"/>
<fieldType name="uuid" class="solr.UUIDField" indexed="true" />
<fieldType name="text_rev" class="solr.TextField" positionIncrementGap="100">
<analyzer type="index">
<tokenizer class="solr.WhitespaceTokenizerFactory"/>
<filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true" />
<filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="0" splitOnCaseChange="0"/>
<filter class="solr.LowerCaseFilterFactory"/>
<filter class="solr.ReversedWildcardFilterFactory" withOriginal="true"
maxPosAsterisk="3" maxPosQuestion="2" maxFractionAsterisk="0.33"/>
</analyzer>
<analyzer type="query">
<tokenizer class="solr.WhitespaceTokenizerFactory"/>
<filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
<filter class="solr.StopFilterFactory"
ignoreCase="true"
words="stopwords.txt"
enablePositionIncrements="true"
/>
<filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="0" catenateNumbers="0" catenateAll="0" splitOnCaseChange="0"/>
<filter class="solr.LowerCaseFilterFactory"/>
</analyzer>
</fieldType>
</types>
<fields>
<field name="id" type="uuid" indexed="true" stored="true" default="NEW"/>
<field name="threads.title" type="text_rev" indexed="true" stored="true"/>
<field name="posts.body" type="text_rev" indexed="true" stored="true"/>
<dynamicField name="*id" type="int" indexed="false" stored="true"/>
</fields>
<uniqueKey>id</uniqueKey>
<defaultSearchField>posts.body</defaultSearchField>
<solrQueryParser defaultOperator="OR"/>
</schema>
It seems like you just want to define these fields:
thread_id
tag_id
country_id
post_id
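as indexed 'string' fields in schema.xml. post_id should be multiValued="true" (note the camel-cased attribute name), and so should tag_id if a thread can carry several tags, as in your example. See the default schema.xml files for formatting guidelines, or: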
http://wiki.apache.org/solr/SchemaXml
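As a rough sketch, the field definitions could look like the fragment below, placed inside the <fields> element of the schema in the question. Marking tag_id as multi-valued is my assumption, based on the example document that lists two tags. Explicit fields take precedence over the existing `*id` dynamic field, which is single-valued and not indexed, so it would not accept repeated post_id values or let you search on them.

```xml
<!-- Sketch only: add inside <fields> in schema.xml -->
<!-- One value per document -->
<field name="thread_id" type="string" indexed="true" stored="true"/>
<field name="country_id" type="string" indexed="true" stored="true"/>
<!-- Several values per document (tag_id being multi-valued is an assumption) -->
<field name="tag_id" type="string" indexed="true" stored="true" multiValued="true"/>
<field name="post_id" type="string" indexed="true" stored="true" multiValued="true"/>
```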
The only tricky part here is querying the database, not configuring Solr. Write a JOIN query that returns all of the IDs you need, then use a Solr client library for your language to build a simple data structure, e.g. (JSON-like):
[{"thread_id":"1",
"tag_id":"14",
"country_id":"2",
"post_id":["5",
"7",
"18"
]
},...and more...]
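If you would rather stay with DIH than use a client library, nested entities can produce roughly the same document shape. Treat the following data-config.xml as an untested sketch: the JDBC driver class, the URL, the database name `forum`, and the credentials are placeholders I am assuming for a MySQL setup. It also assumes thread_id, tag_id, country_id, and post_id exist in schema.xml, with tag_id and post_id multi-valued.

```xml
<dataConfig>
  <!-- Placeholder connection details: adjust driver, url, user, password -->
  <dataSource type="JdbcDataSource"
              driver="com.mysql.jdbc.Driver"
              url="jdbc:mysql://localhost/forum"
              user="solr_user"
              password="change_me"/>
  <document>
    <!-- One Solr document per thread; id is aliased so it does not
         collide with the uuid "id" field in the schema -->
    <entity name="thread"
            query="SELECT id AS thread_id, country_id FROM threads">
      <field column="thread_id" name="thread_id"/>
      <field column="country_id" name="country_id"/>

      <!-- Each row returned here adds one value to the multi-valued tag_id -->
      <entity name="tag"
              query="SELECT tag_id FROM thread_tag_map WHERE thread_id = '${thread.thread_id}'">
        <field column="tag_id" name="tag_id"/>
      </entity>

      <!-- Each row returned here adds one value to the multi-valued post_id -->
      <entity name="post"
              query="SELECT id AS post_id FROM posts WHERE thread_id = '${thread.thread_id}'">
        <field column="post_id" name="post_id"/>
      </entity>
    </entity>
  </document>
</dataConfig>
```

Each child entity runs its query once per thread, so a full import issues many small queries. That is usually fine for modest tables but can be slow on large ones.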
Since Solr isn't an RDBMS, you'll have to emulate relational lookups by either running multiple queries or using subqueries. Another option is to use Solr to retrieve your thread or post with a full-text search, and then use an ID from that result to run a MySQL query that gets everything else you need.