Denormalizing relational data for Lucene/Solr

I have an architectural question about using Apache Solr/Lucene.

I'm building a Solr index for searching a CV database. Basically every CV in it will have some fields like:

rate of pay, address, title

These fields are straightforward. The area I need advice on is skills and job history. For skills, someone might add an entry like: C# - 5 Years, Java - 9 Years

So there are essentially N skills, each with a string name and an integer number of years. I was thinking I could use a dynamic field, *_skill, and possibly add them like so:

1_skill: C#, 2_skill: Java

But how can I index the years of experience? Would I then add a second dynamic field like:

1_skill_years: 5, 2_skill_years: 9
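
For concreteness, the numbered layout above could be produced with something like the following. This is only a sketch: the function name `to_dynamic_fields` is made up for illustration, and it assumes the schema declares dynamic fields matching both `*_skill` and `*_skill_years`.

```python
def to_dynamic_fields(skills):
    """Flatten [(name, years), ...] into numbered dynamic-field entries.

    Hypothetical helper: assumes the schema has dynamic fields for
    the patterns *_skill and *_skill_years.
    """
    doc = {}
    for i, (name, years) in enumerate(skills, start=1):
        doc[f"{i}_skill"] = name          # e.g. 1_skill: C#
        doc[f"{i}_skill_years"] = years   # e.g. 1_skill_years: 5
    return doc


doc = to_dynamic_fields([("C#", 5), ("Java", 9)])
```

The number prefix is the only thing linking a skill to its years, so a query has to know which slot a skill ended up in.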

Has anyone done something similar before? Any help is greatly appreciated.

regards


Multi-valued fields maintain ordering so you could have a multi-valued field for skills and another one for years of experience. When you read them back, just associate them by their order.

Be careful if a value can be null or empty. You will have to encode it with a special marker, because an empty string or a null value will not be indexed, and that shifts the ordering so the two fields no longer line up.
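
A minimal sketch of the pairing-by-position idea, including the placeholder for missing values. The field names `skill` and `skill_years`, the two marker values, and both helper functions are assumptions made for illustration, not anything Solr defines.

```python
# Hypothetical placeholders: choose values that can never be real data.
MISSING_NAME = "__none__"
MISSING_YEARS = -1


def to_parallel_fields(skills):
    """Split [(name, years), ...] into two equally long, ordered lists."""
    names, years = [], []
    for name, yrs in skills:
        # Always append to both lists so positions stay aligned.
        names.append(name if name else MISSING_NAME)
        years.append(MISSING_YEARS if yrs is None else yrs)
    return {"skill": names, "skill_years": years}


def from_parallel_fields(doc):
    """Re-associate the two multi-valued fields by position."""
    pairs = []
    for name, yrs in zip(doc["skill"], doc["skill_years"]):
        pairs.append((
            None if name == MISSING_NAME else name,
            None if yrs == MISSING_YEARS else yrs,
        ))
    return pairs


doc = to_parallel_fields([("C#", 5), ("Java", None), ("Solr", 2)])
restored = from_parallel_fields(doc)
```

Without the marker, the missing years for Java would simply be dropped, and Solr's 2 years would be read back as Java's.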

UPDATE
Unfortunately, it is not possible in Solr to sort by a multi-valued field. See this link for explanations: http://lucene.472066.n3.nabble.com/performance-sorting-multivalued-field-td905943.html


Instead of a dynamic field, you could use a multi-valued field. Multiple values can exist for the same field in one document. Hence something like

<field name="Skill">Java</field>
<field name="Skill">Solr</field>

etc.
