Denormalizing relational data for Lucene/Solr
I have an architectural question about using Apache Solr/Lucene.
I'm building a Solr index for searching a CV database. Basically, every CV in it will have some fields like:
rate of pay, address, title
These fields are straightforward. The area I need advice on is skills and job history. For skills, someone might add an entry like: C# - 5 Years, Java - 9 Years
So there are essentially N skills, each with a string name and an integer number of years. I was thinking I could use a dynamic field, *_skill, and possibly add them like so:
1_skill: C#, 2_skill: Java
But how can I index the years of experience? Would I then add a dynamic field like:
1_skill_years: 5, 2_skill_years: 9
Has anyone done something similar before? Any help is greatly appreciated.
Regards
Multi-valued fields maintain ordering so you could have a multi-valued field for skills and another one for years of experience. When you read them back, just associate them by their order.
Pay attention if you have a null or empty value. You will have to encode it using a special marker, because an empty string or a null value will not be indexed, and this will shift the ordering so the two fields no longer line up.
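As a rough sketch of the idea above, here is how the two parallel multi-valued fields could be built and read back in Python. The field names `skill` and `skill_years`, the helper functions, and the two marker values are all made up for illustration; they are not anything Solr defines, and the markers only work if they can never occur as real values.

```python
# Hypothetical markers for missing values (assumptions, not Solr features).
MISSING_NAME = "__none__"
MISSING_YEARS = -1


def to_solr_doc(cv_id, skills):
    """Flatten (name, years) pairs into two parallel multi-valued fields."""
    names, years = [], []
    for name, yrs in skills:
        # Always append something to BOTH lists so positions stay aligned.
        names.append(name if name else MISSING_NAME)
        years.append(yrs if yrs is not None else MISSING_YEARS)
    return {"id": cv_id, "skill": names, "skill_years": years}


def from_solr_doc(doc):
    """Re-associate the two fields by position and decode the markers."""
    pairs = []
    for name, yrs in zip(doc["skill"], doc["skill_years"]):
        pairs.append((
            None if name == MISSING_NAME else name,
            None if yrs == MISSING_YEARS else yrs,
        ))
    return pairs


doc = to_solr_doc("cv-1", [("C#", 5), ("Java", 9), ("Solr", None)])
print(doc)
print(from_solr_doc(doc))
```

Without the marker, the third skill would have no entry in `skill_years`, and any later skill would be paired with the wrong number of years.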
UPDATE
Unfortunately, it is not possible in Solr to sort by a multi-valued field. See this link for explanations:
http://lucene.472066.n3.nabble.com/performance-sorting-multivalued-field-td905943.html
Instead of a dynamic field, you could use a multi-valued field, since multiple values can exist for the same field. Hence something like
<Skill>Java</Skill>
<Skill>Solr</Skill>
etc.
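For illustration only, a document with a multi-valued field can be sent to Solr's JSON update handler as a plain JSON array for that field. The sketch below just builds the payload; the `id` and `title` values and the helper name are hypothetical, and it assumes `Skill` is declared with multiValued="true" in the schema.

```python
import json


def build_update_payload(cv_id, title, skills):
    """Build a JSON body holding one document with a multi-valued Skill field."""
    doc = {
        "id": cv_id,
        "title": title,
        "Skill": list(skills),  # one JSON array = several values of the same field
    }
    # The update handler accepts a JSON array of documents.
    return json.dumps([doc])


payload = build_update_payload("cv-1", "Developer", ["Java", "Solr"])
print(payload)
```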