Lucene.Net: What is a good method for creating an index that is more complicated than key/value?

2023-03-23 03:44 Q&A author:

I'm starting a project in which we are trying to index the contents of XML documents with Lucene.Net. From the little documentation I have found, it seems that an index can only consist of fields with a single string value. The data that I am attempting to index is slightly more complicated than simple key/value pairs.

Here is an example of an xml document I would want to generate an index from:

<descriptor>
  <asset guid="2AA7C8F9-2CB1-4A81-9421-C09F1D85939E" generated-date="2011-07-30" generated-by="hw/AutoMfg" generated-with="PMS">

    <!-- information about where the asset can be used -->
    <target>
      <localization>en-us</localization>
      <localization>es-us</localization>
      <environment>desktop</environment>
      <environment>mobile</environment>
    </target>

    <!-- all contents of an asset must have the same version -->
    <version-information>
      <version-number source="content">9.1.123.4</version-number>
      <version-number source="manufacturing">9.1.123.4</version-number>
      <release-label>9.1</release-label>
    </version-information>

    <!-- catalog information about the primary role of the asset -->
    <role>
      <namespace>parent.type.family.some.thing</namespace>
      <mime-type>text/html</mime-type>
      <hwid>abc1234</hwid>
    </role>

  </asset>
</descriptor>

I could see creating fields named after the child elements of 'descriptor', but what about the child nodes within them? How can this data be indexed? Should I create a delimited string to represent the values of each field?

e.g. field: "Target", value: "localization: en-us;es-us environment: desktop;mobile | ..."

Do I need to flatten my data out like in my example above to index it?

Thanks!

It's tricky to give specific advice, because so much depends on what you want to retrieve and how, rather than on the shape of the data. In any case, I would start with Simone Chiaretta's excellent little series on Lucene.Net (1 2 3 4 5). One concept that will help a lot is that you can index the same field multiple times for a given document, so you'll probably end up with something like:

Target-Localization:en-us
Target-Localization:es-us
Target-Environment:desktop
Target-Environment:mobile
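
A list like the one above can be generated mechanically from the XML. The following is only a rough sketch of that flattening step, written in Python with the standard library rather than C#, so it stops short of any Lucene.Net call. The names `flatten` and `SAMPLE`, the lowercase hyphen-joined field names, and the choice to start below `<asset>` are all assumptions made for illustration; the sample is a shortened copy of the descriptor from the question.

```python
import xml.etree.ElementTree as ET

# Shortened copy of the descriptor from the question (illustrative only).
SAMPLE = """
<descriptor>
  <asset guid="2AA7C8F9-2CB1-4A81-9421-C09F1D85939E">
    <target>
      <localization>en-us</localization>
      <localization>es-us</localization>
      <environment>desktop</environment>
      <environment>mobile</environment>
    </target>
    <role>
      <mime-type>text/html</mime-type>
    </role>
  </asset>
</descriptor>
"""


def flatten(xml_text):
    """Return (field_name, value) pairs, one per attribute or leaf element.

    Hypothetical helper: field names are the element path below <asset>,
    joined with '-'. Repeated elements yield repeated field names.
    """
    asset = ET.fromstring(xml_text).find("asset")
    pairs = []

    def walk(element, path):
        # Each attribute becomes its own field, e.g. "guid".
        for name, value in element.attrib.items():
            pairs.append(("-".join(path + [name]), value))
        children = list(element)
        text = (element.text or "").strip()
        # Only leaf elements carry a value worth indexing.
        if not children and text:
            pairs.append(("-".join(path), text))
        for child in children:
            walk(child, path + [child.tag])

    walk(asset, [])
    return pairs


if __name__ == "__main__":
    for field_name, value in flatten(SAMPLE):
        print(f"{field_name}:{value}")
```

Each resulting pair would then be added to the document as its own field, so `target-localization` appears twice on the same document instead of being packed into one delimited string. One limitation of this sketch is that an attribute such as `source` on `version-number` ends up in a separate field from the element's text, so the link between the two is lost unless it is encoded into the field name.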

Lucene is fundamentally flat, but capable of being deep while being flat in new and interesting ways.

Take a look at Digester + Lucene. The .NET port of Digester is NDigester.
