Lucene complex structure search

2022-12-28 09:33 问答作者：

Basically I do have pretty simple database that I'd like to index with Lucene. Domains are:

// Person domain
class Person {
  Set<Pair> keys;
}

// Pair domain
class Pair {
  KeyItem keyItem;
  String value;
}

// KeyItem domain, name is unique field within the DB (!!)
class KeyItem{
  String name;
}

I've tens of millions of profiles and hundreds of millions of Pairs, however, since most of KeyItem's "name" fields duplicates, there are only few dozens KeyItem instances. Came up to that structure to save on KeyItem instances.

Basically any Profile with any fields could be saved into that structure. Lets say we've profile with properties

- name: Andrew Morton
- eduction:  University of New South Wales, 
- country: Australia, 
- occupation: Linux programmer.

To store it,开发者_如何学Python we'll have single Profile instance, 4 KeyItem instances: name, education,country and occupation, and 4 Pair instances with values: "Andrew Morton", "University of New South Wales", "Australia" and "Linux Programmer".

All other profile will reference (all or some) same instances of KeyItem: name, education, country and occupation.

My question is, how to index all of that so I can search for Profile for some particular values of KeyItem::name and Pair::value. Ideally I'd like that kind of query to work:

name:Andrew* AND occupation:Linux*

Should I create custom Indexer and Searcher? Or I could use standard ones and just map KeyItem and Pair as Lucene components somehow?

I believe you can use standard Lucene methodology. I would:

Translate every profile to a Lucene Document.
Translate every Pair to a Field in this Document. All Fields need to be indexed, but not necessarily stored.
Add a stored Field with a profile id to the Document.
Search using name:value pairs similarly to your example.

If you choose bare Lucene, you will need a custom Indexer and Searcher, but they are not hard to build. It may be easier for you to use Solr, where you need less programming. However, I do not know if Solr allows an open-ended schema like the one I described - I believe you have to predefine all field names, so this may prevent you from using Solr.

Lucene returns the list of hit documents essentially based on the occurence of the keyword/s regardless of the type of query. The fundamental segment reader checks for the presence of keywords in the entire index database rather than in specific field of interest.

Suggest to introduce a custom searcher that performs the following.

1.Read the short-listed documents using the document id. ( I guess the collect() method may be overridden to pass the document id from search() of IndexSearcher class ).
2.Get the field value and check the presence of expected keywords.
3.Subject the document for scoring only if the document meets your custom criteria.

Note : The default standard searcher can be modified rather than writing a custom seacher from scratch.

继续阅读：compass-lucene lucene

Lucene complex structure search

更多精彩内容

精彩评论

最新问答

央视是哪个频道？

请问买过的朋友，舒提啦旅行箱实际使用体验如何？？

检查不孕不育需要的费用？

海信ULED电视画质有什么不同的地方?？

钉子可以挂的住画框幕布吗？

问答排行榜

河神2九牛入海钓河妖是第几集河妖什么来历可活吞牛？

性激素六项检查的最佳时间是多久？多少钱？？

Easiest way to get words of one line from istream into a vector?

《梦在燃烧 (《三国演义》动画片主题曲)》MP3歌词-汤子星？

抽烟只抽炫赫门？