Lucene Index and Query Design Question - Searching People

2022-12-09 20:43 问答作者：

I have recently just started working with Lucene (specifically, Lucene.Net) and have successfully created several indicies and have no problem with any of them. Previously having worked with Endeca, I find that Lucene is lightweight, powerful, and has a much lower learning curve (due mostly to a concise API).

However, I have one specific index/query situation which I am having problems wrapping my head around. What I have is a person directory. People can be searched for in this application, with the goal of returning both exact and approximate matches. Right now, in the index I concatenate the "FirstName" and "LastName" into a single field called "FullName", adding a space between the two. So FirstName:Jon with LastName:Smith yield FullName:Jon Smith. I do anticipate the possibility of middle names and possibly suffix, but that is not important at the moment.

I would like to do the equivalent of a fuzzy search on the name, so someone searching for "John Smith" would still get back "Jon Smith". I had thought about a multisearch, however, this becomes more involved if his name was actually "Jon Del Carmen" or "Jon Paul Del Carmen". I have nothing in what the user types in to delineate the first name or last name pieces.

The only thought that I have is that I could replace spaces in the concatenated value with a character that would not be discarded. If I did this when I built the document for the index and also when I parsed the query, I could treat it as one larger word, right? Is there another way to do this that would work for both simple names ("Jon Smith") and also more complex names ("Jon Paul Del Carmen")?

Any advice would truly be appreciated. Thanks in advance!

Edit: Additional detail follows.

In Luke, I put in the following query:

FullName:jonn smith~

It is being parsed as:

FullName:jonn CreatedOn:s开发者_如何学JAVAmith~0.5

With an Explanation of:

BooleanQuery:boost=1.0000
    clauses=2, maxClauses=1024
    Clause 0: SHOULD
        TermQuery:boost=1.0000
            Term: field='FullName' text='jonn'
    Cluase 1: SHOULD
        FuzzyQuery: boost=1.0000
            prefixLen=0, minSimilarity=0.5000
            org.apache.lucene.search.FuzzyTermEnum: diff=-1.0000
            FilteredTermEnum: Exception null

"CreatedOn" is another Field in the index. I tried putting quotes around the term "jonn smith", but it then treats it like a phrasequery, instead. I am sure that the problem is that I am just not doing something right, but being so green at all of this, I am not sure what that something truly is.

My problem was with how I was building the index. What I ended up doing was making sure that it was not tokenizing the FullName, and the query started returning the correct results. The Explain results from above were due to an ID10T error on my part and is now returning correctly.

继续阅读：indexing lucene lucene.net

Lucene Index and Query Design Question - Searching People

更多精彩内容

精彩评论

最新问答

央视是哪个频道？

请问买过的朋友，舒提啦旅行箱实际使用体验如何？？

检查不孕不育需要的费用？

海信ULED电视画质有什么不同的地方?？

钉子可以挂的住画框幕布吗？

问答排行榜

河神2九牛入海钓河妖是第几集河妖什么来历可活吞牛？

性激素六项检查的最佳时间是多久？多少钱？？

Easiest way to get words of one line from istream into a vector?

《梦在燃烧 (《三国演义》动画片主题曲)》MP3歌词-汤子星？

抽烟只抽炫赫门？