
Lucene Analyzer to Use With Special Characters and Punctuation?

I have a Lucene index that has several documents in it. Each document has multiple fields such as:

Id
Project
Name
Description

The Id field is a unique identifier such as a GUID; Project is a user's ProjectID (a user can only view documents for their own project); and Name and Description contain text that can include special characters.

When a user performs a search on the Name field, I want the match to be as forgiving as possible. For example, searching for:

First

Will return both:

First.Last 

and

First.Middle.Last

Name can also be something like:

Test (NameTest)

Where, if a user types in 'Test', 'Name', or '(NameTest)', then they can find the result.

However, if I say that Project is 'ProjectA' then that needs to be an exact match (case insensitive search). The same goes with the Id field.

Which fields should I set up as Tokenized and which as Untokenized? Also, is there a good Analyzer I should consider to make this happen?

I am stuck trying to decide the best route to implement the desired searching.


Your Id field should be untokenized, for the simple reason that it does not appear it can be usefully tokenized (whitespace-based) unless you write your own tokenizer. You can tokenize all your other fields.
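One way to wire this up, sketched below, is Lucene's PerFieldAnalyzerWrapper: exact-match fields (Id, and Project if you want whole-value matching) get a KeywordAnalyzer, which emits the entire field value as a single token, while free-text fields fall through to a StandardAnalyzer. The field names are the ones from the question; the class and method names here are my own. Note that KeywordAnalyzer does not lower-case, so for a case-insensitive exact match on Project you would either lower-case the value yourself at index and query time, or build a small custom analyzer from KeywordTokenizer plus LowerCaseFilter.

```java
import java.util.HashMap;
import java.util.Map;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.core.KeywordAnalyzer;
import org.apache.lucene.analysis.miscellaneous.PerFieldAnalyzerWrapper;
import org.apache.lucene.analysis.standard.StandardAnalyzer;

public class AnalyzerSetup {
    // Sketch: exact-match fields get KeywordAnalyzer (whole value = one token);
    // Name and Description fall through to the default StandardAnalyzer
    // (word tokens, lower-cased).
    public static Analyzer buildAnalyzer() {
        Map<String, Analyzer> perField = new HashMap<>();
        perField.put("Id", new KeywordAnalyzer());
        perField.put("Project", new KeywordAnalyzer());
        return new PerFieldAnalyzerWrapper(new StandardAnalyzer(), perField);
    }
}
```

Pass the same wrapper to both IndexWriterConfig and your query-side parser so the two sides agree on how each field is tokenized. One caveat on the "First.Last" case: StandardTokenizer follows Unicode word-break rules, which may keep letter.letter sequences together, so test it against your actual data; a simpler letter-only tokenizer may suit the Name field better.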

Perform a phrase query on the project name: look up PhraseQuery, or enclose your project name in double quotes, which makes it match as an exact phrase. Example: "\"My Fancy Project\""
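Built programmatically rather than through the query-string syntax, the same phrase query might look like the sketch below (recent Lucene versions have a PhraseQuery constructor taking the field and the terms; the class name is mine). The terms must be supplied in their analyzed form, here assumed lower-cased, to match how the field was indexed.

```java
import org.apache.lucene.search.PhraseQuery;
import org.apache.lucene.search.Query;

public class ProjectPhrase {
    // Sketch: matches "my", "fancy", "project" appearing in that order,
    // adjacent to each other, in the Project field.
    public static Query projectQuery() {
        return new PhraseQuery("Project", "my", "fancy", "project");
    }
}
```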

For the name field a simple query should work fine.

If there are situations where you want a combination of fields, look up BooleanQuery, which lets you combine different queries with boolean logic.
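Since every search here must be restricted to the user's project anyway, a BooleanQuery combining the project filter with the name search is the natural shape. A minimal sketch, assuming both values are already in their analyzed (lower-cased) form and using Lucene's BooleanQuery.Builder API; the class and method names are mine:

```java
import org.apache.lucene.index.Term;
import org.apache.lucene.search.BooleanClause;
import org.apache.lucene.search.BooleanQuery;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.TermQuery;

public class CombinedSearch {
    // Sketch: both clauses are MUST, so a document matches only if it
    // belongs to the given project AND its Name field contains the term.
    public static Query buildQuery(String project, String nameTerm) {
        return new BooleanQuery.Builder()
                .add(new TermQuery(new Term("Project", project)), BooleanClause.Occur.MUST)
                .add(new TermQuery(new Term("Name", nameTerm)), BooleanClause.Occur.MUST)
                .build();
    }
}
```

If the project clause should only restrict results without affecting ranking, Occur.FILTER does the same job as MUST but skips scoring.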

