Luke Lucene BooleanQuery
In Luke, the following search expression returns 23 results:
docurl:www.siteurl.com docfile:Tomatoes*
If I pass this same expression into my C# Lucene.NET app with the following implementation:
IndexReader reader = IndexReader.Open(indexName);
Searcher searcher = new IndexSearcher(reader);
try
{
    QueryParser parser = new QueryParser("docurl", new StandardAnalyzer());
    BooleanQuery bquery = new BooleanQuery();
    Query parsedQuery = parser.Parse(query);
    bquery.Add(parsedQuery, Lucene.Net.Search.BooleanClause.Occur.MUST);
    int _max = searcher.MaxDoc();
    BooleanQuery.SetMaxClauseCount(Int32.MaxValue);
    // Note: parsedQuery, not bquery, is what gets passed to Search here
    TopDocs hits = searcher.Search(parsedQuery, _max);
    ...
}
I get 0 results.
Luke is using StandardAnalyzer, and this is what the Explain Structure window looks like: [screenshot not preserved]
Must I manually create BooleanClause objects for each field I search on, specifying Should for each one, then add them to the BooleanQuery object with .Add()? I thought the QueryParser would do this for me. What am I missing?
Edit:
Simplifying a tad, docfile:Tomatoes* returns 23 docs in Luke, yet 0 in my app. Per Gene's suggestion, I've changed from MUST to SHOULD:
QueryParser parser = new QueryParser("docurl", new StandardAnalyzer());
BooleanQuery bquery = new BooleanQuery();
Query parsedQuery = parser.Parse(query);
bquery.Add(parsedQuery, Lucene.Net.Search.BooleanClause.Occur.SHOULD);
int _max = searcher.MaxDoc();
BooleanQuery.SetMaxClauseCount(Int32.MaxValue);
TopDocs hits = searcher.Search(parsedQuery, _max);
parsedQuery is simply docfile:tomatoes*
Edit2:
I think I've finally gotten to the root problem:
QueryParser parser = new QueryParser("docurl", new StandardAnalyzer());
Query parsedQuery = parser.Parse(query);
In the second line, query is "docfile:Tomatoes*", but parsedQuery is {docfile:tomatoes*}. Notice the difference? A lowercase 't' in the parsed query. I never noticed this before. If I change the value in the IDE to 'T', 23 results return.
I've verified that StandardAnalyzer is being used both when indexing and when reading the index. How do I force QueryParser to keep the case of the value of query?
Edit3: Wow, how frustrating. According to the documentation, I can accomplish this with:
parser.setLowercaseExpandedTerms(false);
Whether terms of wildcard, prefix, fuzzy and range queries are to be automatically lower-cased or not. Default is true.
I won't argue whether that's a sensible default or not. I suppose SimpleAnalyzer should have been used to lowercase everything in and out of the index. The frustrating part is, at least with the version I'm using, Luke defaults the other way! At least I learned a bit more about Lucene.
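To see why a single lowercased letter turns 23 hits into 0, here is a toy model in plain C# (no Lucene.NET involved; `PrefixExpansionDemo`, `Expand` and the sample terms are hypothetical). It assumes, as the results above suggest, that the indexed docfile terms still start with a capital T, and that matching a prefix against the term list is case-sensitive, as it is for Lucene's term dictionary.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Toy model only, not Lucene.NET code. It mimics a field whose terms were
// stored with their original case, and a parser that may lowercase the
// prefix of an expanded (prefix/wildcard) term before matching it.
public static class PrefixExpansionDemo
{
    // Returns the indexed terms that start with the given prefix.
    // When lowercaseExpandedTerms is true, the prefix is lowercased first,
    // mirroring the default QueryParser behaviour described in Edit3.
    public static List<string> Expand(IEnumerable<string> indexedTerms, string prefix, bool lowercaseExpandedTerms)
    {
        string effective = lowercaseExpandedTerms ? prefix.ToLowerInvariant() : prefix;

        // Ordinal comparison: 'T' and 't' are different characters, just as
        // they are different terms in the index.
        return indexedTerms
            .Where(t => t.StartsWith(effective, StringComparison.Ordinal))
            .ToList();
    }
}
```

With hypothetical terms such as "Tomatoes.doc" and "TomatoesRed.pdf", `Expand(terms, "Tomatoes", true)` returns nothing while `Expand(terms, "Tomatoes", false)` returns both, which is the same split seen between the app (lowercasing on by default) and Luke.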
Using Occur.MUST is equivalent to using the + operator with the standard query parser. Thus your code is evaluating +docurl:www.siteurl.com +docfile:Tomatoes* rather than the expression you typed into Luke. To get that behavior, try Occur.SHOULD when adding your clauses.
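The MUST/SHOULD difference can be pictured as set operations on the documents each clause matches. The sketch below is a simplified toy model in plain C#, not Lucene.NET's BooleanQuery (the names `ToyOccur`, `BooleanClauseDemo` and `Combine` are hypothetical), and it ignores MUST_NOT, scoring and minimum-should-match.

```csharp
using System.Collections.Generic;
using System.Linq;

// Toy stand-in for BooleanClause.Occur (MUST_NOT deliberately left out).
public enum ToyOccur { Must, Should }

// Toy model only: each clause is represented by the set of doc ids it matches.
public static class BooleanClauseDemo
{
    public static SortedSet<int> Combine(params (ToyOccur Occur, HashSet<int> Docs)[] clauses)
    {
        var must = clauses.Where(c => c.Occur == ToyOccur.Must).Select(c => c.Docs).ToList();
        var should = clauses.Where(c => c.Occur == ToyOccur.Should).Select(c => c.Docs).ToList();

        if (must.Count > 0)
        {
            // At least one MUST clause (like "+a +b"): a document has to match
            // every MUST clause; SHOULD clauses would only influence scoring.
            var result = new SortedSet<int>(must[0]);
            foreach (var docs in must.Skip(1)) result.IntersectWith(docs);
            return result;
        }

        // Only SHOULD clauses (like "a b"): matching any one clause is enough.
        var union = new SortedSet<int>();
        foreach (var docs in should) union.UnionWith(docs);
        return union;
    }
}
```

If the docurl clause matched documents {1, 2, 3} and the docfile clause matched {3, 4}, two MUST clauses would return only {3}, while two SHOULD clauses would return {1, 2, 3, 4}.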
QueryParser will indeed take a query like "docurl:www.siteurl.com docfile:Tomatoes*" and build a proper query out of it (boolean query, range query, etc.) depending on the query given (see query syntax).
Your first step should be to attach a debugger and inspect the value and type of parsedQuery.