Why does this Lucene.Net query fail?
I am trying to convert my search functionality to allow for fuzzy searches involving multiple words. My existing search code looks like:
// Split the search into seperate queries per word, and combine them into one major query
var finalQuery = new BooleanQuery();
string[] terms = searchString.Split(new[] { " " }, StringSplitOptions.RemoveEmptyEntries);
foreach (string term in terms)
{
// Setup the fields to search
string[] searchfields = new string[]
{
// Various strings denoting the document fields available
};
var parser = new MultiFieldQueryParser(Lucene.Net.Util.Version.LUCENE_29, searchfields, new StandardAnalyzer(Lucene.Net.Util.Version.LUCENE_29));
finalQuery.Add(parser.Parse(term), BooleanClause.Occur.MUST);
}
// Perform the search
var directory = FSDirectory.Open(new DirectoryInfo(LuceneIndexBaseDirectory));
var searcher = new IndexSearcher(directory, true);
var hits = searcher.Search(finalQuery, MAX_RESULTS);
This works correctly, and if I have an entity with the name field of "My name is Andrew", and I perform a search for "Andrew Name", Lucene correctly finds the correct document. Now I want to enable fuzzy searching, so that "Anderw Name" is found correctly. I changed my method to use the following code:
const int MAX_RESULTS = 10000;
const float MIN_SIMILARITY = 0.5f;
const int PREFIX_LENGTH = 3;
if (string.IsNullOrWhiteSpace(searchString))
throw new ArgumentException("Provided search string is empty");
// Split the search into seperate queries per word, and combine them into one major query
var finalQuery = new BooleanQuery();
string[] terms = searchString.Split(new[] { " " }, StringSplitOptions.RemoveEmptyEntries);
foreach (string term in terms)
{
// Setup the fields to search
string[] searchfields = new string[]
{
// Strings denoting document field names here
};
// Crea开发者_开发知识库te a subquery where the term must match at least one of the fields
var subquery = new BooleanQuery();
foreach (string field in searchfields)
{
var queryTerm = new Term(field, term);
var fuzzyQuery = new FuzzyQuery(queryTerm, MIN_SIMILARITY, PREFIX_LENGTH);
subquery.Add(fuzzyQuery, BooleanClause.Occur.SHOULD);
}
// Add the subquery to the final query, but make at least one subquery match must be found
finalQuery.Add(subquery, BooleanClause.Occur.MUST);
}
// Perform the search
var directory = FSDirectory.Open(new DirectoryInfo(LuceneIndexBaseDirectory));
var searcher = new IndexSearcher(directory, true);
var hits = searcher.Search(finalQuery, MAX_RESULTS);
Unfortunately, with this code if I submit the search query "Andrew Name" (same as before) I get zero results back.
The core idea is that all terms must be found in at least one document field, but each term can reside in different fields. Does anyone have any idea why my rewritten query fails?
Final Edit: Ok it turns out I was over complicating this by a LOT, and there was no need to change from my first approach. After reverting back to the first code snippet, I enabled fuzzy searching by changing
finalQuery.Add(parser.Parse(term), BooleanClause.Occur.MUST);
to
finalQuery.Add(parser.Parse(term.Replace("~", "") + "~"), BooleanClause.Occur.MUST);
Your code works for me if I rewrite the searchString
to lower-case. I'm assuming that you're using the StandardAnalyzer
when indexing, and it will generate lower-case terms.
You need to 1) pass your tokens through the same analyzer (to enable identical processing), 2) apply the same logic as the analyzer or 3) use an analyzer which matches the processing you do (WhitespaceAnalyzer
).
You want this line:
var queryTerm = new Term(term);
to look like this:
var queryTerm = new Term(field, term);
Right now you're searching field term
(which probably doesn't exist) for the empty string (which will never be found).
精彩评论