Wildcards in Lucene
Why does the wildcard query "dog#V*" fail to retrieve a document that contains "dog#VVP"?
The following code written in Jython for Lucene 3.0.0 fails to retrieve the indexed document. Am I missing something?
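import java.io
from org.apache.lucene.analysis import WhitespaceAnalyzer
from org.apache.lucene.document import Document, Field
from org.apache.lucene.index import IndexWriter
from org.apache.lucene.queryParser import QueryParser
from org.apache.lucene.search import IndexSearcher
from org.apache.lucene.store import FSDirectory
from org.apache.lucene.util import Version
analyzer = WhitespaceAnalyzer()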
directory = FSDirectory.open(java.io.File("testindex"))
iwriter = IndexWriter(directory, analyzer, True, IndexWriter.MaxFieldLength(25000))
doc = Document()
doc.add(Field("sentence", "dog#VVP", Field.Store.YES, Field.Index.ANALYZED))
iwriter.addDocument(doc)
iwriter.close()
directory.close()
parser = QueryParser(Version.LUCENE_CURRENT, "sentence", analyzer)
directory = FSDirectory.open(java.io.File("testindex"))
isearcher = IndexSearcher(directory, True) # read-only=true
query_text = "dog#V*"
query = parser.parse(query_text)
hits = isearcher.search(query, None, 10).scoreDocs
print query_text + ":" + ", ".join([str(x) for x in list(hits)])
Output is:
dog#V*:
It doesn't return anything. I see the same behaviour for dog#VV*, and with separator characters other than "#" (I tried "__" and "aaa"). Interestingly, the following queries work: dog#??? and dog#*.
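If you look at the result of
parser.parse("dog#V*")
you'll see
sentence:dog#v*
Note the lowercase v! By default, QueryParser lowercases the terms of wildcard queries, whereas WhitespaceAnalyzer indexed the token with its case intact, as dog#VVP. The lowercased pattern dog#v* therefore cannot match it. This also explains why dog#??? and dog#* work: they contain no uppercase letters, so lowercasing leaves them unchanged. To avoid the automatic lowercasing of terms in a wildcard query, you'll have to do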
parser.setLowercaseExpandedTerms(False)
before parsing query strings. I have no idea why the default is to lowercase.
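The effect can be reproduced without Lucene. The following is only a rough pure-Python simulation, not Lucene code: it assumes that the standard library's fnmatchcase is a fair stand-in for case-sensitive wildcard matching against a single indexed term, and simulated_parse and matches are hypothetical helper names invented for this illustration.

```python
from fnmatch import fnmatchcase

# The term as WhitespaceAnalyzer would index it: case is preserved.
INDEXED_TERM = "dog#VVP"


def simulated_parse(pattern, lowercase_expanded_terms=True):
    """Hypothetical stand-in for how the query parser treats a wildcard term.

    With the default setting the pattern is lowercased; otherwise it is
    passed through unchanged.
    """
    if lowercase_expanded_terms:
        return pattern.lower()
    return pattern


def matches(pattern, lowercase_expanded_terms=True):
    """Case-sensitive wildcard match of the parsed pattern against the term."""
    parsed = simulated_parse(pattern, lowercase_expanded_terms)
    return fnmatchcase(INDEXED_TERM, parsed)


for pattern in ["dog#V*", "dog#VV*", "dog#???", "dog#*"]:
    print("%-8s default: %-5s lowercasing off: %s"
          % (pattern, matches(pattern), matches(pattern, False)))
```

In this simulation, dog#V* and dog#VV* fail under the default setting because they become dog#v* and dog#vv*, while dog#??? and dog#* are unaffected by lowercasing. With lowercasing turned off, all four patterns match.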