Apache Solr: Correct use of CompoundWordFilter
I'm trying to figure out how best to configure Solr for my app. I'm indexing (mostly German) PDF documents, and I'm using dismax queries to query Solr.
If a document contains the word "Firmenprofil" (a German compound word meaning "company profile"), it is only returned for queries on exactly that word. However, I would like queries containing only "Profil" to return this document as well.
I downloaded a German dictionary file and applied a DictionaryCompoundWordTokenFilter to both the index and the query analyzer.
The problem is that the filter decomposes the query into very small parts (e.g. "pro" in the case of "Firmenprofil"), which results in all sorts of documents being returned that contain words like "Product".
I tried removing the filter from the query analyzer, which leads to Solr not finding the document at all. I also tried leaving the query filter in but explicitly setting the onlyLongestMatch option to true, and that didn't seem to have any effect at all.
OK, it seems my dictionary file was simply too big (~20 MB). I replaced it with a more compact one and now it works just fine.
Without your actual config files, it's a bit of a guessing game.
Did you check whether "profil" is part of the dictionary?
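For comparison, here is a rough sketch of what such a field type might look like in the schema. It is only a guess at the setup: the field type name "text_de_compound" and the file name "german-dictionary.txt" are made-up placeholders, and the sketch assumes the dictionary entries are lowercase so they match the lowercased tokens. The part that matters for the "pro" problem is minSubwordSize. Setting it to 4 should keep three-letter dictionary entries such as "pro" from being emitted as subwords, while "profil" can still match.

```xml
<!-- Hypothetical field type: the name and dictionary file are placeholders,
     not taken from the question. -->
<fieldType name="text_de_compound" class="solr.TextField" positionIncrementGap="100">
  <!-- A single <analyzer> element is used for both indexing and querying. -->
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- minSubwordSize="4" skips dictionary entries shorter than four
         characters (such as "pro") when decomposing a compound. -->
    <filter class="solr.DictionaryCompoundWordTokenFilterFactory"
            dictionary="german-dictionary.txt"
            minWordSize="5"
            minSubwordSize="4"
            maxSubwordSize="15"
            onlyLongestMatch="true"/>
  </analyzer>
</fieldType>
```

After changing the analyzer you would need to reindex, and the Analysis screen in the Solr admin UI is a quick way to see which tokens "Firmenprofil" actually produces at index and query time.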