
Apache Solr: Correct use of CompoundWordFilter

I'm trying to figure out how best to configure Solr for my app. I'm indexing (mostly German) PDF documents, and I'm using dismax queries to query Solr.

If a document contains the word "Firmenprofil" (a German compound word meaning 'company profile'), it is only returned by queries for exactly that word. However, I would like queries containing only "Profil" to return this document as well.

I downloaded a German dictionary file and applied a DictionaryCompoundWordTokenFilter to both the index analyzer and the query analyzer.

The problem is that the filter decomposes the query into very small parts (e.g. "pro" in the case of "Firmenprofil"), so all sorts of documents containing words like "Product" are returned.

I tried removing the filter from the query analyzer, which leads to Solr not finding the document at all. I also tried leaving the query filter in and explicitly setting the onlyLongestMatch option to true, but that didn't seem to have any effect.
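To make the behaviour described above easier to reason about, here is a simplified Python sketch of dictionary-based decompounding. It is not Lucene's actual code, only an approximation of the idea: at every start position, each substring within the allowed length range is looked up in the dictionary. The function name `decompose`, the default values, and the tiny sample dictionary are all assumptions for illustration; the parameter names are modelled on the filter's minSubwordSize, maxSubwordSize and onlyLongestMatch options.

```python
def decompose(word, dictionary, min_subword_size=2, max_subword_size=15,
              only_longest_match=False):
    """Return the dictionary entries found inside `word` (simplified sketch)."""
    lowered = word.lower()
    parts = []
    # Try every start position that still leaves room for a minimal subword.
    for start in range(len(lowered) - min_subword_size + 1):
        longest = None
        for length in range(min_subword_size, max_subword_size + 1):
            if start + length > len(lowered):
                break
            candidate = lowered[start:start + length]
            if candidate in dictionary:
                if only_longest_match:
                    # Remember only the longest hit starting at this position.
                    if longest is None or len(candidate) > len(longest):
                        longest = candidate
                else:
                    parts.append(candidate)
        if longest is not None:
            parts.append(longest)
    return parts


if __name__ == "__main__":
    # Hypothetical miniature dictionary, not taken from any real word list.
    sample = {"firmen", "profil", "pro", "produkt"}
    print(decompose("Firmenprofil", sample))
    # ['firmen', 'pro', 'profil']
    print(decompose("Firmenprofil", sample, only_longest_match=True))
    # ['firmen', 'profil']
    print(decompose("Firmenprofil", sample - {"profil"}, only_longest_match=True))
    # ['firmen', 'pro']
    print(decompose("Firmenprofil", sample, min_subword_size=4))
    # ['firmen', 'profil']
```

In this sketch, short entries such as "pro" are emitted whenever they appear in the dictionary, and keeping only the longest match does not help if "profil" itself is missing from the word list. Raising the minimum subword size is another way to suppress very short parts, assuming the real filter behaves comparably.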


OK, it seems my dictionary file was simply too big (~20 MB). I replaced it with a more compact one and now it works just fine.


Without your actual config files, it's a bit of a guessing game.

Did you check whether "profil" is part of the dictionary?
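One quick way to run that check is a small script over the dictionary file. The sketch below assumes a plain-text word list with one entry per line and treats lines starting with `#` as comments; the helper names `load_dictionary` and `missing_words` are made up for this example, and the in-memory sample stands in for a real file.

```python
def load_dictionary(lines):
    """Build a lowercase set of entries from an iterable of text lines."""
    return {
        line.strip().lower()
        for line in lines
        if line.strip() and not line.lstrip().startswith("#")
    }


def missing_words(dictionary, words):
    """Return the words that have no entry in the dictionary."""
    return [word for word in words if word.lower() not in dictionary]


if __name__ == "__main__":
    # Stand-in for the real file; with an actual word list you would pass
    # an open file handle (e.g. open(path, encoding="utf-8")) instead.
    sample_lines = ["# sample word list", "Firmen", "Pro", "", "Produkt"]
    dictionary = load_dictionary(sample_lines)
    print(missing_words(dictionary, ["Firmen", "Profil"]))
    # ['Profil']
```

If "Profil" shows up as missing, the filter has nothing longer than "pro" to match at that position, which would be consistent with the symptoms described in the question.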
