Can we tell Solr/Lucene max chars to analyze for a search?
I have a problem: in my Lucene index, a single document can contain a huge amount of text. When I search for a term that occurs in one of these huge documents, Lucene/Solr does not return it, even though the search term exists in the document text. I suspect the cause is the large number of characters in the document text. If so, how can we tell Solr/Lucene how many characters to analyze during search? Please explain.
I am using Solr 1.4.1. Can anyone help?
Thanks Ahsan
Lucene can handle huge documents without trouble. It seems unlikely that the document size itself is the problem. Use a tool like Luke to inspect the index and see what terms are associated with some of these large documents.
Also, have you changed the maxFieldLength setting in solrconfig.xml? I am testing indexing the Bible, about 25 MB of data, and with a maxFieldLength of 10,000, which is the default, only the first 10,000 tokens ever get analyzed. That leaves roughly 2,000 unique terms for my document, and any term that first appears after that point is not searchable.
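To illustrate why a term deep inside a large document can go missing, here is a minimal sketch in plain Java that imitates the truncation effect. It is not Lucene or Solr code: the class `MaxFieldLengthDemo`, the helpers `tokenize` and `indexedTerms`, and the whitespace-style tokenizer are all hypothetical stand-ins, and the only assumption carried over from the answer above is that tokens past the limit are dropped.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical demo class; it only simulates a token limit and does not use Lucene.
class MaxFieldLengthDemo {

    // Simplified stand-in for an analyzer: lowercase, split on non-alphanumeric characters.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.toLowerCase().split("[^\\p{L}\\p{N}]+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    // Collect the unique terms among the first maxFieldLength tokens; later tokens are ignored.
    static Set<String> indexedTerms(String text, int maxFieldLength) {
        List<String> tokens = tokenize(text);
        int limit = Math.min(maxFieldLength, tokens.size());
        return new HashSet<>(tokens.subList(0, limit));
    }

    public static void main(String[] args) {
        // Build a document whose only occurrence of "needle" comes after 12,000 filler tokens.
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < 12000; i++) {
            sb.append("filler ");
        }
        sb.append("needle");
        String doc = sb.toString();

        Set<String> limited = indexedTerms(doc, 10000);
        Set<String> unlimited = indexedTerms(doc, Integer.MAX_VALUE);

        System.out.println("limit 10000 finds needle: " + limited.contains("needle"));
        System.out.println("no limit finds needle: " + unlimited.contains("needle"));
    }
}
```

With the 10,000-token limit the term never makes it into the set of indexed terms, so a search for it cannot match; with the limit effectively removed, it does. If this is what is happening in your index, raising maxFieldLength and reindexing would be the thing to try.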
If you are using Lucene directly, there are a couple of settings for maxFieldLength; you may have it set to "unlimited" and therefore be indexing everything. Check the JavaDocs for how to set maxFieldLength.