Indexing PDF with page numbers with Solr
I'm indexing PDFs with Solr using the ExtractingRequestHandler. I would like to display the page number along with hits in a document, e.g. "term foo
was found in bar.pdf
on pages 2, 3 and 5."
Is it开发者_运维百科 possible to include page numbers in the query result like this?
It would require some development effort, but you could achieve this by indexing each page of each document as a seperate Solr document, and then use field collapsing to group the different page hits for each document.
Note that you need a nightly for this, field collapsing is not implemented in any currently released Solr version.
Also note: Field Collapsing is implemented in version Solr 3.3. More updates are expected in the next big version ( Solr 4.0)
精彩评论