I am trying to index some PDF documents and then build a search UI. This question is somewhat related to
Can anyone point me to a tutorial? My main experience with Solr is indexing CSV files, but I cannot find any simple instructions or tutorial explaining what I need to do to index PDFs.
Is it possible to index rich documents (PDF, Office, ...) with the DataImportHandler using Solr Cell? I use Solr 3.2.
I need to index the content of doc/docx/pdf files uploaded by users, and I use the Solr (1.4.1) ExtractingRequestHandler component (817165) for that. In case it matters, I don't request indexing from it - the comp
I am trying to index using a curl-based request. The request is: curl "http://localhost:8080/solr1/update/extract?literal.id=who.pdf&uprefix=attr_&fmap.content=attr_content&commit=true" -F
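For comparison, here is a minimal Python sketch of the same kind of request, using only the standard library. It assumes a Solr Cell (ExtractingRequestHandler) endpoint at the base URL from the question, rebuilds the query string with `urlencode`, and posts the file as multipart/form-data, which is what `curl -F` sends. The helper names and the form field name `myfile` are hypothetical placeholders, since the original `-F` argument is cut off.

```python
import mimetypes
import os
import urllib.request
import uuid
from urllib.parse import urlencode


def build_extract_url(base_url, doc_id, uprefix="attr_",
                      content_field="attr_content", commit=True):
    """Build a /update/extract URL with the parameters used in the question."""
    params = {
        "literal.id": doc_id,           # literal value stored in the "id" field
        "uprefix": uprefix,             # prefix for fields not defined in the schema
        "fmap.content": content_field,  # map Tika's extracted "content" to this field
        "commit": "true" if commit else "false",
    }
    return f"{base_url.rstrip('/')}/update/extract?{urlencode(params)}"


def build_multipart(filename, data, field_name="myfile", boundary=None):
    """Encode one file as a multipart/form-data body (like curl -F).

    field_name is a hypothetical placeholder; use the name your request uses.
    """
    boundary = boundary or uuid.uuid4().hex
    ctype = mimetypes.guess_type(filename)[0] or "application/octet-stream"
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{field_name}"; '
        f'filename="{filename}"\r\n'
        f"Content-Type: {ctype}\r\n\r\n"
    ).encode("utf-8")
    tail = f"\r\n--{boundary}--\r\n".encode("utf-8")
    return head + data + tail, f"multipart/form-data; boundary={boundary}"


def post_to_solr(url, path, field_name="myfile"):
    """POST a local file to the extract handler and return the raw response."""
    with open(path, "rb") as fh:
        body, content_type = build_multipart(
            os.path.basename(path), fh.read(), field_name)
    req = urllib.request.Request(url, data=body, method="POST",
                                 headers={"Content-Type": content_type})
    with urllib.request.urlopen(req) as resp:
        return resp.read()


if __name__ == "__main__":
    url = build_extract_url("http://localhost:8080/solr1", "who.pdf")
    print(url)
    # post_to_solr(url, "who.pdf")  # needs a running Solr instance
```

Against a running instance with the handler configured as in the question, `post_to_solr(url, "who.pdf")` should behave like the curl command; building the URL separately makes it easy to check which parameters are actually sent.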
Can you give me the steps to configure Tika 0.9 with Solr 3.1? <requestHandler name="/update/extract"
I'm indexing PDFs with Solr using the ExtractingRequestHandler. I would like to display the page number along with hits in a document, e.g. "term foo was found in bar.pdf on pages 2, 3 and 5."
Can you use the ExtractingRequestHandler and Tika with any of the compressed file formats (zip, tar, gz, etc.) to extract the content for indexing?
I'm trying to get Solr to index a database in which one column is the filename of a PDF document I'd like to index. My configuration looks like this:
At the end of the README.txt file, which is located in the example directory under solr, I find this line: