I am trying to index using a curl-based request. The request is: curl "http://localhost:8080/solr1/update/extract?literal.id=who.pdf&uprefix=attr_&fmap.content=attr_content&commit=true" -F
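The request above is cut off after `-F`. As a sketch only — assuming Solr is reachable at localhost:8080 under the core path `solr1` (as in the question) and the file being posted is a local `who.pdf` (a hypothetical path) — a complete form of the Solr Cell request might look like:

```shell
# Post who.pdf to Solr's ExtractingRequestHandler (Solr Cell).
#   literal.id    : the id stored for this document in the index
#   uprefix       : prefix applied to extracted fields not in schema.xml
#   fmap.content  : maps Tika's extracted body text to the attr_content field
#   commit=true   : commit immediately so the document is searchable
curl "http://localhost:8080/solr1/update/extract?literal.id=who.pdf&uprefix=attr_&fmap.content=attr_content&commit=true" \
  -F "myfile=@who.pdf"
```

The form-field name (`myfile` here) is arbitrary; what matters is the `@who.pdf` file reference. This cannot run without a live Solr instance at that address.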
I want to be able to create a new Tika parser to extract metadata from a file. We're already using Tika, so the metadata extraction will be done consistently.
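A custom parser plugs into Tika by implementing the `Parser` interface; extending `AbstractParser` keeps the boilerplate down. A minimal sketch, assuming a hypothetical `application/x-mydoc` content type (the type and the emitted text are illustrative, the method signatures are Tika's parser API; this only compiles with tika-core on the classpath):

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.Collections;
import java.util.Set;

import org.apache.tika.exception.TikaException;
import org.apache.tika.metadata.Metadata;
import org.apache.tika.mime.MediaType;
import org.apache.tika.parser.AbstractParser;
import org.apache.tika.parser.ParseContext;
import org.apache.tika.sax.XHTMLContentHandler;
import org.xml.sax.ContentHandler;
import org.xml.sax.SAXException;

// Sketch of a custom parser for a hypothetical "application/x-mydoc" type.
public class MyDocParser extends AbstractParser {

    private static final MediaType MY_TYPE = MediaType.application("x-mydoc");

    @Override
    public Set<MediaType> getSupportedTypes(ParseContext context) {
        // Tells AutoDetectParser which media types this parser handles.
        return Collections.singleton(MY_TYPE);
    }

    @Override
    public void parse(InputStream stream, ContentHandler handler,
                      Metadata metadata, ParseContext context)
            throws IOException, SAXException, TikaException {
        // Populate metadata fields here, just like Tika's built-in parsers do.
        metadata.set(Metadata.CONTENT_TYPE, MY_TYPE.toString());
        // ... read the stream and set further metadata keys ...

        // Emit any extracted text as XHTML SAX events.
        XHTMLContentHandler xhtml = new XHTMLContentHandler(handler, metadata);
        xhtml.startDocument();
        xhtml.element("p", "extracted text goes here");
        xhtml.endDocument();
    }
}
```

Because the parser sets values on the same `Metadata` object Tika passes to every parser, downstream code sees the metadata in the same shape regardless of which parser produced it.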
I have a JSP web application with a custom search engine. The search engine is basically built on top of a 'documents' table in a SQL Server database.
I am trying to scan all pdf/doc files in a directory. This works fine and I am able to scan all documents.
I am using Solr 3.1, Apache Tika 0.9, and SolrNet 0.3.1 to index documents such as .doc and .pdf files.
All the documentation I can find seems to suggest I can only extract the entire file's content, but I need to extract pages individually. Do I need to write my own parser for that? Is
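One option that avoids writing a Tika parser: Tika's PDF support is built on PDFBox, and PDFBox can be used directly for per-page text. A sketch, assuming PDFBox 1.x on the classpath (in 2.x `PDFTextStripper` moved to `org.apache.pdfbox.text`); `setStartPage`/`setEndPage` restrict extraction to a single page:

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;

import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.util.PDFTextStripper;

public class PerPageExtractor {

    // Returns the text of each page of the PDF as a separate list entry.
    public static List<String> extractPages(File pdf) throws Exception {
        PDDocument doc = PDDocument.load(pdf);
        try {
            PDFTextStripper stripper = new PDFTextStripper();
            List<String> pages = new ArrayList<String>();
            for (int i = 1; i <= doc.getNumberOfPages(); i++) {
                stripper.setStartPage(i); // page numbers are 1-based
                stripper.setEndPage(i);
                pages.add(stripper.getText(doc));
            }
            return pages;
        } finally {
            doc.close();
        }
    }
}
```

Each list entry can then be indexed as its own Solr document if per-page search results are the goal.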
I have configured Solr 3.1 with Apache Tika 0.9 successfully. I didn't change schema.xml (the default schema) or the solrconfig.xml file.
Can you give me the steps to configure Tika 0.9 with Solr 3.1? <requestHandler name="/update/extract"
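For reference, the stock Solr 3.1 example solrconfig.xml wires up Solr Cell roughly like this (the lib paths are as shipped in the example distribution and may differ in your install; the defaults shown are illustrative):

```xml
<!-- in solrconfig.xml: load the Solr Cell + Tika jars -->
<lib dir="../../contrib/extraction/lib" />
<lib dir="../../dist/" regex="apache-solr-cell-\d.*\.jar" />

<!-- the ExtractingRequestHandler behind /update/extract -->
<requestHandler name="/update/extract"
                class="org.apache.solr.handler.extraction.ExtractingRequestHandler"
                startup="lazy">
  <lst name="defaults">
    <str name="fmap.content">text</str>
    <str name="lowernames">true</str>
    <str name="uprefix">ignored_</str>
  </lst>
</requestHandler>
```

With the default schema unchanged, request parameters such as `uprefix` and `fmap.content` (as in the curl example above) can override these defaults per request.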
I'm using Apache Tika, and I have files (without extension) of a particular content type that need to be renamed to have an extension that reflects the content type.
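In Tika the two pieces are a `Detector` to get the `MediaType` and `MimeTypes.forName(type).getExtension()` to map the type to an extension. Since that needs the Tika jars, here is a stdlib-only sketch of the rename step, with a toy magic-byte check standing in for Tika's detector (the `%PDF` check and the `.bin` fallback are illustrative assumptions, not Tika behavior):

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;

public class RenameByContent {

    // Toy detector: in practice Tika's Detector would return the MediaType
    // and MimeTypes.forName(...) would supply the extension.
    static String detectExtension(Path file) throws IOException {
        byte[] head = new byte[4];
        try (InputStream in = Files.newInputStream(file)) {
            in.read(head); // a short read just leaves trailing zero bytes
        }
        // "%PDF" magic bytes mark a PDF file.
        if (head[0] == '%' && head[1] == 'P' && head[2] == 'D' && head[3] == 'F') {
            return ".pdf";
        }
        return ".bin"; // fallback for content we don't recognize
    }

    // Renames the file in place, appending the detected extension.
    public static Path rename(Path file) throws IOException {
        Path target = file.resolveSibling(file.getFileName() + detectExtension(file));
        return Files.move(file, target);
    }
}
```

Swapping `detectExtension` for a Tika-backed implementation keeps the rename logic unchanged while making detection robust across formats.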
I'm getting this error when compiling the latest version of Apache Tika on Debian. Any help will be appreciated.