Search in PDF, index it?
I have 1000+ searchable PDFs.
I need a plugin or application to index them, something like joomla.natemaxfield.com
We use Swish-e to index our website, which includes thousands of PDFs, Word files and even WordPerfect files. It works great. It is free, open source, and integrates well with PHP.
http://swish-e.org/index.html
From their homepage:
Swish-e is a fast, flexible, and free open source system for indexing collections of Web pages or other files. Swish-e is ideally suited for collections of a million documents or smaller. Using the GNOME™ libxml2 parser and a collection of filters, Swish-e can index plain text, e-mail, PDF, HTML, XML, Microsoft® Word/PowerPoint/Excel and just about any file that can be converted to XML or HTML text. Swish-e is also often used to supplement databases like the MySQL® DBMS for very fast full-text searching.
Take a look at PDFMiner. It can extract the text from your PDFs quite easily, and that text can then be indexed. Also, please search for similar questions, as this is a possible dupe of: Python module for converting PDF to text
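To make the PDFMiner suggestion a bit more concrete, here is a minimal sketch of how the extracted text could be turned into a searchable index. It assumes the maintained fork, pdfminer.six, is installed and uses its `extract_text` helper. The function names (`pdf_to_text`, `build_index`, `search`, `index_folder`) and the simple in-memory inverted index are my own illustration, not part of PDFMiner, and for a large collection a real indexer such as Swish-e is likely the better fit.

```python
import re
from collections import defaultdict
from pathlib import Path


def pdf_to_text(path):
    """Extract text from one PDF. Assumes pdfminer.six is installed."""
    # Imported lazily so the indexing code below works without pdfminer.
    from pdfminer.high_level import extract_text
    return extract_text(str(path))


def tokenize(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(docs):
    """Build an inverted index: word -> set of document names.

    `docs` is a mapping of document name -> extracted text.
    """
    index = defaultdict(set)
    for name, text in docs.items():
        for word in tokenize(text):
            index[word].add(name)
    return index


def search(index, query):
    """Return the documents that contain every word of the query."""
    words = tokenize(query)
    if not words:
        return set()
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return result


def index_folder(folder):
    """Hypothetical helper: index every *.pdf file in a folder."""
    docs = {p.name: pdf_to_text(p) for p in Path(folder).glob("*.pdf")}
    return build_index(docs)
```

Usage would look something like `idx = index_folder("pdfs")` followed by `search(idx, "annual report")`, where `"pdfs"` is a placeholder folder name. The index lives only in memory, so it is rebuilt on every run; persisting it is left out to keep the sketch short.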