How can I index PDFs so they can be searched?

I have 1000+ searchable PDFs.

I need a plugin or application to index them, such as (http) joomla.natemaxfield.com


We use Swish-e to index our website, which includes thousands of PDFs, Word files and even WordPerfect files. It works great. It is free, open source, and integrates well with PHP.

http://swish-e.org/index.html

From their homepage:

Swish-e is a fast, flexible, and free open source system for indexing collections of Web pages or other files. Swish-e is ideally suited for collections of a million documents or smaller. Using the GNOME™ libxml2 parser and a collection of filters, Swish-e can index plain text, e-mail, PDF, HTML, XML, Microsoft® Word/PowerPoint/Excel and just about any file that can be converted to XML or HTML text. Swish-e is also often used to supplement databases like the MySQL® DBMS for very fast full-text searching.


Take a look at PDFMiner. It can do what you want quite easily. Also, please search for similar questions, as this is a possible duplicate of "Python module for converting PDF to text".
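
To make the PDFMiner suggestion a bit more concrete, here is a minimal sketch of how extracted text could be turned into a small searchable index. It assumes the maintained pdfminer.six fork and its `extract_text` helper. The function names (`tokenize`, `build_index`, `search`, `extract_pdf_texts`) are made up for illustration, and a plain in-memory inverted index stands in for a real search engine such as Swish-e.

```python
import re
from collections import defaultdict
from pathlib import Path


def tokenize(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(texts):
    """Map each word to the set of document names that contain it.

    `texts` is a dict of {document name: extracted text}.
    """
    index = defaultdict(set)
    for name, text in texts.items():
        for word in tokenize(text):
            index[word].add(name)
    return index


def search(index, query):
    """Return the sorted names of documents that contain every query word."""
    words = tokenize(query)
    if not words:
        return []
    hits = set.intersection(*(index.get(w, set()) for w in words))
    return sorted(hits)


def extract_pdf_texts(folder):
    """Extract text from every PDF in `folder`.

    Assumption: pdfminer.six is installed (pip install pdfminer.six).
    It is imported here so the rest of the sketch works without it.
    """
    from pdfminer.high_level import extract_text
    return {p.name: extract_text(str(p)) for p in sorted(Path(folder).glob("*.pdf"))}
```

With a hypothetical folder named `docs`, usage might look like `index = build_index(extract_pdf_texts("docs"))` followed by `search(index, "full text")`. For a collection of 1000+ files you would probably want to save the index to disk rather than rebuild it on every search.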
