How to directly search string in PDF using any programming language [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.

Closed 5 years ago.

Is it possible to search for a particular string in a PDF, using any programming language, without first converting it to a text or doc file? I want to search the PDF directly. I tried converting it to text and then searching for the string, but that gave me wrong results.

Thanks! Kim


The Docotic.Pdf library can be used for this task. Please see my answer to a similar question.

Disclaimer: I work for the company that develops the Docotic.Pdf library.


1) Create your own PDF "parser":

http://www.quick-pdf.com/pdf-specification.htm

The parser can probably be minimal if you only need the text data and none of the formatting.

2) Find a library in your language of choice that can "natively" read PDF files (there are many of them).

3) Use a pre-built tool (like pdftotext or pdfgrep): https://unix.stackexchange.com/questions/6704/grep-pdf-files
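To give an idea of how small option 1 can be, here is a rough Python sketch of a text-only "parser". It is an illustration under strong assumptions, not a general solution: it only looks at content streams that are uncompressed or Flate-compressed, it only reads literal strings shown with the Tj and TJ operators, and it treats the bytes as Latin-1. Hex strings, custom font encodings, CID fonts, and encrypted files are not handled, which is why a real library is usually the safer choice. The function names (extract_text, pdf_contains) are made up for this example.

```python
import re
import zlib

# Body of every "stream ... endstream" object in the raw file bytes.
STREAM_RE = re.compile(rb"stream\r?\n(.*?)\r?\nendstream", re.DOTALL)
# A text-showing operation: "(string) Tj" or "[(str) -20 (ing)] TJ".
SHOW_RE = re.compile(rb"\((?:\\.|[^\\()])*\)\s*Tj|\[[^\]]*\]\s*TJ", re.DOTALL)
# One literal string without nested parentheses; escapes such as \( are allowed.
STRING_RE = re.compile(rb"\(((?:\\.|[^\\()])*)\)", re.DOTALL)
# Backslash escapes inside a literal string (named escapes and octal codes).
ESCAPE_RE = re.compile(rb"\\([nrtbf()\\]|[0-7]{1,3})")
SIMPLE_ESCAPES = {b"n": b"\n", b"r": b"\r", b"t": b"\t", b"b": b"\b",
                  b"f": b"\f", b"(": b"(", b")": b")", b"\\": b"\\"}


def unescape(raw):
    """Resolve backslash escapes in the bytes of a PDF literal string."""
    def repl(match):
        token = match.group(1)
        if token in SIMPLE_ESCAPES:
            return SIMPLE_ESCAPES[token]
        return bytes([int(token, 8) & 0xFF])  # octal character code
    return ESCAPE_RE.sub(repl, raw)


def iter_stream_bodies(pdf_bytes):
    """Yield each stream body, inflated when it turns out to be zlib data."""
    for match in STREAM_RE.finditer(pdf_bytes):
        body = match.group(1)
        try:
            yield zlib.decompressobj().decompress(body)
        except zlib.error:
            yield body  # assumption: the stream was stored uncompressed


def extract_text(pdf_bytes):
    """Collect the strings shown by Tj/TJ operators, separated by spaces."""
    chunks = []
    for body in iter_stream_bodies(pdf_bytes):
        for show in SHOW_RE.finditer(body):
            parts = STRING_RE.findall(show.group(0))
            chunks.append(b"".join(unescape(p) for p in parts))
    return b" ".join(chunks).decode("latin-1")


def pdf_contains(pdf_bytes, needle):
    """Case-insensitive substring search over the extracted text."""
    return needle.lower() in extract_text(pdf_bytes).lower()
```

Typical use would be pdf_contains(open("file.pdf", "rb").read(), "some string"). Because each Tj/TJ operation is joined with a space, a phrase that the PDF splits across several operations may not match exactly. That is probably the same kind of problem that makes a naive text conversion give wrong results.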


If your requirement is to search for a word and replace it, you can go for Aspose.Pdf.Kit.


Poppler contains tools to extract text from a PDF document. You can use them to search documents.
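For example, Poppler's pdftotext command can write the extracted text to standard output, so nothing has to be saved to an intermediate file. The following Python sketch assumes pdftotext is installed and on the PATH, that its output is UTF-8, and that pages are separated by form feed characters. The helper names (build_pdftotext_cmd, find_in_pages, search_pdf) are invented for this illustration.

```python
import subprocess


def build_pdftotext_cmd(pdf_path):
    """Command line for Poppler's pdftotext; "-" sends the text to stdout."""
    # -layout asks pdftotext to keep the physical layout of the page.
    return ["pdftotext", "-layout", pdf_path, "-"]


def find_in_pages(text, needle):
    """Return the 1-based numbers of the pages whose text contains needle."""
    # Assumption: pages in the extracted text are separated by form feeds.
    pages = text.split("\f")
    needle = needle.lower()
    return [number for number, page in enumerate(pages, start=1)
            if needle in page.lower()]


def search_pdf(pdf_path, needle):
    """Run pdftotext on pdf_path and report the pages that mention needle."""
    result = subprocess.run(build_pdftotext_cmd(pdf_path),
                            capture_output=True, check=True)
    text = result.stdout.decode("utf-8", errors="replace")
    return find_in_pages(text, needle)
```

Calling search_pdf("report.pdf", "invoice") would then return a list of page numbers, or an empty list if the string is not found. The search is a plain case-insensitive substring match, so words that are hyphenated or split across lines in the PDF can still be missed.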


In Java and C#, you can do that with iText, provided the PDF file is not locked.

http://itextpdf.com/
