How to directly search string in PDF using any programming language [closed]
Closed 5 years ago.
Is it possible to search for a particular string in a PDF using any programming language, without first converting it to a text or doc file? I want to search for the string directly. I tried converting the PDF to text and then searching for the string, but that gave me wrong results.
Thanks! Kim
The Docotic.Pdf library can be used for this task. Please see my answer to a similar question.
Disclaimer: I work for the company that develops the Docotic.Pdf library.
1) Create your own PDF "parser":
http://www.quick-pdf.com/pdf-specification.htm
This can probably be minimal if you only need the text data and none of the formatting.
2) Find a library in your language of choice that can "natively" read PDF files (there are tons of them out there).
3) Use a pre-built tool (like pdftotext or pdfgrep): https://unix.stackexchange.com/questions/6704/grep-pdf-files
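To give an idea of what option 1 involves, here is a rough Python sketch of a very minimal "parser". It is only an illustration under strong assumptions: content streams are either uncompressed or Flate-compressed, text is written as literal strings with the `Tj`/`TJ` operators, and the font uses a simple encoding. It ignores encryption, object streams, hex strings, octal escapes and custom font encodings, so it will miss text in many real PDFs. The function names (`extract_text`, `pdf_contains`) are made up for this example.

```python
import re
import zlib

# A PDF literal string: "( ... )" with backslash escapes, no nested parentheses.
STRING = rb"\((?:\\.|[^\\()])*\)"
STRING_RE = re.compile(STRING)
# Raw bytes between the "stream" and "endstream" keywords of an object.
STREAM_RE = re.compile(rb"stream\r?\n(.*?)endstream", re.DOTALL)
# Text-showing operators: "(text) Tj" or "[(te) -20 (xt)] TJ".
SHOW_RE = re.compile(
    rb"(" + STRING + rb")\s*Tj|\[((?:" + STRING + rb"|[^\]()])*)\]\s*TJ"
)


def _unescape(raw: bytes) -> str:
    # Strip the outer parentheses and resolve \( \) \\ only (a simplification).
    body = re.sub(rb"\\([()\\])", rb"\1", raw[1:-1])
    return body.decode("latin-1")


def extract_text(pdf_bytes: bytes) -> str:
    """Return one line of text per Tj/TJ operator found in any stream."""
    lines = []
    for match in STREAM_RE.finditer(pdf_bytes):
        data = match.group(1)
        try:
            # decompressobj tolerates the end-of-line bytes before "endstream".
            data = zlib.decompressobj().decompress(data)
        except zlib.error:
            pass  # assumption: the stream was stored uncompressed
        for show in SHOW_RE.finditer(data):
            chunk = show.group(1) or show.group(2)
            # Pieces of one operator are joined directly, so kerned
            # fragments such as (Wor) -20 (ld) become "World".
            lines.append("".join(_unescape(s) for s in STRING_RE.findall(chunk)))
    return "\n".join(lines)


def pdf_contains(pdf_bytes: bytes, needle: str) -> bool:
    return needle in extract_text(pdf_bytes)
```

You would call it with the raw file contents, e.g. `pdf_contains(open("file.pdf", "rb").read(), "Kim")`. Note that a phrase split across two text operators ends up on separate lines here, which is one reason a naive text search over a PDF can give wrong results.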
If your requirement is to search for a word and replace it, you can use Aspose.pdf.Kit.
Poppler contains tools to extract text from a PDF document (for example, pdftotext). Use them to search documents.
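As a rough sketch of the Poppler route, the Python snippet below shells out to `pdftotext` and reports which pages contain the string. It assumes the `pdftotext` binary is installed and on your PATH, and it relies on `pdftotext` separating pages with a form feed character (its default behaviour). The helper names `pdf_to_text` and `pages_containing` are made up for this example.

```python
import subprocess


def pdf_to_text(path: str) -> str:
    """Run Poppler's pdftotext and return the extracted text ("-" = stdout)."""
    result = subprocess.run(
        ["pdftotext", "-layout", path, "-"],
        check=True,
        capture_output=True,
    )
    return result.stdout.decode("utf-8", errors="replace")


def pages_containing(text: str, needle: str) -> list:
    """Return 1-based page numbers whose text contains needle (case-insensitive)."""
    pages = text.split("\f")  # pdftotext ends each page with a form feed
    wanted = needle.lower()
    return [n for n, page in enumerate(pages, start=1) if wanted in page.lower()]
```

Typical use would be `pages_containing(pdf_to_text("file.pdf"), "Kim")`. This still converts to text behind the scenes, but nothing is written to disk and the search result is tied back to page numbers.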
In Java and C#, you can do that with iText, provided the PDF file is not locked.
http://itextpdf.com/