PDF parsing specific text
Hi, I'm working on an app that parses PDF data for viewing on mobile devices. I'm looking for a way to scan a PDF file for specific text and get the x and y coordinates of that text block. Is that even possible? I'm working on a Linux server with PHP, but I'm flexible and will use whatever gets this working. Thanks.
Commercial options:
- TET (Text Extraction Toolkit) SDK from http://www.pdflib.com; Acrobat plug-in available for testing the mechanism
- pdfToolbox SDK from http://www.callassoftware.com; interactive desktop version available for testing
- if you are ready to do more of the coding yourself: Adobe PDF Library SDK, available through Datalogics
All three are pretty mature. TET is dedicated to text extraction. pdfToolbox is a general-purpose SDK for analyzing and manipulating PDFs, but it has a specific feature for extracting text together with its coordinates on the page. Adobe PDF Library is more of a general-purpose development tool: it offers a lot of low-level features, but you would have to write the code that finds the text, words, or characters and pulls out their coordinates.
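Whichever tool you pick, the "find the text and pull out the coordinates" step looks roughly the same once you have words with bounding boxes. Here is a minimal sketch in Python, assuming the extractor gives you a flat list of words in reading order, each with a box in page coordinates. The `Word` class, `find_phrase`, and the sample data are all made up for illustration and are not the API of any of the SDKs above.

```python
from dataclasses import dataclass
from typing import List, Tuple

Box = Tuple[float, float, float, float]


@dataclass
class Word:
    """One extracted word with its bounding box (hypothetical shape)."""
    text: str
    x0: float  # left
    y0: float  # bottom
    x1: float  # right
    y1: float  # top


def find_phrase(words: List[Word], phrase: str) -> List[Box]:
    """Return one (x0, y0, x1, y1) box per occurrence of phrase.

    Matching is case-insensitive and word-by-word; the returned box is
    the union of the boxes of the matched words.
    """
    target = [t.lower() for t in phrase.split()]
    hits: List[Box] = []
    if not target:
        return hits
    for i in range(len(words) - len(target) + 1):
        window = words[i:i + len(target)]
        if [w.text.lower() for w in window] == target:
            hits.append((
                min(w.x0 for w in window),
                min(w.y0 for w in window),
                max(w.x1 for w in window),
                max(w.y1 for w in window),
            ))
    return hits


# Made-up sample data standing in for real extractor output.
page_words = [
    Word("Invoice", 72.0, 700.0, 120.0, 712.0),
    Word("Total", 72.0, 680.0, 100.0, 692.0),
    Word("Due", 104.0, 680.0, 126.0, 692.0),
]

boxes = find_phrase(page_words, "total due")
print(boxes)
```

Keep in mind that PDF page coordinates normally have their origin at the bottom-left corner, so for display on a mobile screen you will probably need to flip the y values against the page height.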
Disclaimer: I work for callas software, my view on pdfToolbox may be biased.