Extract Data from .PDF files [duplicate]

This question already has answers here: Extract Data from .PDF files (4 answers). Closed 8 years ago.

I need to extract data from .PDF files and load it into SQL Server 2008. Can anyone tell me how to proceed?


You will need to use a PDF library such as iTextSharp to extract the data from the PDF.

At this point, you have the data and can insert it into a database.
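As a rough illustration of the "insert it into a database" step, here is a minimal Python sketch. It makes several assumptions that are not in the question: the text has already been pulled out of the PDF by a library, each line looks like `name: value`, and the standard-library `sqlite3` module stands in for SQL Server (with a real SQL Server you would use a driver for it, but the parameterized insert has the same shape). The table name `pdf_data` and the helpers `parse_lines` and `load_rows` are hypothetical.

```python
import sqlite3


def parse_lines(extracted_text):
    """Turn extracted text into (name, value) rows.

    Assumes a 'name: value' layout per line; real PDFs will need
    their own parsing rules.
    """
    rows = []
    for line in extracted_text.splitlines():
        if ":" not in line:
            continue  # skip lines that do not match the assumed layout
        name, value = line.split(":", 1)
        rows.append((name.strip(), value.strip()))
    return rows


def load_rows(conn, rows):
    """Insert rows with a parameterized statement (hypothetical table)."""
    conn.execute("CREATE TABLE IF NOT EXISTS pdf_data (name TEXT, value TEXT)")
    conn.executemany("INSERT INTO pdf_data (name, value) VALUES (?, ?)", rows)
    conn.commit()


# Sample text standing in for whatever the PDF library returned.
sample_text = "Invoice: 1001\nCustomer: Acme\nTotal: 250.00"

conn = sqlite3.connect(":memory:")  # stand-in for the real database
load_rows(conn, parse_lines(sample_text))
```

Keeping the parsing separate from the loading means you can swap the parsing rules per document type without touching the database code.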


Text extraction works well with iText until you need to extract text column by column instead of row by row (as Adobe Reader and Foxit Reader do when you copy text from a PDF document). To extract text column by column, the tool needs to calculate the position and coordinates of the text on the page.
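To show what "calculating positions" means in practice, here is a small Python sketch of column-by-column ordering. It is not how iText or any particular tool does it. It assumes you already have text fragments as `(x, y, text)` tuples with coordinates in default PDF user space (origin at the bottom-left), and it uses a simple hypothetical rule: a new column starts wherever the x position jumps by more than `gap`.

```python
def read_by_columns(fragments, gap=50.0):
    """Order (x, y, text) fragments column by column.

    A new column starts when a fragment's x is more than `gap`
    away from the previous fragment's x (a simplifying assumption).
    """
    columns = []
    for frag in sorted(fragments, key=lambda f: f[0]):
        if columns and frag[0] - columns[-1][-1][0] <= gap:
            columns[-1].append(frag)
        else:
            columns.append([frag])

    ordered = []
    for col in columns:
        # Larger y is higher on the page, so read each column top to bottom.
        for _, _, text in sorted(col, key=lambda f: -f[1]):
            ordered.append(text)
    return ordered


# Hypothetical fragments for a two-column table.
fragments = [
    (72.0, 700.0, "Name"), (300.0, 700.0, "Price"),
    (72.0, 680.0, "Widget"), (300.0, 680.0, "9.99"),
]

print(read_by_columns(fragments))
# ['Name', 'Widget', 'Price', '9.99']
```

Row-by-row extraction would give Name, Price, Widget, 9.99 for the same fragments. Real pages need a more careful rule than a fixed gap, since column widths and alignment vary.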

The commercial tool ByteScout PDF Extractor SDK is capable of this kind of text extraction in both row-by-row and column-by-column modes (or it can simply extract the data as structured XML).

DISCLAIMER: I work for ByteScout currently
