Extract Data from .PDF files [duplicate]
I need to extract data from .PDF files and load it into SQL Server 2008. Can anyone tell me how to proceed?
You will need to use a PDF library such as iTextSharp to extract the data from the PDF.
At this point, you have the data and can insert it into a database.
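As a rough sketch of the "parse, then insert" step (not of the PDF reading itself), something like the following could work. It is written in Python rather than C#, and it makes several assumptions that are not in the question: the text has already been extracted by a PDF library, each line holds three whitespace-separated fields, the table is called `invoices`, and an in-memory SQLite database stands in for SQL Server 2008 so the example runs on its own.

```python
import sqlite3

# Hypothetical text, standing in for what a PDF library returned for one page.
extracted_text = """\
INV-001 2012-03-01 150.00
INV-002 2012-03-02 75.50
"""

def parse_rows(text):
    """Split each line into (invoice_no, invoice_date, total)."""
    rows = []
    for line in text.splitlines():
        parts = line.split()
        if len(parts) != 3:
            continue  # skip lines that do not match the assumed layout
        invoice_no, invoice_date, total = parts
        rows.append((invoice_no, invoice_date, float(total)))
    return rows

def load_rows(conn, rows):
    """Insert the parsed rows using a parameterized statement."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS invoices "
        "(invoice_no TEXT, invoice_date TEXT, total REAL)"
    )
    conn.executemany("INSERT INTO invoices VALUES (?, ?, ?)", rows)
    conn.commit()

# SQLite is only a stand-in here (assumption); the target in the question
# is SQL Server 2008, which would need its own driver and connection.
conn = sqlite3.connect(":memory:")
rows = parse_rows(extracted_text)
load_rows(conn, rows)
```

Against SQL Server you would swap the connection for one from a suitable driver; the parameterized insert is the part worth keeping, since it avoids building SQL strings from text that came out of a PDF.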
Text extraction works well with iText until you need to extract text from columns instead of rows (as Adobe Reader and Foxit Reader do when you copy the text from a PDF document). To extract text column by column, the tool needs to calculate the position and coordinates of the text on a page.
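To illustrate what "calculating positions" means here, this is a minimal Python sketch of grouping text fragments into columns by their x coordinate. It is not how iText or any particular product does it. The `(x, y, text)` fragment tuples, the `group_into_columns` name, and the `gap` threshold are all made up for the example, and it assumes the usual PDF convention that y grows upward from the bottom of the page.

```python
def group_into_columns(fragments, gap=50.0):
    """Group (x, y, text) fragments into columns, each read top to bottom.

    A new column starts whenever the next fragment's x is more than
    `gap` units to the right of the previous fragment's x.
    """
    columns = []   # list of lists of fragments
    last_x = None
    for frag in sorted(fragments, key=lambda f: f[0]):
        if last_x is None or frag[0] - last_x > gap:
            columns.append([])  # horizontal jump: start a new column
        columns[-1].append(frag)
        last_x = frag[0]
    # Larger y is higher on the page, so sort each column by descending y.
    return [
        [text for _, _, text in sorted(col, key=lambda f: -f[1])]
        for col in columns
    ]

# Hypothetical fragments from a two-column table.
fragments = [
    (72, 700, "Name"), (300, 700, "Amount"),
    (72, 680, "Alice"), (300, 680, "10"),
    (74, 660, "Bob"), (300, 660, "20"),
]

by_column = group_into_columns(fragments)
# Row-by-row order for comparison: top to bottom, then left to right.
by_row = [t for _, _, t in sorted(fragments, key=lambda f: (-f[1], f[0]))]
```

With these sample fragments, `by_column` keeps each column together while `by_row` interleaves them. A fixed gap threshold is the simplest possible rule; real pages with uneven spacing would need something more careful.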
The commercial tool ByteScout PDF Extractor SDK is capable of this kind of text extraction in both row-by-row and column-by-column modes (or it can simply extract the data as structured XML).
DISCLAIMER: I work for ByteScout currently