
Screen Scraping with .NET

I have around 100K scanned images (PDF, TIF, and JPG) from which data needs to be read and then uploaded to a hard drive. I am planning to come up with a small application that will help automate the data entry work.

Is there a free screen scraping tool available on the market that will help automate the process?

My initial idea was to read each image one by one and feed the data in through an application. But viewing each image and entering its data one at a time will definitely take some time, and there is a chance of human error while reading the images.

All ideas / methods will be very helpful.

I need to provide a solution by the start of next week.


Screen Scraping is downloading a webpage and extracting information from it.

To extract text from an image, you need to perform something called Optical Character Recognition or OCR for short. There are many software products available that will do this for you.
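Whichever OCR product you pick, the batch side of the job looks roughly the same: walk the folder, hand each scan to the OCR step, and record failures instead of stopping. Below is a minimal Python sketch of that loop (the same shape carries over to .NET). The names `find_scans` and `run_batch` are made up for illustration, and `ocr` is a placeholder callable that you would replace with a call into your chosen OCR engine; no particular engine's API is assumed here.

```python
from pathlib import Path
from typing import Callable, Dict, List, Tuple

# Extensions taken from the question (PDF, TIF, JPG), plus common variants.
SCAN_EXTENSIONS = {".pdf", ".tif", ".tiff", ".jpg", ".jpeg"}


def find_scans(root: Path) -> List[Path]:
    """Return every scanned file under root, in sorted order."""
    return sorted(
        p for p in root.rglob("*")
        if p.is_file() and p.suffix.lower() in SCAN_EXTENSIONS
    )


def run_batch(
    root: Path, ocr: Callable[[Path], str]
) -> Tuple[Dict[Path, str], Dict[Path, str]]:
    """Run the (hypothetical) ocr callable over every scan under root.

    Returns (results, failures): recognized text per file, and an error
    message per file that could not be processed.
    """
    results: Dict[Path, str] = {}
    failures: Dict[Path, str] = {}
    for path in find_scans(root):
        try:
            results[path] = ocr(path)
        except Exception as exc:  # keep going; one bad scan should not stop 100K
            failures[path] = str(exc)
    return results, failures
```

Collecting the failures separately means a person only has to look at the scans the OCR step could not handle, rather than all of them.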


PDF files created by scanning or faxing have image content (each page is a picture of the text). If your PDFs were created through a print driver from a text-based application (for example, Word printed to PDF with, say, "Bullzip"), then they would have text content that could be 'scraped'. I have had a good experience with a previous version of PDFConverter, though there are other products that will do what you want.
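If you want a quick way to sort the PDFs into "probably has text" and "probably image only" before deciding which ones need OCR, here is a rough Python sketch. It is only a heuristic built on one assumption: a PDF that draws text normally references a `/Font` resource, while a pure scan usually does not. The function name `pdf_probably_has_text` is made up for illustration. It can be fooled (for example by encrypted files or unusual stream filters), so treat a "no" as "send it to OCR" rather than as proof.

```python
import re
import zlib

# Matches the raw bytes between "stream" and "endstream" keywords.
STREAM_RE = re.compile(rb"stream\r?\n(.*?)\r?\nendstream", re.DOTALL)


def pdf_probably_has_text(data: bytes) -> bool:
    """Heuristic: True if the PDF bytes reference a /Font resource."""
    # Uncompressed dictionaries: the name is visible in the raw bytes.
    if b"/Font" in data:
        return True
    # Newer PDFs may keep dictionaries inside Flate-compressed streams,
    # so try to inflate each stream and look again.
    for match in STREAM_RE.finditer(data):
        try:
            inflated = zlib.decompressobj().decompress(match.group(1))
        except zlib.error:
            continue  # not Flate data (e.g. a JPEG image); skip it
        if b"/Font" in inflated:
            return True
    return False
```

Typical use would be `pdf_probably_has_text(open(path, "rb").read())` for each file, routing the files that return False to the OCR step.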
