Process the data of an image (a PDF or another format) produced with PDFCreator
Hi all. Maybe you can help me with my project. I'm using PDFCreator as a virtual printer to print some images to a file. The output can be a PDF or any type of image, but I need to extract data from it. Can this be done? I'm using C#.
You cannot extract text from images directly: an image stores only pixels, so getting text out of it would require OCR.
In principle, you can extract text from PDFs.
Here are two methods using free-software command-line utilities; maybe one of them fits your needs:
pdftotext.exe (part of Foolabs' XPDF utilities)
gswin32c.exe (Artifex's Ghostscript)
Example command lines to extract all text from pages 3-7:
pdftotext:
pdftotext.exe ^
-f 3 ^
-l 7 ^
-eol dos ^
-layout ^
"d:\path with spaces\to\input.pdf" ^
"d:\path\to\output.txt"
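Since you are working in C#, here is a minimal sketch of running that command line from your program with System.Diagnostics.Process. It assumes .NET Core 2.1 or later (for ProcessStartInfo.ArgumentList, which quotes each argument for you, so paths with spaces need no manual quoting) and that pdftotext.exe is on the PATH or passed as a full path. The class and method names are hypothetical, not part of XPDF or PDFCreator.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;

// Hypothetical helper wrapping pdftotext.exe; not part of XPDF or PDFCreator.
public static class PdfToTextFile
{
    // Mirrors the command line above: pages firstPage..lastPage, DOS line endings,
    // layout-preserving output written to outputTxt.
    public static List<string> BuildArgs(string inputPdf, string outputTxt, int firstPage, int lastPage)
    {
        return new List<string>
        {
            "-f", firstPage.ToString(),
            "-l", lastPage.ToString(),
            "-eol", "dos",
            "-layout",
            inputPdf,
            outputTxt,
        };
    }

    // Starts pdftotext and waits for it; a non-zero exit code is treated as failure.
    public static void Run(string exePath, string inputPdf, string outputTxt, int firstPage, int lastPage)
    {
        var psi = new ProcessStartInfo(exePath)
        {
            UseShellExecute = false,
            CreateNoWindow = true,
        };
        // ArgumentList quotes each entry as needed, so no manual "..." around paths.
        foreach (string arg in BuildArgs(inputPdf, outputTxt, firstPage, lastPage))
            psi.ArgumentList.Add(arg);

        using Process proc = Process.Start(psi)
            ?? throw new InvalidOperationException("Could not start " + exePath);
        proc.WaitForExit();
        if (proc.ExitCode != 0)
            throw new InvalidOperationException($"{exePath} exited with code {proc.ExitCode}");
    }
}
```

Usage would look like PdfToTextFile.Run("pdftotext.exe", @"d:\path with spaces\to\input.pdf", @"d:\path\to\output.txt", 3, 7); after which you read output.txt with File.ReadAllText. On the older .NET Framework, ArgumentList does not exist and you would have to build the Arguments string yourself, quoting paths that contain spaces.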
Want the text output on stdout instead of in a file? Pass - as the output file name:
pdftotext.exe ^
-f 3 ^
-l 7 ^
-eol dos ^
-layout ^
"d:\path with spaces\to\input.pdf" ^
-
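In C#, the stdout variant lets you capture the text straight into a string by redirecting the child's standard output. A minimal sketch, assuming .NET Core 2.1 or later and that pdftotext.exe is reachable through exePath; the class and method names are hypothetical. It reads stdout to the end before calling WaitForExit, which avoids the classic deadlock where the child blocks on a full pipe buffer while the parent waits for it to exit.

```csharp
using System;
using System.Diagnostics;

// Hypothetical helper: runs pdftotext with "-" as the output file and returns the text.
public static class PdfToTextStdout
{
    public static ProcessStartInfo CreateStartInfo(string exePath, string inputPdf, int firstPage, int lastPage)
    {
        var psi = new ProcessStartInfo(exePath)
        {
            UseShellExecute = false,          // required for stream redirection
            RedirectStandardOutput = true,    // the extracted text is read from this pipe
            CreateNoWindow = true,
        };
        string[] args =
        {
            "-f", firstPage.ToString(),
            "-l", lastPage.ToString(),
            "-eol", "dos",
            "-layout",
            inputPdf,
            "-",                              // "-" tells pdftotext to write to stdout
        };
        foreach (string arg in args)
            psi.ArgumentList.Add(arg);
        return psi;
    }

    public static string Extract(string exePath, string inputPdf, int firstPage, int lastPage)
    {
        using Process proc = Process.Start(CreateStartInfo(exePath, inputPdf, firstPage, lastPage))
            ?? throw new InvalidOperationException("Could not start " + exePath);
        // Drain stdout before waiting so a full pipe buffer cannot block the child.
        string text = proc.StandardOutput.ReadToEnd();
        proc.WaitForExit();
        if (proc.ExitCode != 0)
            throw new InvalidOperationException($"{exePath} exited with code {proc.ExitCode}");
        return text;
    }
}
```

If accented characters come out garbled, one option worth trying is adding "-enc", "UTF-8" to the argument list and setting psi.StandardOutputEncoding = System.Text.Encoding.UTF8, so both sides agree on the encoding.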
Ghostscript:
(Check that your installation has ps2ascii.ps in its lib subdirectory.)
gswin32c.exe ^
-q ^
-sFONTPATH=c:/windows/fonts ^
-dNODISPLAY ^
-dSAFER ^
-dDELAYBIND ^
-dWRITESYSTEMDICT ^
-dSIMPLE ^
-f ps2ascii.ps ^
-dFirstPage=3 ^
-dLastPage=7 ^
"c:/path/to/input.pdf" ^
-dQUIET
Text output will appear on stdout. If you test this in a cmd.exe window, you can redirect it to a file by appending > /path/to/output.txt to the command.
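From C#, instead of redirecting with > in cmd.exe, you can read Ghostscript's stdout directly. A minimal sketch, assuming .NET Core 2.1 or later and that gswin32c.exe is reachable through the gsExe path you pass in; the class and method names are hypothetical. It passes the same switches as the command line above. Closing standard input is a defensive assumption: if Ghostscript drops into its interactive prompt after the last file, it sees end-of-input and exits instead of waiting forever.

```csharp
using System;
using System.Diagnostics;

// Hypothetical helper: runs the ps2ascii.ps command line above and returns its stdout.
public static class GhostscriptText
{
    public static ProcessStartInfo CreateStartInfo(string gsExe, string inputPdf, int firstPage, int lastPage)
    {
        var psi = new ProcessStartInfo(gsExe)
        {
            UseShellExecute = false,
            RedirectStandardOutput = true,   // the extracted text arrives on stdout
            RedirectStandardInput = true,    // closed in Extract so gs never waits on the console
            CreateNoWindow = true,
        };
        string[] args =
        {
            "-q",
            "-sFONTPATH=c:/windows/fonts",
            "-dNODISPLAY",
            "-dSAFER",
            "-dDELAYBIND",
            "-dWRITESYSTEMDICT",
            "-dSIMPLE",
            "-f", "ps2ascii.ps",
            "-dFirstPage=" + firstPage,
            "-dLastPage=" + lastPage,
            inputPdf,
            "-dQUIET",
        };
        foreach (string arg in args)
            psi.ArgumentList.Add(arg);
        return psi;
    }

    public static string Extract(string gsExe, string inputPdf, int firstPage, int lastPage)
    {
        using Process proc = Process.Start(CreateStartInfo(gsExe, inputPdf, firstPage, lastPage))
            ?? throw new InvalidOperationException("Could not start " + gsExe);
        // Signal end-of-input immediately, then drain stdout before waiting for exit.
        proc.StandardInput.Close();
        string text = proc.StandardOutput.ReadToEnd();
        proc.WaitForExit();
        if (proc.ExitCode != 0)
            throw new InvalidOperationException($"{gsExe} exited with code {proc.ExitCode}");
        return text;
    }
}
```

Usage would be something like string text = GhostscriptText.Extract("gswin32c.exe", "c:/path/to/input.pdf", 3, 7); and the returned string is what you would otherwise have redirected to output.txt.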