Crop and extract text from PDF
I have cropped a PDF using the following command.
gswin32c.exe ^
-o cropped.pdf ^
-sDEVICE=pdfwrite ^
-c "[/CropBox [64 418 348 803] /PAGE pdfmark" ^
-f original.pdf
The PDF gets cropped as expected. I then used the following command to extract the text from the cropped PDF.
gswin32c.exe ^
-q ^
-sFONTPATH=c:/windows/fonts ^
-dNODISPLAY ^
-dSAFER ^
-dDELAYBIND ^
-dWRITESYSTEMDICT ^
-dSIMPLE ^
-f ps2ascii.ps ^
-dFirstPage=1 ^
-dLastPage=1 ^
cropped.pdf ^
-> c:\output.txt ^
-dQUIET
The output contains the text of the original PDF and not the cropped PDF.
Can someone help me extract the text from only the cropped area of the PDF?
Thanks, Nazeer
The result you got is exactly what is to be expected.
Cropping of a PDF page does NOT mean: cut off everything around the cropped area and delete it.
Cropping means: display only what's inside the cropped area (and zoom to it), and hide what's around it.
So when you convert such a page to text, you'll also get the hidden content back.
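To illustrate the point, here is a minimal Python sketch of what "extract only the cropped text" would have to do: keep a text fragment only if its position lies inside the CropBox. The CropBox values are the ones from the question; the Fragment type, the sample fragments, and their coordinates are made up for illustration, and the sketch assumes you already have some tool that reports text positions in PDF user-space units.

```python
from typing import List, NamedTuple, Tuple


class Fragment(NamedTuple):
    """Hypothetical text fragment with its origin in PDF user-space units."""
    text: str
    x: float
    y: float


# CropBox from the question: [llx lly urx ury]
CROP_BOX = (64.0, 418.0, 348.0, 803.0)


def inside_crop_box(frag: Fragment,
                    box: Tuple[float, float, float, float] = CROP_BOX) -> bool:
    """Return True if the fragment's origin lies within the box (edges included)."""
    llx, lly, urx, ury = box
    return llx <= frag.x <= urx and lly <= frag.y <= ury


def visible_text(fragments: List[Fragment],
                 box: Tuple[float, float, float, float] = CROP_BOX) -> List[str]:
    """Keep only the text of fragments that fall inside the crop box."""
    return [f.text for f in fragments if inside_crop_box(f, box)]


# Made-up sample data: one fragment inside the CropBox, one outside it.
fragments = [
    Fragment("inside", 100.0, 500.0),
    Fragment("outside", 400.0, 100.0),
]

print(visible_text(fragments))  # ['inside']
```

A plain text conversion skips this filtering step, so it returns both fragments. That is why the hidden content shows up in the output.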
You may have more luck if you try a different way of converting cropped.pdf to text:
Open it in Acrobat/Adobe Reader.
Click 'File --> Save as Text...'