Create a tiff with only text and no images from a postscript file with ghostscript

Is it possible to convert a PostScript file (created from a PDF document containing readable text and images) into a TIFF file that contains only the text, without the images?

For example, is there something like a maximum buffer setting that would drop the images and leave only the text?

And if boxes and lines around text could be removed as well that would be awesome.

Best regards!


You can redefine the various 'image' operators so that they don't do anything:

/image {
 % 'type' consumes the top operand: the dictionary (dict form)
 % or the data source (five-operand form).
 type /dicttype eq not {
   pop pop pop pop   % remove the remaining arguments of the non-dictionary form
 } if
} bind def

/imagemask {
 % same operand handling as image above
 type /dicttype eq not {
   pop pop pop pop   % remove the remaining arguments of the non-dictionary form
 } if
} bind def

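/colorimage {
  % operands: width height bits matrix datasrc0 ... multi ncomp
  exch {                 % multi is true: one data source per component
    {pop} repeat         % ncomp is the repeat count
  } {
    pop pop              % ncomp and the single data source
  } ifelse
  pop pop pop pop        % width height bits matrix
} bind def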

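Save those definitions as a file, and list that file before your input file in your GS invocation, so the redefinitions are in place before the document is interpreted.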

You can remove linework similarly by redefining the stroke operator:

/stroke {
  newpath
} bind def

rectstroke is harder because it accepts several different operand forms. I suggest you read the PLRM (PostScript Language Reference Manual) if you need that one.

Possibly also the fill operator:

/fill {
  newpath
} bind def

/eofill {
  newpath
} bind def

Beware! Some text is not drawn using the text 'show' operators, but is constructed from linework, or drawn as images. Such text will disappear along with everything else if you redefine the operators as shown above.

Note that the PDF interpreter often doesn't allow re-definition of operators, so you may first have to convert your PDF file to PostScript, using the ps2write device, then run the resulting file through GS to get a TIFF file.


gs -sDEVICE=bitrgbtags -o out.tags <myfile>

will create a ppm file with tags; the tags label each pixel as text, vector, image, etc.

Then you can use the C programs in ghostpdl/tools/GOT to process the image. It sounds like you want to write a new C program to set each non-text pixel to the background color, or maybe just white. This is fairly straightforward with the example C programs in the GOT subdirectory as a guide (if you are a programmer). Then you would convert the ppm to tiff. Ken provided a different way of doing this that doesn't require pixel processing.
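As a rough illustration of the pixel processing described above, here is a minimal sketch in C. It is not taken from the GOT tools, and it rests on assumptions you should check against the GOT examples and the Ghostscript sources: that each pixel is stored as 4 bytes with the tag byte first, followed by R, G, B, and that bit 0x01 of the tag marks text. The names blank_non_text and write_ppm are made up for this example, and header parsing of the tagged file is left out.

```c
#include <stddef.h>
#include <stdio.h>

/* ASSUMPTION: 4 bytes per pixel, laid out as tag, R, G, B. */
enum { BYTES_PER_PIXEL = 4 };

/* ASSUMPTION: bit 0x01 of the tag byte means "this pixel was drawn as text". */
enum { TAG_TEXT = 0x01 };

/*
 * Paint every pixel whose tag lacks the text bit white, in place.
 * Returns the number of pixels that were overwritten.
 */
size_t blank_non_text(unsigned char *pixels, size_t npixels)
{
    size_t blanked = 0;
    for (size_t i = 0; i < npixels; i++) {
        unsigned char *p = pixels + i * BYTES_PER_PIXEL;
        if ((p[0] & TAG_TEXT) == 0) {
            p[1] = p[2] = p[3] = 255;   /* R, G, B set to white */
            blanked++;
        }
    }
    return blanked;
}

/*
 * Write the RGB channels as a plain binary PPM (P6), dropping the tag byte.
 * Returns 0 on success, -1 on a write error.
 */
int write_ppm(FILE *out, const unsigned char *pixels,
              unsigned width, unsigned height)
{
    size_t n = (size_t)width * height;

    if (fprintf(out, "P6\n%u %u\n255\n", width, height) < 0)
        return -1;
    for (size_t i = 0; i < n; i++) {
        /* skip the tag byte at offset 0, emit the three colour bytes */
        if (fwrite(pixels + i * BYTES_PER_PIXEL + 1, 1, 3, out) != 3)
            return -1;
    }
    return 0;
}
```

You would read the pixel data from out.tags into a buffer, call blank_non_text on it, write the result with write_ppm, and then convert that PPM to TIFF with whatever image tool you prefer.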
