Linux PDF/PostScript Optimizing
So I have a report system built using Java and iText. PDF templates are created using Scribus. The Java code merges the data into the document using iText. The files are then copied over to an NFS share, and a Bash script prints them.
I use acroread to convert them to PS, then lpr the PS.
The FOSS application pdftops is horribly inefficient.
My main problem is that the PDFs generated using iText/Scribus are very large, and I've recently run into a problem where acroread crashes because it hits 4 GB of memory usage on large (300+ page) documents. (Adobe is painfully slow at updating things to 64-bit.)
Now, I can use Adobe Reader on Windows with the Reduce File Size option, or whatever it's called, and it greatly (> 10x) reduces the size of the PDF (it appears to remove a lot of metadata about form fields and such) and produces a PDF that is basically a print image.
My question is: does anyone know of a good solution or program for doing something similar on Linux? Ideally, it would optimize the PDF, reduce its size, and reduce the PS complexity so the printer could print faster; right now it takes about 15-20 seconds per page to print.
To reduce the size of a PDF file, use pdfsizeopt, the software I am developing. pdfsizeopt runs on Linux, Mac OS X, and Windows (and possibly on other systems as well).
pdfsizeopt has lots of dependencies, so it might be a bit cumbersome to install (about 10 minutes of your time). I'm working on making installation easier.
If you need something quickly, you can try one of its dependencies: Multivalent's tool.pdf.Compress, which is a pure Java tool.
Get Multivalent20060102.jar, install Java, and run:
java -cp Multivalent20060102.jar tool.pdf.Compress input.pdf
There are limitations on what gs -sDEVICE=pdfwrite can do:
- it can't generate xref streams (so the PDF will be larger than necessary)
- it can't generate object streams (so the PDF will be larger than necessary)
- it doesn't deduplicate images or other objects (i.e., if the same image appears multiple times in the input PDF, gs makes a copy in the output for each occurrence)
- it emits images suboptimally
- it re-samples images to low resolution
- it sometimes omits hyperlinks in the PDF
- it can't convert some constructs (so the output PDF may be visually different from the input)
Neither pdfsizeopt nor Multivalent's tool.pdf.Compress suffers from these limitations.
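To see how much a given tool actually saves on your own files, a small helper like the one below can compare the input and output sizes. This is only a sketch: the function name size_percent is made up for illustration, and it assumes a POSIX-style shell with wc available.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of pdfsizeopt or Multivalent):
# print the size of the "after" file as a percentage of the "before" file.
# Integer arithmetic, so the result is rounded down.
size_percent() {
    local before after
    before=$(( $(wc -c < "$1") ))   # byte count of the original PDF
    after=$(( $(wc -c < "$2") ))    # byte count of the optimized PDF
    [ "$before" -gt 0 ] || return 1 # avoid dividing by zero on an empty file
    echo $(( after * 100 / before ))
}
```

For example, size_percent input.pdf output.pdf prints 10 when the output is one tenth the size of the input, which is roughly the reduction described in the question.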
gs \
-dCompatibilityLevel=1.4 \
-dPDFSETTINGS=/screen \
-dNOPAUSE \
-dBATCH \
-sDEVICE=pdfwrite \
-sOutputFile=output.pdf \
input.pdf
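Since the reports in the question are printed in batches from a share, the command above could be wrapped in a small script that processes a whole directory. The following is a minimal sketch under stated assumptions: the function names, the "-small" output suffix, and the PDF_PRESET variable are all illustrative and not part of Ghostscript. Note that the /screen preset downsamples images heavily; for output that goes to a printer, a preset such as /printer may be a better fit, so it is left overridable here.

```shell
#!/usr/bin/env bash
# Sketch: shrink every PDF in a directory with Ghostscript's pdfwrite device.
# Names and the "-small" suffix are assumptions made for this example.

# Ghostscript quality preset; defaults to the one used in the command above.
PDF_PRESET="${PDF_PRESET:-/screen}"

# Derive the output path for an input PDF: report.pdf -> report-small.pdf
small_name() {
    printf '%s-small.pdf\n' "${1%.pdf}"
}

# Run the same gs invocation as above on a single file.
shrink_pdf() {
    local input="$1" output
    output="$(small_name "$input")"
    gs \
        -dCompatibilityLevel=1.4 \
        -dPDFSETTINGS="$PDF_PRESET" \
        -dNOPAUSE \
        -dBATCH \
        -sDEVICE=pdfwrite \
        -sOutputFile="$output" \
        "$input"
}

# Shrink all PDFs in a directory, skipping files this script already produced.
shrink_all() {
    local dir="$1" f
    for f in "$dir"/*.pdf; do
        [ -e "$f" ] || continue                   # glob matched nothing
        case "$f" in *-small.pdf) continue ;; esac
        shrink_pdf "$f" || echo "failed: $f" >&2  # report and keep going
    done
}
```

After sourcing the script, shrink_all /path/to/reports (a placeholder path) writes a report-small.pdf next to each report.pdf and leaves the originals untouched, so the results can be compared before anything is sent to lpr.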
Ghostscript seems to work for the most part for this issue. I'm now having a different problem with Ghostscript garbling the embedded fonts, but I'll open a new question for that.