PDF file compression
I have a requirement to dynamically generate and compress large batches of PDF files.
I am considering the usual algorithms:
- Zip
- Ace
- Rar
Any other suggestions are welcome.
My question is which algorithm is likely to give me the smallest file size. Speed and efficiency are also important factors, but size is my primary concern.
Also, does it make a difference whether I have many small files or fewer, larger files in each archive?
Most of my processing will be done in PHP, but I'm happy to interface with third party executables if needed.
Edit:
The documents are primarily invoices and shouldn't contain any images other than the company logo.
I have not had much success compressing PDFs. As pointed out, they are already compressed when composed (although some PDF composition tools allow you to specify a 'compression level'). If at all possible, the first approach you should take is to reduce the size of the composed PDFs.
If you keep the PDFs in a single file, they can share any common resources (images, fonts) and so can be significantly smaller. Note that this means one large PDF file, not one large ZIP with multiple PDFs inside.
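If you want to experiment with this from PHP, here is a minimal sketch of one way to concatenate the composed invoices into a single PDF by shelling out to Ghostscript. This is my own suggestion of a tool, not something this answer prescribes, and the file names are placeholders; whether resources end up shared also depends on how the PDFs were produced.

```php
<?php
// Sketch: merge several composed PDFs into one file with Ghostscript,
// so shared resources (fonts, the company logo) can be stored once.
// Assumes the `gs` binary is on the PATH; file names are placeholders.
$inputs = ['invoice-001.pdf', 'invoice-002.pdf', 'invoice-003.pdf'];
$output = 'invoices-batch.pdf';

$cmd = 'gs -dBATCH -dNOPAUSE -q -sDEVICE=pdfwrite'
     . ' -sOutputFile=' . escapeshellarg($output)
     . ' ' . implode(' ', array_map('escapeshellarg', $inputs));

exec($cmd, $out, $status);
if ($status !== 0) {
    throw new RuntimeException("Ghostscript failed with exit code $status");
}
```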
In my experience it is quite difficult to compress the images within PDFs, and images make by far the biggest impact on file size. Ensure that you have optimised images before you start. It is even worth doing a test run without your images, simply to see how much the images are contributing to the size.
The other component is fonts: if you are using multiple embedded fonts, you are packing more data into the file. Just use one font to keep the size down, or use fonts that are commonly installed so that you don't need to embed them.
I think 7z is currently the best, with RAR second, but I would recommend trying both to find out what works best for you.
LZMA is the best if you need the smallest file size.
And of course, a PDF can itself be compressed.
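Here is a minimal PHP sketch of driving the 7-Zip command-line tool, which uses LZMA/LZMA2 for .7z archives. It assumes `7z` is installed and on the PATH; the archive and file names are placeholders.

```php
<?php
// Sketch: pack a batch of PDFs into a .7z archive (LZMA) by shelling
// out to the 7-Zip CLI. -mx=9 selects maximum compression.
$archive = 'invoices.7z';
$files   = glob('invoices/*.pdf');

$cmd = '7z a -t7z -mx=9 ' . escapeshellarg($archive)
     . ' ' . implode(' ', array_map('escapeshellarg', $files));

exec($cmd, $out, $status);
if ($status !== 0) {
    throw new RuntimeException("7z failed with exit code $status");
}
```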
I doubt you'll get much, if any, reduction in file size by compressing PDFs. However, if all you're doing is collecting multiple files into one, why not tar it?
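If you'd rather stay inside PHP than call the `tar` binary, a sketch like this with the built-in PharData class should do the same job (file names are placeholders):

```php
<?php
// Sketch: collect a batch of PDFs into a single uncompressed tar using
// PHP's built-in PharData class (no external tools needed).
$tar = new PharData('invoices.tar');
foreach (glob('invoices/*.pdf') as $pdf) {
    $tar->addFile($pdf, basename($pdf)); // store each file under its base name
}
// Optionally gzip the whole archive afterwards (creates invoices.tar.gz):
// $tar->compress(Phar::GZ);
```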
We've done this in the past for large (and many) PDFs that store lots of text: Training Packages for Training Organisations in Australia. It's about 96% text (course info etc.) and a few small diagrams. Sizes vary from 1-2 MB up to 8 or 9 MB, and they usually come in volumes of 4 or more.
We found compressing with Zip to be OK, though you won't get much compression since the PDF format is already heavily compressed; it was more about ease of use, letting our users download it all as a batch instead of worrying about the file sizes. To give you an idea, a 2.31 MB file (lots of text, several full-page diagrams) compressed to 1.92 MB with ZIP and 1.90 MB with RAR.
I'd recommend using LZMA to get the best results, while also keeping an eye on resource usage for compressing and decompressing.
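For the "download it all as a batch" case, here is a minimal PHP sketch using the built-in ZipArchive class (file names are placeholders; don't expect big savings, the win is convenience):

```php
<?php
// Sketch: zip a batch of PDFs with PHP's built-in ZipArchive so users
// get a single download instead of many individual files.
$zip = new ZipArchive();
if ($zip->open('invoices.zip', ZipArchive::CREATE | ZipArchive::OVERWRITE) !== true) {
    throw new RuntimeException('Could not create invoices.zip');
}
foreach (glob('invoices/*.pdf') as $pdf) {
    $zip->addFile($pdf, basename($pdf));
}
$zip->close();
```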
How big are these files? Get a copy of WinRAR, WinAce and 7-Zip and give them a go.
Combine my nifty tool Precomp with 7-Zip. It decompresses the zLib streams inside the PDF so that 7-Zip (or any other compressor) can handle them better. You will get file sizes of about 50% of the original, losslessly. This tool works especially well for PDF files, but is also nice for other compressed (zLib/LZW) streams such as ZIP/GZip/JAR/GIF/PNG.
For result examples have a look here or here. Speed can be slow for the precompression (PDF->PCF) part, but will be very fast for the recompression/reconstruction (PCF->PDF) part.
For even better results than Precomp + 7-Zip, you can try the lprepaq and prepaq variants, but beware: prepaq especially is slooww :). The bright side is that prepaq offers the best (PDF) compression currently available.
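For reference, a rough PHP sketch of the two-step Precomp + 7-Zip pipeline described above. The exact precomp switches are an assumption on my part, so check precomp's help output before relying on them; file names are placeholders and both tools must be on the PATH.

```php
<?php
// Rough sketch of the Precomp + 7-Zip pipeline described above.
// Assumptions (verify against precomp's own documentation):
//   - `precomp invoice.pdf` writes invoice.pcf next to the input
//   - `precomp -r invoice.pcf` reconstructs the original PDF

// Precompress: expand the PDF's zLib streams into a .pcf file.
exec('precomp ' . escapeshellarg('invoice.pdf'), $out, $status);
if ($status !== 0) {
    throw new RuntimeException('precomp failed');
}

// Compress the expanded .pcf with 7-Zip (LZMA handles it well).
exec('7z a -mx=9 ' . escapeshellarg('invoices.7z') . ' ' . escapeshellarg('invoice.pcf'));

// Later, after extracting the .pcf from the archive, restore the PDF:
// exec('precomp -r ' . escapeshellarg('invoice.pcf'));
```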