Replacing images in PDF documents with Python?
We generate PDF documents with RGB images stored in a CMS.
As part of the PDF processing we sometimes need to convert the RGB images to CMYK (for print production).
Converting the images from RGB to CMYK seems to be feasible with Python using LittleCMS and the PyLittleCMS bindings (plus the ICC color profiles for the RGB input and CMYK output device).
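To make the target of that conversion concrete, here is a minimal sketch of the naive, profile-free RGB to CMYK formula. It is not the color-managed transform that LittleCMS performs with ICC profiles, and it is not suitable for actual print output; the function name `rgb_to_cmyk_naive` is made up for illustration.

```python
def rgb_to_cmyk_naive(r, g, b):
    """Naive RGB (0-255 per channel) to CMYK (0.0-1.0 per channel).

    Assumption: no ICC profiles are involved, so the result only
    approximates what a color-managed conversion would produce.
    """
    if (r, g, b) == (0, 0, 0):
        # Pure black: handled separately to avoid dividing by zero below.
        return 0.0, 0.0, 0.0, 1.0

    # Normalise to 0..1 and take the complement of each channel.
    c = 1 - r / 255
    m = 1 - g / 255
    y = 1 - b / 255

    # The black component is the part shared by all three channels.
    k = min(c, m, y)

    # Remove the shared black component and rescale the remainder.
    return (c - k) / (1 - k), (m - k) / (1 - k), (y - k) / (1 - k), k


if __name__ == "__main__":
    print(rgb_to_cmyk_naive(255, 0, 0))
```

A profile-based conversion would replace this arithmetic with a transform built from the RGB input profile and the CMYK output profile, which is the part LittleCMS is needed for.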
However, is there a Python-based option to iterate over the images inside a PDF, extract the image data, and replace each image with its processed CMYK variant?
I don't think there are any free Python tools that do exactly what you want. Here are some options:
PoDoFo doesn't have mature Python bindings, but it can read and write PDFs and supports PDF images and color spaces.
PDFMiner is a pure-Python PDF parser, but it doesn't do much with images. It is a start, but it would probably take quite a bit of work to do what you want.
The commercial version of ReportLab may be able to do what you want with PageCatcher. I haven't used it in a few years, but it may be worth investigating. (The free ReportLab only writes PDFs; it doesn't read them.)
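To give a feel for what "iterating over the images" means at the file level, here is a rough standard-library sketch that scans raw PDF bytes for indirect objects whose dictionary contains `/Subtype /Image`. It is not a real PDF parser: it ignores inline images, it can be fooled by binary stream data that happens to look like an object header, and it does not decode or replace anything. The function name `find_image_objects` is made up for illustration.

```python
import re

# Header of an indirect object, e.g. b"12 0 obj".
OBJ_HEADER = re.compile(rb"(\d+)\s+(\d+)\s+obj\b")
# Dictionary entry that marks an image XObject.
IMAGE_MARKER = re.compile(rb"/Subtype\s*/Image\b")


def find_image_objects(pdf_bytes):
    """Return the object numbers of objects that look like image XObjects.

    Assumption: a crude byte-level scan is good enough for illustration;
    a real tool should use a proper PDF parser instead.
    """
    found = []
    headers = list(OBJ_HEADER.finditer(pdf_bytes))
    for i, header in enumerate(headers):
        # An object's body runs up to the next object header (or end of file).
        end = headers[i + 1].start() if i + 1 < len(headers) else len(pdf_bytes)
        body = pdf_bytes[header.end():end]
        # Only inspect the dictionary, i.e. the part before any stream data.
        dict_part = body.split(b"stream", 1)[0]
        if IMAGE_MARKER.search(dict_part):
            found.append(int(header.group(1)))
    return found


if __name__ == "__main__":
    sample = (
        b"%PDF-1.4\n"
        b"1 0 obj\n<< /Type /Catalog >>\nendobj\n"
        b"4 0 obj\n<< /Type /XObject /Subtype /Image /Width 2 /Height 2 "
        b"/ColorSpace /DeviceRGB /BitsPerComponent 8 /Length 12 >>\n"
        b"stream\n" + b"\x00" * 12 + b"\nendstream\nendobj\n"
    )
    print(find_image_objects(sample))
```

Locating the objects is the easy part. Replacing an image also means decoding its stream according to its filters, re-encoding the CMYK data, and updating the `/ColorSpace` and `/Length` entries, which is why a library with real read and write support is the better route.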