Image Conversion library: Word, PDF, Excel to Images

We have a requirement to convert any incoming documents, which may be Excel, PDF, or Word files, to images. Any recommendations?

I am not sure whether ImageMagick would do this; my understanding is that it only converts between image formats, though I guess it handles PDF as well. What about Excel and Word?

Thanks in advance


You could convert everything to pdf first using:

$ libreoffice --headless --invisible --convert-to pdf *.libreofficeextension

and then use ImageMagick to turn the PDFs into images.

You might run into some formatting issues with Word documents, and especially with PowerPoint.
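
The two steps can be chained in a small script. This is only a rough sketch: it assumes `libreoffice` and `convert` are on your PATH (and that Ghostscript is installed, since ImageMagick relies on it to read PDFs). The names `run` and `doc_to_images`, the `DRY_RUN` switch, and the default density of 150 are made up for illustration.

```shell
#!/bin/sh
# Sketch: Office document -> PDF (LibreOffice) -> one image per page (ImageMagick).
# `run`, `doc_to_images` and DRY_RUN are hypothetical names, not part of either tool.

# Run a command, or only print it when DRY_RUN=1.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# Usage: doc_to_images INPUT [OUTDIR] [DENSITY]
doc_to_images() {
    input=$1
    outdir=${2:-.}
    density=${3:-150}
    base=$(basename "$input")
    stem=${base%.*}    # file name without its extension

    # Step 1: LibreOffice writes <stem>.pdf into the output directory.
    run libreoffice --headless --convert-to pdf --outdir "$outdir" "$input" || return 1

    # Step 2: ImageMagick rasterises the PDF, one output file per page.
    run convert -density "$density" "$outdir/$stem.pdf" "$outdir/$stem.jpeg"
}
```

For example, `doc_to_images report.docx out 200` should leave `out/report.pdf` plus the page images in `out/`. Setting `DRY_RUN=1` first just prints the two commands, which is handy for checking paths before running it on a real batch.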


You're correct: ImageMagick won't handle the MS Office formats, because it only handles image format conversion.

For PDFs, you can just use ImageMagick directly:

convert -density 400 filename.pdf filename.jpeg

It will give you one file per page:

  • filename-0.jpeg
  • filename-1.jpeg
  • ...
  • filename-(N-1).jpeg

where N is the number of pages in your document. pdf2ps will achieve the same thing, but you'll need to play around with the command-line parameters to get the same output quality.
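
If you want the page numbering to be predictable and to sort correctly, ImageMagick accepts a printf-style `%03d` in the output file name. Here is a minimal sketch of a wrapper around that; the function name `pdf_to_pages`, the `DRY_RUN` switch, and the zero-padding width are my own choices for illustration, and it assumes `convert` and Ghostscript are installed.

```shell
#!/bin/sh
# Sketch: rasterise each PDF page to a zero-padded, numbered JPEG.
# `pdf_to_pages` and DRY_RUN are hypothetical names used for illustration.

# Usage: pdf_to_pages FILE.pdf [DENSITY]
pdf_to_pages() {
    pdf=$1
    density=${2:-400}
    stem=${pdf%.*}    # path without the trailing extension

    # ImageMagick replaces %03d with the page index (000, 001, ...).
    set -- convert -density "$density" "$pdf" "${stem}-%03d.jpeg"

    if [ "${DRY_RUN:-0}" = "1" ]; then
        printf '%s\n' "$*"    # only show the command
    else
        "$@"
    fi
}
```

Calling `pdf_to_pages filename.pdf` should then produce `filename-000.jpeg`, `filename-001.jpeg`, and so on. A density of 400 gives large files, so it may be worth trying something lower first.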

For the MS Office products, I remember that there is some sort of API that gives you access to the suite's features (this was MS Office 2007, from memory), such as opening a file and exporting it to PDF. If you can get things out to PDF, then you can use the method above to convert them to images. Some negative points:

  • This was many years ago at my previous job, and I can't remember what exactly it was called or how to use it.
  • I remember the output PDF formatting wasn't great (not 100% like it appears on the screen), but it was readable. This may have improved since I last used it.
  • I have a vague recollection of it firing up an Excel window in the background, so it's not entirely a command-line solution (it may be unsuitable for servers).


This is quite an old question, but here is how I solved it:

  1. Use a Windows machine.
  2. Install the MS Office suite.
  3. Use https://officetopdf.codeplex.com/ to convert any Office format to PDF.
  4. Use ImageMagick to convert the PDF to an image format.

Hope it helps someone.
