How to convert PDF to HTML?

I know that some similar questions have been asked here, but I have read all of them and none of the answers satisfies me.

I tried xpdf and pdftohtml. Both are great but old, and they do not seem to work with PDFs in the newer versions of the format.

My problem is to find a way to convert any PDF or Doc file to HTML while keeping the style and structure. If somebody knows of something, even a paid tool is perfectly fine.


Well, I tried some libraries, exclusively on Linux, and here is my intermediate conclusion.

PDFtoHTML is too old and doesn't take into account all of the newer PDF specifications, for example PDF 1.7 (mainly because it uses xpdf 2.02, while xpdf is already at version 3).

Instead of PDFtoHTML I found Poppler, which continues the PDFtoHTML development and adds some very useful new utilities. Among the open source tools, Poppler was the one that rendered my complex PDF best. Here is one almost identical to the one I have to use.

Finally, here is what I'm going to use: ImageMagick + Poppler. I will convert my PDF to images and use the XML output from Poppler's pdftohtml to add a text layer on top of each image.
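To make the image-plus-text-layer idea concrete, here is a rough Python sketch of the overlay step only. It assumes the XML was already produced with something like `pdftohtml -xml`, whose output lists `<page>` elements containing `<text>` elements with `top` and `left` attributes, and that each page was already rendered to an image. The function names and the `page-{index}.png` naming scheme are hypothetical, chosen for illustration; adjust them to whatever your image conversion actually writes.

```python
import html
import xml.etree.ElementTree as ET


def overlay_page(page, image_url):
    """Build one HTML <div> for a <page> element of the pdftohtml XML output."""
    width, height = page.get("width"), page.get("height")
    parts = [
        f'<div style="position:relative;width:{width}px;height:{height}px;'
        f"background:url('{html.escape(image_url)}') no-repeat;"
        f'background-size:100% 100%">'
    ]
    for text in page.iter("text"):
        # <text> may contain inline tags such as <b> or <i>; keep only the characters.
        content = "".join(text.itertext())
        top, left = text.get("top"), text.get("left")
        # Transparent text: the image shows the styling, the span keeps the
        # text selectable and searchable.
        parts.append(
            f'<span style="position:absolute;top:{top}px;left:{left}px;'
            f'color:transparent">{html.escape(content)}</span>'
        )
    parts.append("</div>")
    return "\n".join(parts)


def build_overlay_html(xml_source, image_pattern="page-{index}.png"):
    """Turn the XML text into an HTML page.

    image_pattern is a hypothetical naming scheme for the rendered page images
    (zero-based index here); change it to match your own files.
    """
    root = ET.fromstring(xml_source)
    pages = [
        overlay_page(page, image_pattern.format(index=i))
        for i, page in enumerate(root.iter("page"))
    ]
    return "<!DOCTYPE html>\n<html><body>\n" + "\n".join(pages) + "\n</body></html>"
```

Because each page div takes its size from the `width` and `height` attributes in the XML and stretches the background image to fit, the resolution of the rendered images does not have to match the coordinate system of the text layer.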


Like you, I was searching for an automatic tool to convert PDF to HTML or, even better, XHTML. Well, my document was only two pages long, but in the end http://www.pdfonline.com (Online PDF To HTML) did the best job for me. It is even able to detect and correctly display tables and paragraphs, not only phrases!

Still, it was not enough for my job, so I created a template file manually.
