
Read PDF through Java and get the HTML Content

I want to read an existing PDF file and get not only the text but also the formatting information, such as fonts (bold, italic), paragraphs, images, and tables. Basically, I want to write HTML that resembles the PDF.

Is there a code library for doing this? I am looking for an open source library.

Regards, Tina Agrawal


Try PDFBox or iText. They are open source and can handle text, images, tables, etc.
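Whichever library does the extraction, the second half of the job is turning the styled text it reports into markup. Here is a minimal sketch of that step only, using nothing beyond the Java standard library. The `TextRun` class and its `bold`, `italic`, and `startsParagraph` flags are hypothetical, invented for illustration; they are not part of PDFBox or iText. In practice you would have to fill them in yourself from whatever font and position data the library exposes (for example, font names often contain "Bold" or "Italic", but that is a heuristic, not a guarantee).

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical model: one styled run of text, as you might build it
// from the font and position data a PDF library reports.
final class TextRun {
    final String text;
    final boolean bold;
    final boolean italic;
    final boolean startsParagraph;

    TextRun(String text, boolean bold, boolean italic, boolean startsParagraph) {
        this.text = text;
        this.bold = bold;
        this.italic = italic;
        this.startsParagraph = startsParagraph;
    }
}

final class RunsToHtml {
    // Escape the characters that would otherwise be read as markup.
    static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    // Wrap each run in <b>/<i> as needed and group runs into <p> elements.
    static String toHtml(List<TextRun> runs) {
        StringBuilder html = new StringBuilder();
        boolean paragraphOpen = false;
        for (TextRun run : runs) {
            if (run.startsParagraph || !paragraphOpen) {
                if (paragraphOpen) {
                    html.append("</p>\n");
                }
                html.append("<p>");
                paragraphOpen = true;
            }
            String piece = escape(run.text);
            if (run.italic) {
                piece = "<i>" + piece + "</i>";
            }
            if (run.bold) {
                piece = "<b>" + piece + "</b>";
            }
            html.append(piece);
        }
        if (paragraphOpen) {
            html.append("</p>\n");
        }
        return html.toString();
    }

    public static void main(String[] args) {
        List<TextRun> runs = Arrays.asList(
            new TextRun("Report title", true, false, true),
            new TextRun("Plain text, then ", false, false, true),
            new TextRun("emphasis", false, true, false));
        System.out.print(toHtml(runs));
    }
}
```

Images and tables need separate handling; tables in particular are usually not stored as tables in a PDF, so they have to be inferred from the positions of the text and ruling lines.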


If you want an exact version of the page, you may need to create an image of the page and put invisible text on it. You can see some ideas of what is possible with PDF to HTML conversion on our blog at http://www.jpedal.org/PDFblog/2012/08/4-ways-to-convert-pdf-to-html5/.
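To make the image-plus-invisible-text idea concrete, here is a rough sketch of the HTML side in plain Java. It assumes you already have a rendered image of the page and the text positions converted to CSS pixels measured from the top-left corner (PDF coordinates normally start at the bottom-left, so a flip is needed first). The class name `PageOverlay` and its methods are hypothetical, and this is one possible layout, not how any particular converter does it.

```java
import java.util.Locale;

final class PageOverlay {
    // Escape the characters that would otherwise be read as markup.
    static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    // One transparent, absolutely positioned span; the text stays
    // selectable and searchable while the image provides the visuals.
    static String span(String text, double leftPx, double topPx, double fontSizePx) {
        return String.format(Locale.ROOT,
            "<span style=\"position:absolute;left:%.1fpx;top:%.1fpx;"
                + "font-size:%.1fpx;color:transparent\">%s</span>",
            leftPx, topPx, fontSizePx, escape(text));
    }

    // The page container: the rendered page image as background,
    // with the invisible spans layered on top of it.
    static String page(String imageUrl, int widthPx, int heightPx, String spans) {
        return "<div style=\"position:relative;width:" + widthPx + "px;height:"
            + heightPx + "px;background:url('" + imageUrl + "') no-repeat\">\n"
            + spans + "\n</div>\n";
    }

    public static void main(String[] args) {
        // "page1.png" is a placeholder file name for the rendered page image.
        String spans = span("Hello PDF", 72.0, 90.5, 12.0);
        System.out.print(page("page1.png", 612, 792, spans));
    }
}
```

The trade-off is that the result looks exactly like the original but is not real HTML structure: there are no headings, paragraphs, or tables for a browser or screen reader to work with.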
