Read PDF through Java and get the HTML Content
I want to read an existing PDF file and get not only the text but also the formatting information, like fonts (bold, italic), paragraphs, images, and tables. Basically, I want to write an HTML file that looks similar to the PDF.
Is there a code library for doing this? I am looking for an open source library.
Regards, Tina Agrawal
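Try PDFBox or iText. Both are open source and can extract text (including font information) and images. Tables take more work, because an untagged PDF usually stores them only as positioned text and lines rather than as table structure.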
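As a rough illustration of the PDFBox route: in PDFBox 2.x you can subclass `PDFTextStripper` and override `writeString(String text, List<TextPosition> textPositions)`. Each `TextPosition` exposes the character via `getUnicode()` and its font via `getFont().getName()`. The sketch below covers only the HTML side, in plain Java so it runs without PDFBox on the classpath. It assumes bold and italic can be guessed from the font name (e.g. `Helvetica-Bold`, `ABCDEF+Times-Italic`). That is a heuristic, not something the PDF guarantees, and some fonts will be misclassified. `StyledHtml` and `Glyph` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

/** Hypothetical helper: turns (character, font name) pairs into HTML with <b>/<i> runs. */
public class StyledHtml {

    /** One character and the name of the font it was drawn with (may carry a subset prefix). */
    public static final class Glyph {
        final String text;
        final String fontName;

        public Glyph(String text, String fontName) {
            this.text = text;
            this.fontName = fontName;
        }
    }

    // Heuristic assumption: style is inferred from the font name only.
    static boolean isBold(String fontName) {
        String n = fontName == null ? "" : fontName.toLowerCase(Locale.ROOT);
        return n.contains("bold") || n.contains("black") || n.contains("heavy");
    }

    static boolean isItalic(String fontName) {
        String n = fontName == null ? "" : fontName.toLowerCase(Locale.ROOT);
        return n.contains("italic") || n.contains("oblique");
    }

    /** Escapes characters that are special in HTML text. "&" must be replaced first. */
    static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;")
                .replace(">", "&gt;").replace("\"", "&quot;");
    }

    /** Emits tags only where the style changes, keeping <b><i> ... </i></b> properly nested. */
    public static String toHtml(List<Glyph> glyphs) {
        StringBuilder out = new StringBuilder();
        boolean bold = false;
        boolean italic = false;
        for (Glyph g : glyphs) {
            boolean b = isBold(g.fontName);
            boolean i = isItalic(g.fontName);
            if (b != bold || i != italic) {
                // Close open tags in reverse order, then open the new style.
                if (italic) out.append("</i>");
                if (bold) out.append("</b>");
                if (b) out.append("<b>");
                if (i) out.append("<i>");
                bold = b;
                italic = i;
            }
            out.append(escape(g.text == null ? "" : g.text));
        }
        if (italic) out.append("</i>");
        if (bold) out.append("</b>");
        return out.toString();
    }

    public static void main(String[] args) {
        List<Glyph> glyphs = new ArrayList<>();
        for (char c : "Hello ".toCharArray()) glyphs.add(new Glyph(String.valueOf(c), "ABCDEF+Helvetica"));
        for (char c : "world".toCharArray()) glyphs.add(new Glyph(String.valueOf(c), "ABCDEF+Helvetica-Bold"));
        System.out.println(toHtml(glyphs));
    }
}
```

Running `main` prints `Hello <b>world</b>`. Inside a `PDFTextStripper` subclass, you would build the `Glyph` list from the `TextPosition`s passed to `writeString`. Opening and closing tags only at style changes keeps the output much smaller than wrapping every character.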
If you want an exact copy of the page, you may need to render the page as an image and put invisible text on top of it. You can see some ideas of what is possible with PDF to HTML conversion on our blog at http://www.jpedal.org/PDFblog/2012/08/4-ways-to-convert-pdf-to-html5/.
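Here is a minimal sketch of the image-plus-invisible-text idea. It assumes the page has already been rendered to an image (with PDFBox 2.x, for example, `PDFRenderer.renderImageWithDPI(pageIndex, dpi)` followed by `ImageIO.write`). It also assumes the text positions have already been converted from PDF points (1/72 inch) to the image's pixels. The code only builds the HTML. Text is absolutely positioned over the image with `color:transparent`, so it stays selectable and searchable while the image provides the exact look. `PageOverlayHtml`, `TextRun`, and `page1.png` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

/** Hypothetical helper: one page as an image with invisible, selectable text on top. */
public class PageOverlayHtml {

    /** A run of text with its top-left position and font size in image pixels (assumed already scaled). */
    public static final class TextRun {
        final String text;
        final double x;
        final double y;
        final double fontSizePx;

        public TextRun(String text, double x, double y, double fontSizePx) {
            this.text = text;
            this.x = x;
            this.y = y;
            this.fontSizePx = fontSizePx;
        }
    }

    /** Escapes characters that are special in HTML text and double-quoted attributes. */
    static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;")
                .replace(">", "&gt;").replace("\"", "&quot;");
    }

    public static String pageHtml(String imageUrl, int widthPx, int heightPx, List<TextRun> runs) {
        StringBuilder sb = new StringBuilder();
        // Positioned container so the spans are placed relative to the page image.
        sb.append(String.format(Locale.ROOT,
                "<div style=\"position:relative;width:%dpx;height:%dpx\">", widthPx, heightPx));
        // The rendered page image supplies the exact visual appearance.
        sb.append("<img src=\"").append(escape(imageUrl)).append(String.format(Locale.ROOT,
                "\" width=\"%d\" height=\"%d\" style=\"display:block\" alt=\"\">", widthPx, heightPx));
        for (TextRun r : runs) {
            // Transparent text: not visible, but still selectable and searchable.
            sb.append(String.format(Locale.ROOT,
                    "<span style=\"position:absolute;left:%.1fpx;top:%.1fpx;font-size:%.1fpx;"
                            + "color:transparent;white-space:pre\">",
                    r.x, r.y, r.fontSizePx));
            sb.append(escape(r.text)).append("</span>");
        }
        sb.append("</div>");
        return sb.toString();
    }

    public static void main(String[] args) {
        List<TextRun> runs = new ArrayList<>();
        runs.add(new TextRun("Invoice 2012-08", 72.0, 60.0, 14.0));
        System.out.println(pageHtml("page1.png", 612, 792, runs));
    }
}
```

How well the invisible text lines up with the image depends on how closely the browser's fallback font matches the PDF font's metrics, so selection boxes may drift somewhat from the visible glyphs.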