
Java PDF manipulation and rendering

I am hoping for this question to become a comprehensive guide to PDF manipulation and rendering in Java. I have a fairly comprehensive implementation, built by stitching together multiple open source libraries, and I would like to improve upon it.

Background

My requirements and current implementation:

  1. Checking existing PDF documents for specific conditions (PDF version, password protection, font embedding, cross reference tables etc.) - Not implemented.
  2. Allow for the definition of AcroForm fields via page coordinates or some other mechanism - Not implemented.
  3. Provide capability to iterate over form fields in a PDF, examine the field type and fill it with data - iText v 2.0.8
  4. Render the PDF to an image at different resolutions/DPI - two implementations (pdfrenderer and IcePDF)
  5. Render HTML/XHTML files to PDF - Flying Saucer xhtmlrenderer
  6. Do all the above as a library in a Java server environment (implying thread safety)

What do I not like

I am dissatisfied with the following:

  1. iText licensing: New versions of iText are under the AGPL license, which is a non-starter for my project (and commercial projects in general?). The fee for the commercial license is non-trivial (ranging from usage-based pricing of a few cents a document to tens of thousands for site licenses), and if I am going to pay license fees for the software, I would like to do a full market search for the best product. The 2.x versions of iText work OK, but they contain a fair number of bugs.
  2. PDF version conformance: There are enough strange conformance issues across these libraries (font embedding, cross reference tables etc.) to cause a reasonable amount of grief.
  3. Rendering output quality: The quality of rendering to PNG from these files suffers from a few problems in the areas of embedded fonts, images and layers.

What I am hoping for

I am hoping to get some feedback from users and people who have researched PDF libraries. Please include as much of the following information as possible for completeness and posterity.

  • is your answer/comment based on use or research
  • name, version of the library and license (if commercial license, please include cost if possible)
  • what do you use the library for
  • what do you like about it / what is it good with
  • what do you dislike about it / what is it not good with
  • what is your overall impression


Our BFO PDF Library at http://bfo.com can do most of that pretty easily - loading a PDF and determining its properties, creating, iterating over and populating form fields and rendering the PDF to a bitmap is all standard stuff. Converting from HTML or XHTML is a little trickier, but we have a companion product, the BFO Report Generator, which will do this with an XML syntax that's pretty similar to XHTML+CSS.

I'm not sure what you mean by "PDF Version Conformance" - if you're having specific issues you might want to expand on that, but otherwise I wouldn't get too hung up on the actual version number in PDF - with rare exceptions, PDF features are pretty much backwards and forwards compatible (newer features are generally just ignored by readers that don't understand them).
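
If the version number does need checking (the first requirement in the question), it can be read without any PDF library, because a PDF file starts with a header such as `%PDF-1.4`. The following is a minimal sketch in plain Java, not part of any library mentioned in this thread. The class name `PdfHeaderVersion` and the 1024-byte search window are assumptions made for illustration, and a document can also declare a newer version in its catalog, which this sketch does not look at.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: reads the version from the "%PDF-x.y" file header only.
final class PdfHeaderVersion {
    // Assumption: tolerate some leading junk by searching the first 1024 bytes.
    private static final int SEARCH_LIMIT = 1024;
    private static final Pattern HEADER = Pattern.compile("%PDF-(\\d\\.\\d)");

    /** Returns the header version such as "1.4", or null if no header is found. */
    static String parse(byte[] head) {
        int len = Math.min(head.length, SEARCH_LIMIT);
        // ISO-8859-1 maps every byte to one char, so binary data cannot break decoding.
        String text = new String(head, 0, len, StandardCharsets.ISO_8859_1);
        Matcher m = HEADER.matcher(text);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: PdfHeaderVersion <file.pdf>");
            return;
        }
        byte[] buf = new byte[SEARCH_LIMIT];
        int n;
        try (InputStream in = Files.newInputStream(Paths.get(args[0]))) {
            n = in.readNBytes(buf, 0, buf.length); // reads up to SEARCH_LIMIT bytes
        }
        String version = parse(Arrays.copyOf(buf, n));
        System.out.println(version == null ? "no PDF header found" : "header version " + version);
    }
}
```

The parsing is kept separate from the file access so it can be called on bytes that are already in memory, for example an upload buffered on the server.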

Rasterizing PDF to a bitmap is a can of worms - doing it properly means writing your own Font and Image format parsers (a big job: Type 1 Fonts require a PostScript parser), and beating the square peg that is the PDF rendering model into the round hole that is the AWT model. It's also dependent on the PDF creation software doing the job properly. So whichever software you go for, if a file isn't rendering properly then email it to the support team - we're always after troublesome PDFs for our collections.

Our website has more info and a trial version for download, and if you want info on licensing costs just drop us a line.

Cheers... Mike (CTO @ BFO)


iText only costs you money if you actually make any money from the product you use it in, which most people would consider fair. What are you comparing it against?

iText offers support through StackOverflow for non-paying users, and premium support for paying customers.


There is also Ghostscript, which can render a PDF at various DPIs:

gs -dSAFER -dBATCH -dNOPAUSE -sDEVICE=png16m -r300x300 -sOutputFile=page_%d.png doc.pdf
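
Since the question asks for a Java server environment, the same command can be launched from Java with `ProcessBuilder`. This is only a sketch under assumptions: it expects a `gs` executable on the PATH, and the class and method names (`GhostscriptRender`, `buildCommand`, `render`) are made up for illustration. The flags are exactly the ones from the command above, with the DPI made a parameter.

```java
import java.io.IOException;
import java.util.List;

// Hypothetical wrapper around the gs command line shown above.
final class GhostscriptRender {

    /** Builds the argument list; outputPattern may contain %d for the page number. */
    static List<String> buildCommand(String pdf, String outputPattern, int dpi) {
        if (dpi <= 0) {
            throw new IllegalArgumentException("dpi must be positive");
        }
        return List.of(
                "gs", "-dSAFER", "-dBATCH", "-dNOPAUSE",
                "-sDEVICE=png16m",
                "-r" + dpi + "x" + dpi,
                "-sOutputFile=" + outputPattern,
                pdf);
    }

    /** Runs Ghostscript and returns its exit code (0 normally means success). */
    static int render(String pdf, String outputPattern, int dpi)
            throws IOException, InterruptedException {
        Process process = new ProcessBuilder(buildCommand(pdf, outputPattern, dpi))
                .inheritIO() // forward gs output to this process's console
                .start();
        return process.waitFor();
    }

    public static void main(String[] args) throws Exception {
        int exit = render("doc.pdf", "page_%d.png", 300);
        System.out.println("gs exited with " + exit);
    }
}
```

Each call starts its own Ghostscript process, so concurrent requests do not share renderer state inside the JVM; the cost is the process start-up time per document.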