Writing my own HTML to PDF conversion library in PHP
I am planning to write my own library to convert (X)HTML to PDF. I don't really know why I'm planning to take on such an incredibly tedious and complex task, I guess I need a good challenge. I assume I'll learn a lot too.
- What do I have to consider,
- where can I find information that gets me started,
- what are the possible pitfalls,
- ...
Well, yes. That's a difficult task. But here's some general advice anyway. It would be easiest to use FPDF as the backend for generating the PDF. If you want, though, you can read up on the PDF specification at http://www.quick-pdf.com/pdf-specification.htm. You should avoid the newer versions and target an older format (like PDF 1.2) that's easier to generate. The file format is sprawling but not complicated.
The primary problem you will face with XHTML to PDF conversion is that PDF is PostScript's little stepsister and demands positioned output. As far as I remember, PDF has no support for flowing text. You have to break up HTML paragraphs and position words or sentences individually on each page. This requires the metrics of the fonts in use, so that you can calculate text widths and decide where each line ends.
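To make the positioning step concrete, here is a minimal sketch of greedy line breaking in PHP. It is an illustration under stated assumptions, not a finished layout engine: the function names are made up for this example, every glyph is assumed to have the same width (a real converter would read per-character widths from the font's metrics), and /F1 stands in for whatever font resource name the page declares.

```php
<?php
// Assumption: every glyph is 500/1000 of the font size wide.
// Real fonts need a per-character width table taken from their metrics.
const ASSUMED_GLYPH_WIDTH = 500;

/** Estimate the width of a string in points (fixed-width assumption). */
function textWidth(string $text, float $fontSize): float
{
    return strlen($text) * ASSUMED_GLYPH_WIDTH / 1000 * $fontSize;
}

/**
 * Greedy line breaking: returns one entry per word with its coordinates.
 * Uses PDF's default coordinate system: origin bottom-left, y grows upward.
 */
function layoutParagraph(
    string $text,
    float $fontSize,
    float $left,
    float $top,
    float $maxWidth,
    float $lineHeight
): array {
    $placed = [];
    $x = $left;
    $y = $top;
    $space = textWidth(' ', $fontSize);

    foreach (preg_split('/\s+/', trim($text), -1, PREG_SPLIT_NO_EMPTY) as $word) {
        $width = textWidth($word, $fontSize);
        // Start a new line when the word would cross the right edge
        if ($x > $left && $x + $width > $left + $maxWidth) {
            $x = $left;
            $y -= $lineHeight;
        }
        $placed[] = ['word' => $word, 'x' => $x, 'y' => $y];
        $x += $width + $space;
    }

    return $placed;
}

/** Turn placed words into PDF text operators, one text object per word. */
function toContentStream(array $placed, float $fontSize): string
{
    $lines = [];
    foreach ($placed as $p) {
        // Backslashes and parentheses must be escaped inside PDF string literals
        $escaped = strtr($p['word'], ['\\' => '\\\\', '(' => '\\(', ')' => '\\)']);
        $lines[] = sprintf(
            'BT /F1 %.2F Tf %.2F %.2F Td (%s) Tj ET',
            $fontSize,
            $p['x'],
            $p['y'],
            $escaped
        );
    }
    return implode("\n", $lines);
}

$words  = layoutParagraph('PDF needs every word placed explicitly', 10, 50, 700, 100, 12);
$stream = toContentStream($words, 10);
```

A page break would follow the same pattern: when y drops below the bottom margin, start a new page and reset y to the top. If you build on FPDF instead, its GetStringWidth() method can take the place of the textWidth() estimate.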
Just a brainstormed list of things to consider:
- PDF's markup (if you can call it that) is very cryptic, so you're going to do a lot of funky string conversion
- PDF is based on the traditional concept of paper pages, so you'll have to think about page breaks, repeating headers and footers, page numbers, etc. Since HTML pages have no fixed length, you'll have to find ways to detect the best places for inserting breaks.
- PDF is strictly nested, as is XHTML (every tag has a closing tag), but HTML isn't: for example, <br> and <img> can appear without the closing slash. This causes problems, so you'll need to enforce strict XHTML or solve this some other way.
- There is a commercial product, PrinceXML, which I use to convert XHTML into PDF. They have a forum where you can learn a lot about the problems and technologies involved.
- as Pekka commented: you're going to need this: http://www.adobe.com/devnet/pdf/pdf_reference.html
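On the nesting point above, one possible way to avoid demanding strict XHTML from your users is to let PHP's DOM extension parse the loose HTML and then serialise the tree back as XML. This is only a sketch of that idea, assuming the dom extension is available; the input string is a made-up example.

```php
<?php
// Hypothetical input: HTML with unclosed <br> and <img> elements
$html = '<p>First line<br>second line <img src="logo.png" alt="Logo"></p>';

$doc = new DOMDocument();

// Collect parser warnings internally instead of emitting PHP warnings
libxml_use_internal_errors(true);
// The two flags keep libxml from adding <html>/<body> wrappers and a doctype
$doc->loadHTML($html, LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);
libxml_clear_errors();

// Serialising as XML closes every element, including the void ones
$xhtml = $doc->saveXML($doc->documentElement);
```

After this step the converter only has to walk a well-formed tree. Note that loadHTML() does not assume UTF-8 by default, so input containing non-ASCII characters needs an explicit encoding declaration.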