Split A4 PDF page into two A5 and back again

I have a PDF with A4 pages. Each page contains two identical A5 pages for printing reasons. In my Java program I want to split these pages and use each unique A5 page zero or more times as a template to add or replace some text. After that, I want to glue the A5 pages back together into A4 pages (for the same printing reasons).

An example: Use page one three times and page two one time.

  • Split the pages (and throw away the identical right-hand A5 pages).
  • Create three copies of the first page and one copy of the second page.
  • Add/replace the text.
  • Glue the pages together so that I get two A4 pages: the first with the first two copies of page one, and the second with the third copy of page one and the only copy of page two.
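The bookkeeping behind these steps (how many copies of each template to make, and which copies share an A4 sheet) does not depend on the PDF library. A minimal plain-Java sketch, assuming the usage counts arrive as an array indexed by template page; `SheetPlan`, `expand` and `pairIntoSheets` are hypothetical names, not part of iText or any other library:

```java
import java.util.ArrayList;
import java.util.List;

public class SheetPlan {
    // counts[i] = how many times template page i+1 (a unique left-hand A5 half) is used.
    // Returns the 1-based template page number for every A5 slot, in output order.
    static List<Integer> expand(int[] counts) {
        List<Integer> slots = new ArrayList<>();
        for (int i = 0; i < counts.length; i++) {
            for (int k = 0; k < counts[i]; k++) {
                slots.add(i + 1);
            }
        }
        return slots;
    }

    // Groups consecutive A5 slots two at a time; each group becomes one A4 sheet.
    // With an odd number of slots, the last sheet holds a single A5 page.
    static List<List<Integer>> pairIntoSheets(List<Integer> slots) {
        List<List<Integer>> sheets = new ArrayList<>();
        for (int i = 0; i < slots.size(); i += 2) {
            sheets.add(new ArrayList<>(slots.subList(i, Math.min(i + 2, slots.size()))));
        }
        return sheets;
    }

    public static void main(String[] args) {
        // The example above: page one three times, page two once.
        List<Integer> slots = expand(new int[] {3, 1});
        System.out.println(slots);                 // [1, 1, 1, 2]
        System.out.println(pairIntoSheets(slots)); // [[1, 1], [1, 2]]
    }
}
```

A PDF library would then fill each slot with a copy of the corresponding template page and add the text before the pairs are glued onto A4 sheets.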

This should be possible, shouldn't it? I'm thinking of using iText, but if anyone has another recommendation I'm happy to change my mind.


A possibly less clunky solution, for the record, using pdfcrop and the pdfjam tools (pdfjoin, pdfnup). If test.pdf is an A4 landscape document to be split into A5 portrait pages:

1) extract left half-pages:

pdfcrop --bbox "0 0 421 595" --clip --papersize "a5" test.pdf test-left.pdf

Note: --bbox "<left> <bottom> <right> <top>" takes its values in bp (big points, 1/72 inch).
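Where the numbers in these commands come from: 1 bp is 1/72 inch, so an A4 landscape page (297 mm x 210 mm) is about 841.89 x 595.28 bp, and the cut between its two halves sits at about 420.94 bp. A small Java sketch of that arithmetic, producing the --bbox strings for the left and right halves (`HalfBoxes`, `mmToBp` and `halfBBox` are illustrative names):

```java
public class HalfBoxes {
    // 1 bp (PostScript "big point") = 1/72 inch = 25.4/72 mm.
    static double mmToBp(double mm) {
        return mm * 72.0 / 25.4;
    }

    // Builds a "<left> <bottom> <right> <top>" string, rounded to whole bp,
    // for the left or right half of a page of the given size.
    static String halfBBox(double widthBp, double heightBp, boolean left) {
        long w = Math.round(widthBp);
        long h = Math.round(heightBp);
        long mid = Math.round(widthBp / 2);
        return left ? "0 0 " + mid + " " + h
                    : mid + " 0 " + w + " " + h;
    }

    public static void main(String[] args) {
        // A4 landscape: 297 mm wide, 210 mm high.
        double w = mmToBp(297); // about 841.89
        double h = mmToBp(210); // about 595.28
        System.out.println(halfBBox(w, h, true));  // 0 0 421 595
        System.out.println(halfBBox(w, h, false)); // 421 0 842 595
    }
}
```

Rounding to whole points moves each edge by less than 0.3 bp (about 0.1 mm).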

2) extract right half-pages:

pdfcrop --bbox "421 0 842 595" --clip --papersize "a5" test.pdf test-right.pdf

3) collate pages as desired, e.g.

pdfjoin test-left.pdf test-right.pdf "1" --outfile test-collated.pdf

4) reglue:

pdfnup --nup 2x1 test-collated.pdf --a4paper --outfile test-done.pdf
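If you glue the pages back together in your own Java code instead of with pdfnup, the 2x1 layout reduces to simple arithmetic: slot i lands on sheet i / 2, in the left or right half depending on i % 2. A sketch, assuming an A4 landscape sheet 842 bp wide and ignoring any scaling, centering or margins pdfnup itself may apply (`NupLayout`, `Placement`, `place` and `sheetCount` are illustrative names):

```java
public class NupLayout {
    // Where one A5 slot goes: the output sheet index and the lower-left corner (in bp)
    // of its half of that A4 landscape sheet.
    record Placement(int sheet, double x, double y) {}

    // 2x1 grid: two columns, one row. Slot i goes to sheet i / 2, column i % 2.
    static Placement place(int slot, double sheetWidthBp) {
        int column = slot % 2;
        return new Placement(slot / 2, column * (sheetWidthBp / 2), 0.0);
    }

    // Number of A4 sheets needed for the given number of A5 slots (rounded up).
    static int sheetCount(int slots) {
        return (slots + 1) / 2;
    }

    public static void main(String[] args) {
        double a4LandscapeWidth = 842; // rounded A4 landscape width in bp
        int slots = 4;                 // e.g. page one three times plus page two once
        System.out.println(sheetCount(slots) + " sheets");
        for (int slot = 0; slot < slots; slot++) {
            System.out.println("slot " + slot + " -> " + place(slot, a4LandscapeWidth));
        }
    }
}
```

These offsets are where each finished A5 page would be drawn on the new A4 landscape page.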


I once did something like that with camlpdf. In my case, I had a PDF where each physical A4 page consisted of two logical A5 pages, and I wanted a normal PDF with A5 pages (i.e. one where logical and physical pages are the same).

This was in OCaml (camlpdf also exists for F#) and my code was the following:

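(* 1. Read the PDF *)
let pdf = Pdfread.pdf_of_file None in_file ;;

(* 1. ...and decrypt it (empty password) *)
let pdf =
  let (pdf,_perms) = Pdfcrypt.decrypt_pdf "" pdf in
  match pdf with
  | Some pdf -> pdf
  | None -> failwith "Could not decrypt"
;;

(* 2. Drop the bookmarks *)
let pdf = Pdfmarks.remove_bookmarks pdf ;;

(* 3. Get the list of pages from the page tree *)
let pages = Pdfdoc.pages_of_pagetree pdf ;;

(* 4. Replace each page by two copies: one whose mediabox runs from x1 to the
   midpoint xm, one from xm to x2. fold_right keeps the original page order. *)
let pages = List.fold_right (fun page acc ->
  let (y1,x1,y2,x2) = Pdf.parse_rectangle page.Pdfdoc.mediabox in
  let box y1 x1 y2 x2 = Pdf.Array
    [ Pdf.Real y1; Pdf.Real x1; Pdf.Real y2; Pdf.Real x2 ]
  in
  let xm = x1 *. 0.5 +. x2 *. 0.5 in
  let pagel = {page with Pdfdoc.mediabox = box y1 x1 y2 xm}
  and pager = {page with Pdfdoc.mediabox = box y1 xm y2 x2}
  in pagel::pager::acc
) pages [] ;;

(* 5. Rebuild the document from the new page list *)
let pdf = Pdfdoc.change_pages false pdf pages ;;

(* 6. Drop objects that nothing refers to any more *)
Pdf.remove_unreferenced pdf ;;

(* 7. Write the result *)
Pdfwrite.pdf_to_file pdf out_file ;;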

If iText offers similar abstractions, perhaps you can do something like this. The procedure is the following:

  1. Read and (optionally) decrypt the PDF
  2. Remove bookmarks (optional)
  3. Obtain the pages from the page tree
  4. Manipulate the pages: you can rearrange, duplicate and remove pages, and you can change their mediabox (bounding box), which should be enough for your purpose
  5. Reconstruct the document with the new pages
  6. Remove unreferenced objects (a kind of garbage collection)
  7. Write out the resulting PDF
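Step 4 is where the actual work happens. Here is a plain-Java sketch of the same transformation as the camlpdf fold, written against a hypothetical `Page` record rather than any real iText or camlpdf type: every page is replaced by two copies whose mediaboxes (as [llx, lly, urx, ury] in bp) cover its left and right halves. It cuts each box at its horizontal midpoint, matching the side-by-side A5 halves described in the question.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class SplitPages {
    // Hypothetical stand-in for a PDF page: which source page it shows,
    // plus its mediabox as [llx, lly, urx, ury] in bp.
    record Page(int source, double[] mediaBox) {}

    // Replaces every page with two copies covering its left and right halves,
    // keeping document order: left1, right1, left2, right2, ...
    static List<Page> splitLeftRight(List<Page> pages) {
        List<Page> out = new ArrayList<>();
        for (Page p : pages) {
            double[] b = p.mediaBox();
            double xm = (b[0] + b[2]) / 2; // horizontal midpoint
            out.add(new Page(p.source(), new double[] {b[0], b[1], xm, b[3]}));
            out.add(new Page(p.source(), new double[] {xm, b[1], b[2], b[3]}));
        }
        return out;
    }

    public static void main(String[] args) {
        // Two A4 landscape pages, using the rounded 842 x 595 bp size.
        List<Page> a4 = List.of(new Page(1, new double[] {0, 0, 842, 595}),
                                new Page(2, new double[] {0, 0, 842, 595}));
        for (Page p : splitLeftRight(a4)) {
            System.out.println(p.source() + " " + Arrays.toString(p.mediaBox()));
        }
    }
}
```

To throw away the identical right halves, as the question wants, keep only the even-indexed entries of the result; duplicating a template is then just adding the same page to the list more than once.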


Try the iText library: http://itextpdf.com/. You can use an existing PDF file as a template, and edit, rotate and split existing documents. You can find useful samples here: http://www.1t3xt.info/examples/browse/
