
PHP - PDF file style validator

I need to go through a PDF file's source (preferably using PHP) to validate that it has certain margins, that the text is laid out in two columns of the same width, and that it follows various other style rules. The file will be uploaded to a website, and at upload a validation message must be shown to the user saying whether the file is valid or not.

At the link below are some of the rules to which the file must adhere: http://ifac.papercept.net/conferences/support/page.php

Could you please advise on how this could be done? Would it be possible to do such an application? I already have the website, I only need to implement the pdf validator.


I'm tempted to just laugh at your poor miserable existence at being handed such a task, but instead let me explain why what you want is all but impossible.

PDF doesn't define margins, columns, or paragraphs. It's more along the lines of "draw these characters at these coordinates". Transformation matrices, and color spaces, and clipping regions, oh my!

There are some PDF libraries that will let you determine the location (bounding boxes really) of all the text drawing commands in a particular page. From that information, you'd have to determine if they're following all your layout requirements.
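As a rough illustration of what "determine it from the bounding boxes" means for the simplest rule (page margins), here is a minimal Python sketch. It assumes some library has already handed you one rectangle per text chunk in PDF points (72 per inch), with the origin at the bottom-left of the page. The `Box` type, the function names, and the `tolerance` parameter are all hypothetical, not part of any real PDF library.

```python
from typing import Iterable, NamedTuple


class Box(NamedTuple):
    """Axis-aligned rectangle in PDF points (hypothetical helper type)."""
    x0: float  # left
    y0: float  # bottom
    x1: float  # right
    y1: float  # top


def union(boxes: Iterable[Box]) -> Box:
    """Smallest rectangle that encloses every text box on the page."""
    boxes = list(boxes)
    if not boxes:
        raise ValueError("page has no text boxes")
    return Box(
        min(b.x0 for b in boxes),
        min(b.y0 for b in boxes),
        max(b.x1 for b in boxes),
        max(b.y1 for b in boxes),
    )


def within_margins(boxes: Iterable[Box], page_width: float,
                   page_height: float, margin: float,
                   tolerance: float = 0.5) -> bool:
    """True if all text stays at least `margin` points from every page edge.

    `tolerance` absorbs rounding noise in the extracted coordinates.
    """
    u = union(boxes)
    return (
        u.x0 >= margin - tolerance
        and u.y0 >= margin - tolerance
        and u.x1 <= page_width - margin + tolerance
        and u.y1 <= page_height - margin + tolerance
    )


# US Letter is 612 x 792 points; a 1 inch margin is 72 points.
page_boxes = [Box(72, 72, 300, 700), Box(310, 100, 540, 720)]
ok = within_margins(page_boxes, 612, 792, margin=72)
bad = within_margins(page_boxes + [Box(50, 400, 200, 412)], 612, 792, margin=72)
```

A single uniform margin is used here only to keep the sketch short; real guidelines usually give different values per edge, and headers, footers, and page numbers would have to be excluded before taking the union.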

Margins wouldn't be so hard (build a bounding box around all the text then see if that box is within your margins), but columns are going to be considerably more difficult. Even impossible if someone's PDF generation program draws to both columns in one "draw some text" command:

(some text from column one           some text from column two) Tj

Presented with something like that (perfectly legitimate, but none too friendly to bbox analysis), you'd have to further break text boxes up based on the whitespace they contain.
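To make the column problem concrete, here is a hedged Python sketch of one possible heuristic: project every text box onto the x axis, merge the intervals that overlap or nearly touch, and treat each merged run as a column. The function names and the `min_gap` and `width_tolerance` thresholds are assumptions chosen for illustration. The last example shows how a single box spanning both columns, like the `Tj` case above, defeats it.

```python
from typing import List, Sequence, Tuple

# Each box is (x0, y0, x1, y1) in PDF points, as a library might report it.
BoxT = Tuple[float, float, float, float]


def merge_x_intervals(boxes: Sequence[BoxT],
                      min_gap: float = 5.0) -> List[Tuple[float, float]]:
    """Merge horizontal extents separated by less than `min_gap` points."""
    intervals = sorted((x0, x1) for x0, _, x1, _ in boxes)
    merged: List[List[float]] = []
    for x0, x1 in intervals:
        if merged and x0 - merged[-1][1] < min_gap:
            # Overlaps or nearly touches the previous run: extend it.
            merged[-1][1] = max(merged[-1][1], x1)
        else:
            merged.append([x0, x1])
    return [(a, b) for a, b in merged]


def is_two_equal_columns(boxes: Sequence[BoxT], min_gap: float = 5.0,
                         width_tolerance: float = 2.0) -> bool:
    """True if the text forms exactly two columns of roughly equal width."""
    columns = merge_x_intervals(boxes, min_gap)
    if len(columns) != 2:
        return False
    (a0, a1), (b0, b1) = columns
    return abs((a1 - a0) - (b1 - b0)) <= width_tolerance


two_col = [
    (72, 700, 300, 712), (312, 700, 540, 712),
    (72, 686, 298, 698), (312, 686, 540, 698),
]
columns = merge_x_intervals(two_col)
looks_fine = is_two_equal_columns(two_col)

# One draw command covering both columns yields one wide box,
# which bridges the gutter and hides the column structure.
spanning = two_col + [(72, 672, 540, 684)]
fooled = is_two_equal_columns(spanning)
```

A full-width title or figure caption would trip the same check, so in practice you would need to run it on the body region only and split wide boxes on their internal whitespace first, which is exactly the extra work described above.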

Overall, a huge and painful process, and one you cannot promise will be 100% accurate, fraught with both false positives and false negatives.

Not.
Fun.

Libraries that give you that level of text info will generally also tell you what font, size, and color the given chunk of text uses.

Does such a library exist for PHP? I don't know. iText (Java or C#, AGPL or $) can determine text bounds, as can Adobe's ($$) libraries. I'm sure there are others.

I strongly recommend you look for some other way to enforce this guideline (like "people looking at the PDFs", or "everyone must submit as [some other format]").
