Pattern matching text in the body of a PDF and adding hyperlinks with PHP

2023-01-13 20:54 问答作者：

The situation is as follows: I have a series of big, fat PDF files, full of imagery and randomly distributed text - these are the sections of a huge promotional pricelist for a vast array of products. What I need is to pattern-match all the catalogue codes in the text of each PDF file and to wrap it with a hyperlink that will point to the respective page in an online store.

So the task is very simple - scan a PDF file for all plain-text 10 digits sequences, and convert those into links whose href is http://something?code=[match].

I would also prefer to put this together in a PHP script if possible, but any language would do. I have a gut feeling that maybe even flash could be an option.

Any ideas? Thanks in advance.

EDIT:

Some answers coming in are teaching me pcre syntax. The problem here is that I need to search and replace in a PDF file. So the problem is twofold. Say we'll do this in PHP:

How do you read / write to a PDF in PHP?
As PDFs aren't plaintext files, I can't just regex against them, and I also believe that PDF links are not bundled together with the text but come separate as regions. Which also means that I could maybe overlay an active rectangle over the coordinates of the catalogue code's characters, if I only knew where a matched code re开发者_StackOverflow社区sides on a page.

What do you think? Other languages are also an option.

Thanks.

Replacing text in a PDF is difficult and none of the open source PDF solutions support this capability.

Apago (www.apago.com) has a developed commercial solution for replacing text in PDF files. It's used by greeting card manufacturer to modify pricing, "MADE IN" text, product numbers, etc.

<?
$s="
http://something.com?code=3000 asdf text
http://something.com?code=5000 asdf
";
echo preg_replace('/(http:\/\/something\.com\?code=(\d+))/s', '<a href="$1">$2</a>',$s);
?>

output 3000 asdf text

5000 asdf

继续阅读：alivepdf pdf pdflib php regex

Pattern matching text in the body of a PDF and adding hyperlinks with PHP

更多精彩内容

精彩评论

最新问答

央视是哪个频道？

请问买过的朋友，舒提啦旅行箱实际使用体验如何？？

检查不孕不育需要的费用？

海信ULED电视画质有什么不同的地方?？

钉子可以挂的住画框幕布吗？

问答排行榜

河神2九牛入海钓河妖是第几集河妖什么来历可活吞牛？

性激素六项检查的最佳时间是多久？多少钱？？

Easiest way to get words of one line from istream into a vector?

《梦在燃烧 (《三国演义》动画片主题曲)》MP3歌词-汤子星？

抽烟只抽炫赫门？