
regex to detect plain email AND mailto: link emails

My users use a CMS to enter job offers. In these job offers, the email address sometimes appears in plain format (please contact job@job.com) and sometimes as an HTML mailto: link (<a href="mailto:job@job.com">jobline</a>, or the even more annoying <a href="mailto:job@job.com">job@job.com</a>).

I would like to build a PHP function that finds either format and makes it spam-proof by building an HTML string that tells humans what to do, with JavaScript reconstructing a proper clickable mailto: link for JavaScript-enabled setups. It's the detection part that I have a problem with.

The following works perfectly for plain emails. How can I adapt it to detect mailto: links too?

$addr_pattern = '/([A-Z0-9._%+-]+)@([A-Z0-9.-]+)\.([A-Z]{2,4})(\((.+?)\))?/i';
preg_match_all($addr_pattern, $content, $addresses);
$the_addrs = $addresses[0];
for ($a = 0; $a < count($the_addrs); $a++) {
    $repaddr[$a] = preg_replace($addr_pattern, '<span title="$5" class="pep-email">$1(' . $opt_val . ')$2.$3</span>', $the_addrs[$a]);
}
$cc = str_replace($the_addrs, $repaddr, $content);

PS: this is to improve an existing WordPress plugin: Pixeline's Email protector. The winning answer's author will be duly credited in the plugin code, description and changelog.


It would be better to use the DOMDocument class to get the actual links, as there are so many different ways to write them. You can also combine it with a regex that scans the entire content, so the plain-text addresses are replaced at the same time.

// The content
$content = 'The stuff from the page';

// Start the dom object
$dom = new DOMDocument();
$dom->recover = true;
$dom->substituteEntities = true;

// Feed the content to the dom object
$dom->loadHTML($content);

// Check each link
foreach ($dom->getElementsByTagName('a') as $anchor) {
    // Get the href
    $href = $anchor->getAttribute('href');
    // Check if it's a mailto link
    if (substr($href, 0, 7) == 'mailto:') {
        // Do something with it
        $href = 'new link href';
    }
    // Put it back in the link
    $anchor->setAttribute('href', $href);
}

// Replace the content with the new content
$content = $dom->saveHTML();
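
As a rough sketch of how the loop above could feed the plugin's replacement step, the function below swaps each mailto: anchor for a span instead of rewriting its href. The function name `protect_mailto_links`, the `$marker` parameter (standing in for the question's `$opt_val`) and the wrapping `<div>` are assumptions made for illustration. They are not part of the plugin.

```php
<?php
// Hypothetical helper: replace every <a href="mailto:..."> with <span class="pep-email">.
function protect_mailto_links(string $html, string $marker = 'at'): string
{
    $dom = new DOMDocument();
    // Collect parser warnings about imperfect CMS markup instead of printing them.
    $previous = libxml_use_internal_errors(true);
    // Wrap the fragment in a <div> so it has one known root to serialize later.
    $dom->loadHTML('<div>' . $html . '</div>');
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    // Copy the live node list first: replacing nodes while iterating it skips entries.
    foreach (iterator_to_array($dom->getElementsByTagName('a')) as $anchor) {
        $href = $anchor->getAttribute('href');
        if (stripos($href, 'mailto:') !== 0) {
            continue; // not a mailto link, leave it untouched
        }
        // Drop "mailto:" and any "?subject=..." part.
        $address = explode('?', (string) substr($href, 7), 2)[0];
        if (strpos($address, '@') === false) {
            continue; // nothing usable in the href
        }
        [$local, $domain] = explode('@', $address, 2);

        $span = $dom->createElement('span');
        $span->setAttribute('class', 'pep-email');
        // Keep the link text as a tooltip when it is not the address itself.
        $label = trim($anchor->textContent);
        if ($label !== '' && $label !== $address) {
            $span->setAttribute('title', $label);
        }
        $span->appendChild($dom->createTextNode($local . '(' . $marker . ')' . $domain));
        $anchor->parentNode->replaceChild($span, $anchor);
    }

    // Serialize only the children of the wrapper <div>, not the implied <html>/<body>.
    $out = '';
    foreach ($dom->getElementsByTagName('div')->item(0)->childNodes as $child) {
        $out .= $dom->saveHTML($child);
    }
    return $out;
}
```

Plain-text addresses are left alone by this pass, so the question's existing `$addr_pattern` replacement could still run on the returned string afterwards.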


(<a href="mailto:|)([a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,4})(">.+?</a>|)

This should match all three variations; replace each match with $2 to get the bare address.
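
A minimal sketch of that replacement with `preg_replace`, assuming `~` as the delimiter so the `/` in `</a>` needs no escaping. The function name `unwrap_emails` is hypothetical. Note that the pattern only recognizes anchors written exactly as `<a href="mailto:...">`; links with extra attributes or single quotes are not matched as a whole and are probably better handled with a DOM parser.

```php
<?php
// Hypothetical helper: reduce plain addresses and simple mailto: links to the bare address.
function unwrap_emails(string $content): string
{
    // Pattern from the answer above, wrapped in "~" delimiters.
    $pattern = '~(<a href="mailto:|)([a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,4})(">.+?</a>|)~';
    // Group 2 is the address itself in all three variations.
    return preg_replace($pattern, '$2', $content);
}

$samples = [
    'please contact job@job.com',
    '<a href="mailto:job@job.com">jobline</a>',
    '<a href="mailto:job@job.com">job@job.com</a>',
];
foreach ($samples as $sample) {
    echo unwrap_emails($sample), "\n";
}
```

Once every address is plain, the `$addr_pattern` replacement from the question can run on the result unchanged.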
