regex to detect plain email AND mailto: link emails
My users use a CMS to enter job offers. In these job offers, the email address is sometimes in plain format (please contact job@job.com), sometimes an HTML mailto: link (<a href="mailto:job@job.com">jobline</a>), and sometimes the even more annoying variant (<a href="mailto:job@job.com">job@job.com</a>).
I would like to build a PHP function that finds either format and makes the addresses spam-proof by building an HTML string that tells humans what to do, then reconstructs a proper clickable mailto: link via JavaScript for JavaScript-enabled setups. It's the detection part that I have a problem with.
The following works perfectly for plain email addresses. How can I adapt it to detect mailto: links too?
// Matches user@domain.tld, optionally followed by "(some text)"
$addr_pattern = '/([A-Z0-9._%+-]+)@([A-Z0-9.-]+)\.([A-Z]{2,4})(\((.+?)\))?/i';
preg_match_all($addr_pattern, $content, $addresses);
$the_addrs = $addresses[0];
$repaddr = array();
for ($a = 0; $a < count($the_addrs); $a++) {
    // $opt_val (set elsewhere) is inserted in place of the @
    $repaddr[$a] = preg_replace($addr_pattern, '<span title="$5" class="pep-email">$1(' . $opt_val . ')$2.$3</span>', $the_addrs[$a]);
}
$cc = str_replace($the_addrs, $repaddr, $content);
PS: this is to improve an existing WordPress plugin: Pixeline's Email protector. The winning answer's author will be duly credited in the plugin code, description and changelog.
It would be better to use the DOMDocument class to get the actual links, since there are so many possible ways to write them. You can also combine it with a regex that scans the entire content, so the plain-text addresses are replaced at the same time.
// The content
$content = 'The stuff from the page';
// Start the dom object
$dom = new DOMDocument();
$dom->recover = true;
$dom->substituteEntities = true;
// Feed the content to the dom object
$dom->loadHTML($content);
// Check each link
foreach ($dom->getElementsByTagName('a') as $anchor) {
    // Get the href
    $href = $anchor->getAttribute('href');
    // Check if it's a mailto link
    if (substr($href, 0, 7) == 'mailto:') {
        // Do something with it
        $href = 'new link href';
    }
    // Put it back in the link
    $anchor->setAttribute('href', $href);
}
// Replace the content with the new content
$content = $dom->saveHTML();
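To make the "do something with it" step concrete, here is a minimal sketch of one way to do it: pull the address out of the href and swap the whole anchor for an obfuscated span. The function name protect_mailto_links, the " (at) " marker and the pep-email class on the span are illustrative assumptions, not part of the plugin. The content is wrapped in a div so the parser has a single root and no doctype or body wrapper ends up in the output. The link label is dropped in this sketch.

```php
<?php
// Hypothetical helper: name, marker and span class are assumptions for illustration.
function protect_mailto_links(string $content, string $marker = ' (at) '): string
{
    $dom = new DOMDocument();
    // Collect parser warnings from sloppy CMS markup instead of printing them.
    libxml_use_internal_errors(true);
    // Wrap in a div so there is exactly one root element.
    $dom->loadHTML('<div>' . $content . '</div>', LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);
    libxml_clear_errors();

    // Copy the live node list first, because nodes are replaced while looping.
    foreach (iterator_to_array($dom->getElementsByTagName('a')) as $anchor) {
        $href = $anchor->getAttribute('href');
        if (stripos($href, 'mailto:') !== 0) {
            continue; // not a mailto link, leave it alone
        }
        // Drop the scheme and any ?subject=... query string.
        $address = explode('?', substr($href, 7), 2)[0];
        $span = $dom->createElement('span');
        $span->setAttribute('class', 'pep-email');
        $span->appendChild($dom->createTextNode(str_replace('@', $marker, $address)));
        $anchor->parentNode->replaceChild($span, $anchor);
    }

    // Serialize only the children of the wrapper div.
    $out = '';
    foreach ($dom->documentElement->childNodes as $child) {
        $out .= $dom->saveHTML($child);
    }
    return $out;
}
```

Calling protect_mailto_links('Mail <a href="mailto:job@job.com">jobline</a> now') should leave the surrounding text alone and replace only the anchor. Plain-text addresses are not touched by this sketch and would still need the regex pass from the question.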
(<a href="mailto:|)([a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,4})(">.+?</a>|)
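This should match all three variations (plain address, mailto: link with a label, mailto: link with the address as its text). Replace each match with $2 to keep only the address.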
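As a rough sketch of how that pattern could be applied in PHP, the snippet below runs it through preg_replace_callback and rewrites capture group 2 (the bare address) into an obfuscated span. The function name protect_emails and the " (at) " marker are assumptions for illustration; the pep-email class is borrowed from the question. The pattern is the one above, except that `~` is used as the delimiter and the space after `<a` is relaxed to `\s+`.

```php
<?php
// Hypothetical helper: the name and the " (at) " marker are illustrative assumptions.
function protect_emails(string $content, string $marker = ' (at) '): string
{
    // Optional mailto: anchor opening, the address, optional anchor closing.
    $pattern = '~(<a\s+href="mailto:|)([a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,4})(">.+?</a>|)~i';

    return preg_replace_callback($pattern, function (array $m) use ($marker): string {
        // $m[2] holds the bare address in all three variants.
        return '<span class="pep-email">' . str_replace('@', $marker, $m[2]) . '</span>';
    }, $content);
}
```

This only recognises anchors written exactly as `<a href="mailto:...">`, with double quotes and no other attributes before href; anything more exotic falls back to matching the address alone, which is where the DOM-based approach is more robust.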