How do I programmatically add rel="external" to external links in a string of HTML?
How can I check whether links in a string variable are external? The string is site content (comments, articles, etc.).
And if they are, how do I append an external value to their rel attribute? And if they don't have this attribute, how do I add rel="external"?
An HTML parser is appropriate for input filtering, but for modifying output you'll want the performance of a simple-minded regex solution. In this case a callback regex would do:
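// Single-quoted pattern, so the double quotes inside it need no escaping.
$html = preg_replace_callback('#<a\s[^>]*href="(http://[^"]+)"[^>]*>#',
    "cb_ext_url", $html);
function cb_ext_url($match) {
    // $orig is the whole opening <a ...> tag, $url the captured href value.
    list ($orig, $url) = $match;
    // Links that start with our own host are internal: leave them alone.
    if (strpos($url, "http://localhost/") === 0) {
        return $orig;
    }
    // A rel attribute is already present: leave the tag as it is.
    elseif (strstr($orig, "rel=")) {
        return $orig;
    }
    // Otherwise add rel="external" before the closing bracket.
    else {
        return rtrim($orig, ">") . ' rel="external">';
    }
}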
You'll probably need more fine-grained checks. But that's the general approach.
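As one example of a finer-grained check, here is a rough sketch that also matches https links, compares the URL's host instead of a string prefix, and appends external to an existing rel attribute rather than skipping the tag. The function name add_rel_external() and the $ownHost parameter are made up for illustration, and it still assumes double-quoted attribute values:

```php
<?php
// Hypothetical helper: add_rel_external() and $ownHost are illustrative names.
function add_rel_external(string $html, string $ownHost): string
{
    return preg_replace_callback(
        '#<a\s[^>]*href="(https?://[^"]+)"[^>]*>#i',
        function (array $match) use ($ownHost): string {
            list($tag, $url) = $match;
            $host = parse_url($url, PHP_URL_HOST);
            // Same host (case-insensitive) means internal: leave the tag untouched.
            if (!is_string($host) || strcasecmp($host, $ownHost) === 0) {
                return $tag;
            }
            // Existing rel attribute: append "external" unless it is already listed.
            if (preg_match('#\srel="([^"]*)"#i', $tag, $rel)) {
                $values = preg_split('/\s+/', trim($rel[1]), -1, PREG_SPLIT_NO_EMPTY);
                if (in_array('external', array_map('strtolower', $values), true)) {
                    return $tag;
                }
                $values[] = 'external';
                return str_replace($rel[0], ' rel="' . implode(' ', $values) . '"', $tag);
            }
            // No rel attribute at all: add one before the closing bracket.
            return rtrim($tag, '>') . ' rel="external">';
        },
        $html
    );
}
```

Calling add_rel_external($html, 'localhost') covers roughly the same case as the snippet above, except that a link with rel="nofollow" comes back as rel="nofollow external".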
Use an XML parser, like SimpleXML. Regex isn't made to do XML/HTML parsing, and here's a perfect explanation of what happens when you do: RegEx match open tags except XHTML self-contained tags.
Parse the input as XML, use the parser to select the required elements, edit their properties using the parser, and spit them back out.
It'll save you a headache, as regex makes me cry...
Here's my way of doing this (didn't test it):
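<?php
// The input must be well-formed XML (e.g. XHTML with a single root element),
// otherwise the SimpleXMLElement constructor throws an exception.
$xmlString = "This is where the HTML of your site should go. Make sure it's valid!";
$xml = new SimpleXMLElement($xmlString);
// SimpleXML has no getElementsByTagName(); select the links with XPath instead.
foreach ($xml->xpath('//a') as $a)
{
    $attributes = $a->attributes();
    // isThisExternal() is a placeholder for your own check; it is not defined here.
    if (isThisExternal((string) $attributes['href']))
    {
        $a['rel'] = 'external';
    }
}
echo $xml->asXML();
?>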
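The isThisExternal() call above is left undefined. A minimal sketch of one way to write it, assuming that relative URLs count as internal and that subdomains of your own host do too. The $ownHost parameter and its default 'example.com' are placeholders, not part of the answer:

```php
<?php
// Hypothetical implementation of the isThisExternal() helper.
// 'example.com' is a placeholder default; pass your own host instead.
function isThisExternal($href, string $ownHost = 'example.com'): bool
{
    $host = parse_url((string) $href, PHP_URL_HOST);
    // Relative URLs, fragments and mailto: links have no host: treat them as internal.
    if (!is_string($host) || $host === '') {
        return false;
    }
    $host = strtolower($host);
    $ownHost = strtolower($ownHost);
    // Internal if the host equals ours or is a subdomain of it.
    $isOwn = $host === $ownHost
        || substr($host, -strlen('.' . $ownHost)) === '.' . $ownHost;
    return !$isOwn;
}
```

Because only the host part is compared, a URL such as http://www.other.org/example.com is still reported as external even though your domain appears in its path.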
It might be easier to do something like this on the client side, using jQuery:
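<script type="text/javascript">
$(document).ready(function()
{
    // Only consider anchors that actually have an href attribute.
    $.each($('a[href]'), function(idx, tag)
    {
        // tag.href is the resolved absolute URL, so relative links to your
        // own pages are not mistaken for external ones.
        // you might make this smarter and throw out URLs like
        // http://www.otherdomain.com/yourdomain.com
        if (tag.href.indexOf('yourdomain.com') < 0)
        {
            $(tag).attr('rel', 'external');
        }
    });
});
</script>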
As Craig White points out though, this doesn't do anything SEO-wise and won't help users who have JavaScript disabled.