How do I programmatically add rel="external" to external links in a string of HTML?

How can I check whether links in a string variable are external? The string holds site content (comments, articles, etc.).

If they are, how do I append an external value to their rel attribute? And if they don't have a rel attribute at all, how do I add rel="external"?


An HTML parser is appropriate for input filtering, but for modifying output you'll want the performance of a simple-minded regex solution. In this case a callback regex would do:

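// The pattern is single-quoted so the double quotes inside it need no escaping.
$html = preg_replace_callback('#<a\s[^>]*href="(http://[^"]+)"[^>]*>#',
     "cb_ext_url", $html);

function cb_ext_url($match) {
    list ($orig, $url) = $match;
    // Our own site: leave untouched. strpos() === 0 only matches at the start
    // of the URL, whereas strstr() would match anywhere inside it.
    if (strpos($url, "http://localhost/") === 0) {
        return $orig;
    }
    // A rel attribute is already present: this simple version leaves it as-is.
    elseif (strpos($orig, "rel=") !== false) {
        return $orig;
    }
    else {
        return rtrim($orig, ">") . ' rel="external">';
    }
}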

You'll probably need more fine-grained checks. But that's the general approach.
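
The finer-grained checks mentioned above might look something like the following sketch. It is not part of the original answer: the callback name cb_ext_rel and the host yourdomain.com are placeholders, and it assumes double-quoted attributes. It compares host names with parse_url() and appends to an existing rel value instead of skipping the tag.

```php
<?php

// Hypothetical variant of the callback: compares host names and appends
// "external" to an existing rel value. 'yourdomain.com' is a placeholder.
function cb_ext_rel(array $match): string
{
    list($tag, $url) = $match;
    $host = parse_url($url, PHP_URL_HOST);
    if (!is_string($host) || strcasecmp($host, 'yourdomain.com') === 0) {
        return $tag; // unparsable URL or our own host: leave the tag alone
    }
    if (preg_match('#\srel="([^"]*)"#i', $tag, $rel)) {
        if (preg_match('#(^|\s)external(\s|$)#i', $rel[1])) {
            return $tag; // already marked as external
        }
        $new = ' rel="' . trim($rel[1] . ' external') . '"';
        return str_replace($rel[0], $new, $tag);
    }
    // No rel attribute yet: add one before the closing bracket.
    return rtrim($tag, '>') . ' rel="external">';
}

$html = '<a href="http://example.org/" rel="nofollow">x</a>';
echo preg_replace_callback('#<a\s[^>]*href="(https?://[^"]+)"[^>]*>#i', 'cb_ext_rel', $html);
// prints: <a href="http://example.org/" rel="nofollow external">x</a>
```

This still inherits the usual regex limits: single-quoted or unquoted attributes and relative URLs are not matched at all.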


Use an XML parser, like SimpleXML. Regex isn't made to do XML/HTML parsing, and here's a perfect explanation of what happens when you do: RegEx match open tags except XHTML self-contained tags.

Parse the input as XML, use the parser to select the required elements, edit their properties using the parser, and spit them back out.

It'll save you a headache, as regex makes me cry...
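
Since real-world HTML is rarely well-formed XML, here is a rough sketch of the same parse, edit, and serialize idea using DOMDocument::loadHTML() rather than SimpleXML (a swapped-in parser, not what the answer above names). The function name addExternalRel and the wrapper div are assumptions made for illustration, and $ownHost stands for your own domain.

```php
<?php

// Hypothetical helper: adds "external" to the rel attribute of every link
// whose host differs from $ownHost.
function addExternalRel(string $html, string $ownHost): string
{
    $doc = new DOMDocument();
    $previous = libxml_use_internal_errors(true); // collect warnings on sloppy markup
    // The wrapper div lets us pull the fragment back out afterwards; the
    // leading XML hint makes loadHTML() read the input as UTF-8.
    $doc->loadHTML('<?xml encoding="UTF-8"><div>' . $html . '</div>');
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    foreach ($doc->getElementsByTagName('a') as $a) {
        $host = parse_url($a->getAttribute('href'), PHP_URL_HOST);
        if (!is_string($host) || strcasecmp($host, $ownHost) === 0) {
            continue; // relative link, unparsable URL, or our own host
        }
        // Split the current rel value into tokens and append if missing.
        $rel = preg_split('/\s+/', $a->getAttribute('rel'), -1, PREG_SPLIT_NO_EMPTY);
        if (!in_array('external', $rel, true)) {
            $rel[] = 'external';
            $a->setAttribute('rel', implode(' ', $rel));
        }
    }

    // Serialize only the children of the wrapper (the first div in the document).
    $out = '';
    foreach ($doc->getElementsByTagName('div')->item(0)->childNodes as $child) {
        $out .= $doc->saveHTML($child);
    }
    return $out;
}
```

Usage would be along the lines of echo addExternalRel($html, 'yourdomain.com');. Note that the parser may normalize the markup it returns, so the output is not guaranteed to be byte-identical to the input outside the edited links.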


Here's my way of doing this (didn't test it):

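<?php

// Placeholder: put your site's HTML here. It must be well-formed XML with a
// single root element, otherwise the SimpleXMLElement constructor throws.
$xmlString = "This is where the HTML of your site should go. Make sure it's valid!";

// Hypothetical helper: a link counts as external when it names a host other
// than ours. 'yourdomain.com' is a stand-in for your own host name.
function isThisExternal($href, $ownHost = 'yourdomain.com')
{
  $host = parse_url((string) $href, PHP_URL_HOST);
  return is_string($host) && strcasecmp($host, $ownHost) !== 0;
}

$xml = new SimpleXMLElement($xmlString);

// SimpleXML has no getElementsByTagName(); xpath() finds every <a> element.
foreach ($xml->xpath('//a') as $a)
{
  if (isset($a['href']) && isThisExternal($a['href']))
  {
    // Append to an existing rel value instead of overwriting it.
    $rel = isset($a['rel']) ? (string) $a['rel'] : '';
    if (!preg_match('/(^|\s)external(\s|$)/', $rel))
    {
      $a['rel'] = trim($rel . ' external');
    }
  }
}

echo $xml->asXML();

?>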


It might be easier to do something like this on the client side, using jQuery:

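<script type="text/javascript">
    $(document).ready(function()
    {
        // a[href] skips anchors without an href, which would otherwise throw below
        $('a[href]').each(function(idx, tag)
        {
            // comparing host names avoids false matches on URLs like
            // http://www.otherdomain.com/yourdomain.com
            if (tag.hostname && tag.hostname !== window.location.hostname)
            {
                // append to an existing rel value instead of overwriting it
                var rel = $(tag).attr('rel') || '';
                if (!/(^|\s)external(\s|$)/.test(rel))
                {
                    $(tag).attr('rel', rel ? rel + ' external' : 'external');
                }
            }
        });
    });
</script>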

As Craig White points out though, this doesn't do anything SEO-wise and won't help users who have JavaScript disabled.
