How to mimic Stack Overflow Auto-Link Behavior

With PHP how can I mimic the auto-link behavior of Stack Overflow (which BTW is awesomely cool)?

For instance, the following URL:

http://www.stackoverflow.com/questions/1925455/how-to-mimic-stackoverflow-auto-link-behavior

Is converted into this:

<a title="how to mimic stackoverflow auto link behavior" rel="nofollow" href="http://www.stackoverflow.com/questions/1925455/how-to-mimic-stackoverflow-auto-link-behavior">stackoverflow.com/questions/1925455/…</a>

I don't really care for the title attribute in this case.


And this:

http://pt.php.net/manual/en/function.base-convert.php#52450

Is converted into this:

<a rel="nofollow" href="http://pt.php.net/manual/en/function.base-convert.php#52450">pt.php.net/manual/en/…</a>

How can I make a similar function in PHP?

PS: Check my comments on this question for some more examples and behaviors.


Try this out. The URL-matching regex pattern is from Daring Fireball.

/**
 * Replace links in text with html links
 *
 * @param  string $text
 * @return string
 */
function auto_link_text($text)
{
   // A more readably formatted version of the pattern is at http://daringfireball.net/2010/07/improved_regex_for_matching_urls
   // PCRE patterns need delimiters (# here); the u modifier treats pattern and subject as UTF-8,
   // which the non-ASCII quote characters in the final character class rely on.
   $pattern  = '#(?i)\b((?:[a-z][\w-]+:(?:/{1,3}|[a-z0-9%])|www\d{0,3}[.]|[a-z0-9.\-]+[.][a-z]{2,4}/)(?:[^\s()<>]+|\(([^\s()<>]+|(\([^\s()<>]+\)))*\))+(?:\(([^\s()<>]+|(\([^\s()<>]+\)))*\)|[^\s`!()\[\]{};:\'".,<>?«»“”‘’]))#u';

   // create_function() was removed in PHP 8.0, so the callback is a closure.
   $callback = function ($matches) {
       $url = $matches[0];

       // The link text is host + path, without a leading "www."
       $text = parse_url($url, PHP_URL_HOST) . parse_url($url, PHP_URL_PATH);
       $text = preg_replace('/^www\./', '', $text);

       // Replace everything after the last slash with an ellipsis
       $last = -(strlen(strrchr($text, '/'))) + 1;
       if ($last < 0) {
           $text = substr($text, 0, $last) . '&hellip;';
       }

       // Escape the URL so a stray quote cannot break out of the href attribute
       return sprintf('<a rel="nofollow" href="%s">%s</a>', htmlspecialchars($url), $text);
   };

   return preg_replace_callback($pattern, $callback, $text);
}

Input Text:

This is my text.  I wonder if you know about asking questions on StackOverflow:
 Check This out http://www.stackoverflow.com/questions/1925455/how-to-mimic-stackoverflow-auto-link-behavior

 Also, base_convert php function?
http://pt.php.net/manual/en/function.base-convert.php#52450

http://pt.php.net/manual/en/function.base-convert.php?wtf=hehe#52450

Output Text:

This is my text.  I wonder if you know about asking questions on StackOverflow:
 Check This out <a rel="nofollow" href="http://www.stackoverflow.com/questions/1925455/how-to-mimic-stackoverflow-auto-link-behavior">stackoverflow.com/questions/1925455/&hellip;</a>

 Also, base_convert php function?
<a rel="nofollow" href="http://pt.php.net/manual/en/function.base-convert.php#52450">pt.php.net/manual/en/&hellip;</a>

<a rel="nofollow" href="http://pt.php.net/manual/en/function.base-convert.php?wtf=hehe#52450">pt.php.net/manual/en/&hellip;</a>
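
The truncation step inside the callback can also be looked at on its own. The following is only a minimal sketch of that step: the function name short_link_text is hypothetical and not part of the answer above, and it assumes the input is a single absolute URL.

```php
<?php
// Hypothetical helper (name is an assumption): builds the visible link text
// as host + path, drops a leading "www.", and replaces everything after the
// last slash with an ellipsis entity.
function short_link_text(string $url): string
{
    $host = (string) parse_url($url, PHP_URL_HOST);
    $path = (string) parse_url($url, PHP_URL_PATH);
    $text = preg_replace('/^www\./', '', $host . $path);

    $slash = strrpos($text, '/');
    if ($slash === false || $slash === strlen($text) - 1) {
        // No path at all, or nothing after the last slash: nothing to cut
        return $text;
    }

    // Keep the trailing slash, replace the final segment
    return substr($text, 0, $slash + 1) . '&hellip;';
}

echo short_link_text('http://pt.php.net/manual/en/function.base-convert.php#52450'), "\n";
```

Query strings and fragments never reach the link text here because only the host and path components are read from parse_url().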


This is based on the same daringfireball.net regular expression, but adds a bit more logic than Eric Coleman's example, as well as configuration for the maximum URL length (SO seems to use 50), the maximum path depth when the URL is truncated (SO seems to use 2), and the ellipsis character (&hellip;).

As far as I know this replicates all of the SO URL rewriting functionality, at least as far as what was discussed so far in the comments and responses here.

function auto_link_text($text) {
    $pattern  = '#\b(([\w-]+://?|www[.])[^\s()<>]+(?:\([\w\d]+\)|([^[:punct:]\s]|/)))#';
    return preg_replace_callback($pattern, 'auto_link_text_callback', $text);
}

function auto_link_text_callback($matches) {
    $max_url_length = 50;
    $max_depth_if_over_length = 2;
    $ellipsis = '&hellip;';

    $url_full = $matches[0];
    // Matches that start with "www." have no scheme; add one so parse_url() and the href work
    $href = (strpos($url_full, 'www.') === 0) ? 'http://' . $url_full : $url_full;
    $url_short = '';

    if (strlen($url_full) > $max_url_length) {
        $parts = parse_url($href);
        $scheme = isset($parts['scheme']) ? $parts['scheme'] : 'http';
        $host = isset($parts['host']) ? $parts['host'] : '';
        $url_short = $scheme . '://' . preg_replace('/^www\./', '', $host) . '/';

        // Collect the pieces that may follow the host: directories, query, fragment
        $url_string_components = array();
        $path = isset($parts['path']) ? trim($parts['path'], '/') : '';
        if ($path !== '') {
            foreach (explode('/', $path) as $dir) {
                $url_string_components[] = $dir . '/';
            }
        }

        if (!empty($parts['query'])) {
            $url_string_components[] = '?' . $parts['query'];
        }

        if (!empty($parts['fragment'])) {
            $url_string_components[] = '#' . $parts['fragment'];
        }

        // Append components until the depth or length limit is reached
        for ($k = 0; $k < count($url_string_components); $k++) {
            $curr_component = $url_string_components[$k];
            if ($k >= $max_depth_if_over_length || strlen($url_short) + strlen($curr_component) > $max_url_length) {
                if ($k == 0 && strlen($url_short) < $max_url_length) {
                    // Always show a portion of first directory
                    $url_short .= substr($curr_component, 0, $max_url_length - strlen($url_short));
                }
                $url_short .= $ellipsis;
                break;
            }
            $url_short .= $curr_component;
        }

    } else {
        $url_short = $url_full;
    }

    // Escape the href so a stray quote in the URL cannot break the attribute
    return '<a rel="nofollow" href="' . htmlspecialchars($href) . '">' . $url_short . '</a>';
}

Sample Input:

This is my text.  I wonder if you know about asking questions on StackOverflow:
Check This out http://www.stackoverflow.com/questions/1925455/how-to-mimic-stackoverflow-auto-link-behavior

Also, base_convert php function?
http://pt.php.net/manual/en/function.base-convert.php#52450

http://pt.php.net/manual/en/function.base-convert.php?wtf=hehe#52450

http://a.b/c/d/e/f/test

and http://a.b/c/d/e/f/g/h/i/j/k/l/m/n/o/p/q/r/s/t/u/v/z/y/w/z/test

Sample Output:

This is my text.  I wonder if you know about asking questions on StackOverflow:
Check This out <a rel="nofollow" href="http://www.stackoverflow.com/questions/1925455/how-to-mimic-stackoverflow-auto-link-behavior">http://stackoverflow.com/questions/1925455/&hellip;</a> 

Also, base_convert php function?
<a rel="nofollow" href="http://pt.php.net/manual/en/function.base-convert.php#52450">http://pt.php.net/manual/en/&hellip;</a> 

<a rel="nofollow" href="http://pt.php.net/manual/en/function.base-convert.php?wtf=hehe#52450">http://pt.php.net/manual/en/&hellip;</a> 

<a rel="nofollow" href="http://a.b/c/d/e/f/test">http://a.b/c/d/e/f/test</a> 

and <a rel="nofollow" href="http://a.b/c/d/e/f/g/h/i/j/k/l/m/n/o/p/q/r/s/t/u/v/z/y/w/z/test">http://a.b/c/d/&hellip;</a>


This will convert the sample string to what you are after. I left out title as that comes from a different source than just a standalone URL and you said that was not important.

<?php
$urlInput="http://www.stackoverflow.com/questions/1925455/how-to-mimic-stackoverflow-auto-link-behavior";
preg_match('@http://(?:www\.)?(\S+/)\S*(?:\s|$)@i', $urlInput, $matches);
print('<a rel="nofollow" href="' . trim($matches[0]) . '">' . $matches[1] . '...</a>');
?>

Extend as needed to scan through your text.

If you want to match just a certain number of URL path elements, use this RE:

'@http://(?:www\.)?((?:\S+?/){1,3})\S*(?:\s|$)@i'

This will extract out up to 3 path elements (the host and up to two directories). You can vary the upper bound in {1,3} to define the maximum number of path elements you want.

Changed the ending \S to allow for zero matches.
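
One way the "extend as needed" step could look is sketched below. This is not the answerer's code: the function name link_with_depth and its $maxParts parameter are assumptions, the lazy \S+? is swapped for [^\s/]+ so each repetition is exactly one path element, and a lookahead replaces the trailing (?:\s|$) so the whitespace after a URL is left in the text.

```php
<?php
// Hypothetical helper (name and parameter are assumptions): links every
// http:// URL in $text and shows the host plus up to $maxParts - 1 directories.
function link_with_depth(string $text, int $maxParts = 3): string
{
    // {1,N} bounds the number of "element/" groups that end up in the link text.
    $pattern = '@http://(?:www\.)?((?:[^\s/]+/){1,' . $maxParts . '})\S*(?=\s|$)@i';

    return preg_replace_callback($pattern, function (array $m): string {
        // $m[0] is the whole URL, $m[1] the shortened host/path prefix
        return '<a rel="nofollow" href="' . htmlspecialchars($m[0]) . '">'
            . htmlspecialchars($m[1]) . '...</a>';
    }, $text);
}

echo link_with_depth('See http://www.example.com/a/b/c/d.html for details'), "\n";
```

Like the pattern it is based on, this only matches URLs that have at least one slash after the host, so a bare http://example.com is left untouched.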


If you have a predictable URL like SO then it should be easy to grab links with a regex and filter out the ones that match the pattern. So if your URL is http://example.com/stuff/1234 then finding http://example.com/stuff/1234/how-to-mimic would be pretty trivial with a regex.

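<?php
// $text holds the input to scan. preg_match_all() collects every match,
// whereas preg_match() stops after the first one.
// The dot is escaped, \d+ accepts multi-digit IDs, and the slug is optional.
preg_match_all('#http://example\.com/(\w+)/(\d+)(?:/\S*)?#', $text, $matches, PREG_SET_ORDER);

foreach ($matches as $match)
{
  // $match[0] is the full URL, $match[1] the section, $match[2] the numeric ID
  // do something...
}
?>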


Based somewhat on Kevin Brock's answer, but allows configurable params (folder depth & URL length), and accepts URLs without trailing slashes:

$url = 'http://www.stackoverflow.com/questions/1925455/how-to-mimic-stackoverflow-auto-link-behavior';
$output = '';
$params = array (
    'length' => 10,
    'depth' => 2,
);
preg_match ('@http://(?:www\.)?([^/?# ]+)(/\S+)?(?=\s|$)@i', $url, $matches);
if (isset ($matches[2]))
{
    $parts = explode('/', substr($matches[2], 1));
    if (count($parts) > $params['depth'] && strlen($matches[1].$matches[2]) > $params['length'])
        $output = $matches[1].'/'.implode('/', array_slice($parts, 0, $params['depth'])).'/...';
    else
        $output = $matches[1].$matches[2];
}
else
    $output = $matches[1];

echo '<a href="'.$matches[0].'">'.$output.'</a>';

Hope this helps


See Regex (regular expression) to match a URL:

https?://([-\w\.]+)+(:\d+)?(/([\w/_\.]*(\?\S+)?)?)?

PHP Example: Automatically link URLs inside text.

$text = preg_replace('@(https?://([-\w\.]+)+(:\d+)?(/([\w/_\.]*(\?\S+)?)?)?)@', '<a href="$1">$1</a>', $text);
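
If the text comes from users, it probably makes sense to HTML-escape it before inserting the anchors. The sketch below assumes that is wanted; the function name escape_and_link is hypothetical, and the host part is written as [-\w.]+ rather than ([-\w\.]+)+ to avoid the nested quantifier, which is my simplification and not part of the linked pattern.

```php
<?php
// Hypothetical wrapper (name is an assumption): escape the text first,
// then turn http(s) URLs in the escaped text into anchors.
function escape_and_link(string $text): string
{
    // After this call "&" is "&amp;", so the href values below are valid HTML
    $escaped = htmlspecialchars($text, ENT_QUOTES, 'UTF-8');

    // $1 is the outer group, i.e. the whole URL
    return preg_replace(
        '@(https?://[-\w.]+(:\d+)?(/([\w/_.]*(\?\S+)?)?)?)@',
        '<a href="$1">$1</a>',
        $escaped
    );
}

echo escape_and_link('<b> http://example.com/a_b?x=1&y=2'), "\n";
```

Because the query part is matched with \S+, punctuation or an escaped quote directly after a query string still ends up inside the link; the Daring Fireball pattern used in the other answers handles trailing punctuation more carefully.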