How to extract http links from a paragraph and store them in an array in PHP [duplicate]

This question already has answers here: How to add anchor tag to a URL from text input (8 answers) Closed 9 years ago.

I have a large text in a PHP variable, and I'm looking for a good, fast method to retrieve all the links inside this text and store them in an array.

The text is plain ASCII, and the links are the common kind, such as http://thesite.com or http://www.thesite.com. Thanks for any help.


$text = 'Lorem ipsum http://thesite.com dolor sit amet, consectetur adipisicing elit, sed do eiusmod tempor
incididunt https://www.thesite.com ut labore et dolore magna aliqua. Ut http://www.thesite.com enim ad minim veniam,';

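// Matches http:// or https:// followed by any run of non-whitespace characters.
$pattern = '!https?://\S+!'; // refine this for better/more specific results

if (preg_match_all($pattern, $text, $matches)) {
    $links = $matches[0]; // index 0 holds the full-pattern matches
    print_r($links);
}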
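
One way to refine the pattern above, sketched under assumptions: a URL followed directly by a comma or full stop will keep that character, and the same link may appear more than once. The helper name extract_links is hypothetical and not part of the original answer; it trims common trailing punctuation and removes duplicates.

```php
<?php
// Hypothetical helper (not from the original answer): extract http/https
// links, trim trailing sentence punctuation, and drop duplicates.
function extract_links(string $text): array
{
    if (!preg_match_all('!https?://\S+!i', $text, $matches)) {
        return [];
    }

    // Strip characters that usually belong to the sentence, not the URL.
    $links = array_map(function ($url) {
        return rtrim($url, '.,;:!?)\'"');
    }, $matches[0]);

    // array_unique keeps the first occurrence; array_values reindexes from 0.
    return array_values(array_unique($links));
}

$text = 'See http://thesite.com, then https://www.thesite.com. Again: http://thesite.com';
print_r(extract_links($text));
```

Note that trimming a trailing ")" is a trade-off: it also shortens URLs that legitimately end in a closing parenthesis.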


Search Google for any "URL regex", then insert it into the following code:

preg_match_all("/your url regex here/",$text,$matches);

All matches are now stored as an array in $matches[0].
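
Whatever regex is used, the candidates it returns can be checked afterwards. The following is a minimal sketch, assuming a loose pattern is acceptable for finding candidates and that PHP's built-in filter_var with FILTER_VALIDATE_URL is strict enough for the validation step. The function name find_valid_urls is hypothetical.

```php
<?php
// Hypothetical helper: collect URL candidates with a loose regex, then keep
// only those that PHP's built-in URL filter accepts.
function find_valid_urls(string $text): array
{
    preg_match_all('/https?:\/\/\S+/i', $text, $matches);

    $valid = array_filter($matches[0], function ($candidate) {
        // filter_var returns false when the candidate is not a valid URL.
        return filter_var($candidate, FILTER_VALIDATE_URL) !== false;
    });

    // Reindex so the result is a plain 0-based list.
    return array_values($valid);
}

print_r(find_valid_urls('Visit http://thesite.com or http://www.thesite.com/page?id=1 today'));
```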


These regexes are all nice, but they tend to grow over time, and in the end things may look a bit different. The following is not entirely my own work, nor is it ideal: it builds on code from a community project that has some years behind it. It does suit some needs, though. I compiled it into a single function:

echo make_clickable('test http://www.google.com/');

/**
 * make_clickable
 * 
 * make a text clickable
 * 
 * @param string $text to make clickable
 * @param callback $url callback to process URLs
 * @return string clickable text
 * @author hakre and contributors
 * @license GPL
 */
function make_clickable($text, $url = null) {
    if (null === $url)
        $callback_url = function($url) {return $url;};
    else
        $callback_url = $url;
    $ret = ' ' . $text;
    // urls
    $save = ini_set('pcre.recursion_limit', 10000);
    $retval = preg_replace_callback('#(?<!=[\'"])(?<=[*\')+.,;:!&$\s>])(\()?([\w]+?://(?:[\w\\x80-\\xff\#%~/?@\[\]-]{1,2000}|[\'*(+.,;:!=&$](?![\b\)]|(\))?([\s]|$))|(?(1)\)(?![\s<.,;:]|$)|\)))+)#is', function($matches) use ($callback_url)
    {
        $url = $matches[2];
        $suffix = '';

        /** Include parentheses in the URL only if paired **/
        while ( substr_count( $url, '(' ) < substr_count( $url, ')' ) ) {
            $suffix = strrchr( $url, ')' ) . $suffix;
            $url = substr( $url, 0, strrpos( $url, ')' ) );
        }

        $url = $callback_url($url);
        if ( empty($url) )
            return $matches[0];

        return $matches[1] . "<a href=\"$url\">$url</a>" . $suffix;
    }, $ret);
    if (null !== $retval )
        $ret = $retval;
    ini_set('pcre.recursion_limit', $save);
    // web ftp
    $ret = preg_replace_callback('#([\s>])((www|ftp)\.[\w\\x80-\\xff\#$%&~/.\-;:=,?@\[\]+]+)#is', function ($matches) use ($callback_url)
    {
        $ret = '';
        $dest = $matches[2];
        $dest = 'http://' . $dest;
        $dest = $callback_url($dest);
        if ( empty($dest) )
            return $matches[0];

        // removed trailing [.,;:)] from URL
        if ( in_array( substr($dest, -1), array('.', ',', ';', ':', ')') ) === true ) {
            $ret = substr($dest, -1);
            $dest = substr($dest, 0, strlen($dest)-1);
        }
        return $matches[1] . "<a href=\"$dest\">$dest</a>$ret";
    }, $ret);
    // email
    $ret = preg_replace_callback('#([\s>])([.0-9a-z_+-]+)@(([0-9a-z-]+\.)+[0-9a-z]{2,})#i', function($matches)
    {
        $email = $matches[2] . '@' . $matches[3];
        return $matches[1] . "<a href=\"mailto:$email\">$email</a>";
    }, $ret);
    $ret = preg_replace("#(<a( [^>]+?>|>))<a [^>]+?>([^>]+?)</a></a>#i", "$1$3</a>", $ret);
    $ret = trim($ret);
    return $ret;
}


You have to use regular expressions. PHP has had two families of regex functions, preg and ereg. ereg may look easier to use, but it is slower, was deprecated in PHP 5.3, and was removed in PHP 7.0, so use preg.

Here is a simple preg call that will get URLs from $text.

preg_match_all("/https?:\/\/[^\s]+/i", $text, $urls);

$urls[0] is an array of the matched URLs.
