
Finding multiple URLs in a string.

 $resource = "THIS IS ABOUT WWW.JONAKCOMPUTERS.COM, HTTP://HIGHLOW.COM, AND TESTINGSERVER1.COM";

and I want to pull the three URLs out into another string similar to:

 $all_urls = "JONAKCOMPUTERS.COM - HIGHLOW.COM - TESTINGSERVER1.COM";

I found this by someone else:

$pattern = '#(www\.|https?:\/\/){1}[a-zA-Z0-9]{2,}\.[a-zA-Z0-9]{2,}(\S*)#i';
preg_match_all($pattern, $string, $matches, PREG_PATTERN_ORDER);

But it doesn't pull out a bare domain like "jonakcomputers.com", only URLs of the form "http://url" or "www.url".

Sorry for the caps, I just wanted to make it clear that the match should not be case sensitive. I can always capitalize the result afterwards. I need to do this before the page loads, so it could be JavaScript or PHP.

If I could pull one URL out, I think I could write a loop that keeps checking for new ones until it runs out.

Thanks to anyone willing to help out.


I ran your code in a console, adjusting only the variable name in the last snippet so that it uses $resource:

php > $resource = "THIS IS ABOUT WWW.JONAKCOMPUTERS.COM, HTTP://HIGHLOW.COM, AND TESTINGSERVER1.COM";
php > $pattern = '#(www\.|https?:\/\/){1}[a-zA-Z0-9]{2,}\.[a-zA-Z0-9]{2,}(\S*)#i';
php > preg_match_all($pattern, $resource, $matches, PREG_PATTERN_ORDER);
php > var_dump($matches);
array(3) {
    [0]=>
        array(2) {
            [0]=>
                string(23) "WWW.JONAKCOMPUTERS.COM,"
            [1]=>
                string(19) "HTTP://HIGHLOW.COM,"
        }
    [1]=>
        array(2) {
            [0]=>
                string(4) "WWW."
            [1]=>
                string(7) "HTTP://"
        }
    [2]=>
        array(2) {
            [0]=>
                string(1) ","
            [1]=>
                string(1) ","
        }
}

What you see in the preg_match_all result is a multidimensional array with the following:

0: Full Matches

1: SubPattern 1 matches

2: SubPattern 2 matches
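If you would rather loop over one URL at a time, the PREG_SET_ORDER flag groups the same data per match instead of per subpattern. A small sketch, reusing the pattern and string from the question (the variable names $sets and $count are just illustrative):

```php
<?php
$resource = "THIS IS ABOUT WWW.JONAKCOMPUTERS.COM, HTTP://HIGHLOW.COM, AND TESTINGSERVER1.COM";
$pattern  = '#(www\.|https?:\/\/){1}[a-zA-Z0-9]{2,}\.[a-zA-Z0-9]{2,}(\S*)#i';

// With PREG_SET_ORDER, $sets[0] is the first match together with its
// subpatterns, $sets[1] is the second match, and so on.
$count = preg_match_all($pattern, $resource, $sets, PREG_SET_ORDER);

foreach ($sets as $set) {
    // $set[0] = full match, $set[1] = prefix, $set[2] = trailing non-space characters
    printf("full=%s prefix=%s rest=%s\n", $set[0], $set[1], $set[2]);
}
```

The return value of preg_match_all is the number of full matches, which is 2 here because the original pattern still requires the prefix.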

The only fix I see is that you need to adjust the regex slightly so the "www." or "http://" prefix is optional. Just use this for the pattern:

$pattern = '#(www\.|https?:\/\/)?[a-zA-Z0-9]{2,}\.[a-zA-Z0-9]{2,}(\S*)#i';

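and your $matches should now contain all three. Note that each full match in $matches[0] still includes any trailing punctuation such as the comma, because (\S*) consumes every non-space character that follows.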
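To get from the matches to the dash-separated string asked for in the question, one option is to capture only the host part and join the results. This is a rough sketch, not a full URL parser: the function name extract_hosts() and the slightly different pattern are my own assumptions, and the pattern will also pick up anything that merely looks like a domain (a file name such as "notes.txt", for instance).

```php
<?php
// Sketch: collect bare host names and join them with " - ".
// extract_hosts() is an illustrative name, not something from the thread.
function extract_hosts(string $text): string
{
    // Optional "http(s)://" and "www." prefixes are matched but not captured.
    // Capture group 1 is the host: dot-separated labels plus a 2+ letter ending,
    // so a trailing comma is left out of the capture.
    $pattern = '#(?:https?://)?(?:www\.)?((?:[a-z0-9-]+\.)+[a-z]{2,})#i';

    preg_match_all($pattern, $text, $matches);

    // Upper-case the captured hosts and drop duplicates before joining.
    $hosts = array_unique(array_map('strtoupper', $matches[1]));

    return implode(' - ', $hosts);
}

$resource = "THIS IS ABOUT WWW.JONAKCOMPUTERS.COM, HTTP://HIGHLOW.COM, AND TESTINGSERVER1.COM";
echo extract_hosts($resource), PHP_EOL;
// prints: JONAKCOMPUTERS.COM - HIGHLOW.COM - TESTINGSERVER1.COM
```

Because preg_match_all already returns every match in one call, no manual "keep checking until it runs out" loop is needed.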


The regex you currently have relies completely on the initial www or http to find URLs. If you want to grab those incomplete URLs, you first need to define what you are looking for.

For example, are you only looking for things ending in .com or would you also need to get "jonakcomputers.br"?
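One way to pin that definition down is to accept only hosts whose ending is on an allow-list. The sketch below assumes a small, arbitrary list ($allowedTlds is a made-up name and nowhere near a complete set of endings), so adjust it to whatever you actually expect in your text:

```php
<?php
// Sketch: only accept hosts whose final label is in an allow-list.
// The list is an arbitrary example, not a complete set of TLDs.
$allowedTlds = ['com', 'net', 'org', 'br'];

// Build an alternation such as "com|net|org|br" for use inside the pattern.
$tldGroup = implode('|', array_map('preg_quote', $allowedTlds));

// Capture group 1 is the host; \b stops "com" from matching inside a longer word.
$pattern = '#(?:https?://)?(?:www\.)?((?:[a-z0-9-]+\.)+(?:' . $tldGroup . '))\b#i';

$text = "See jonakcomputers.br, highlow.com and notes.txt for details.";
preg_match_all($pattern, $text, $matches);

print_r($matches[1]); // jonakcomputers.br and highlow.com; notes.txt is skipped
```

The trade-off is that any ending missing from the list is silently ignored, which is why deciding the scope first matters.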


I'd like to contribute to the discussion, since it helped me arrive at this solution. Others who run the same Google query as I did might have the same problem.

I needed a piece of regex code that goes through every URL in a text, cleans it up and marks it with a CSS class (for the jQuery version of embedly).

This function takes in text and wraps each URL it finds in a link (using the RegExp posted by Kai):

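function find_urls($text)
{
    // The regular expression filter: optional "www." or "http(s)://" prefix
    $pattern = '#(www\.|https?:\/\/)?[a-zA-Z0-9]{2,}\.[a-zA-Z0-9]{2,}(\S*)#i';

    // Replace each match exactly once, in place, so that a URL occurring
    // twice (or inside another URL) is not wrapped a second time
    return preg_replace_callback($pattern, function ($m) {
        $match = $m[0];

        // Prepend a scheme only when the match has none
        // (case-insensitive, and https:// counts as well as http://)
        $href = preg_match('#^https?://#i', $match) ? $match : 'http://' . $match;

        // Escape both values so quotes or angle brackets in the match
        // cannot break out of the attribute or the link text
        return '<a class="embedly" target="_blank" href="' . htmlspecialchars($href) . '">'
            . htmlspecialchars($match) . '</a> ';
    }, $text);
}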