Finding multiple URLs in a string

2023-03-24 11:31

 $resource = "THIS IS ABOUT WWW.JONAKCOMPUTERS.COM, HTTP://HIGHLOW.COM, AND TESTINGSERVER1.COM";

I want to pull the three URLs out of this string into another string similar to:

 $all_urls = "JONAKCOMPUTERS.COM - HIGHLOW.COM - TESTINGSERVER1.COM";

I found this by someone else:

$pattern = '#(www\.|https?:\/\/){1}[a-zA-Z0-9]{2,}\.[a-zA-Z0-9]{2,}(\S*)#i';
preg_match_all($pattern, $string, $matches, PREG_PATTERN_ORDER);

But it doesn't match a bare domain like "jonakcomputers.com", only URLs of the form "http://url" or "www.url".

Sorry for the caps; I just wanted to make it clear that the match should not be case sensitive. I can always capitalize the result afterwards. I need to do this before the page loads, so it could be JavaScript or PHP.

If I could pull one out, I think I could loop and keep checking for new ones until none are left.

Thanks to anyone willing to help out.

I ran your code in a console, just adjusting the variable name in the last snippet so that:

php > $resource = "THIS IS ABOUT WWW.JONAKCOMPUTERS.COM, HTTP://HIGHLOW.COM, AND TESTINGSERVER1.COM";
php > $pattern = '#(www\.|https?:\/\/){1}[a-zA-Z0-9]{2,}\.[a-zA-Z0-9]{2,}(\S*)#i';
php > preg_match_all($pattern, $resource, $matches, PREG_PATTERN_ORDER);
php > var_dump($matches);
array(3) {
    [0]=>
        array(2) {
            [0]=>
                string(23) "WWW.JONAKCOMPUTERS.COM,"
            [1]=>
                string(19) "HTTP://HIGHLOW.COM,"
        }
    [1]=>
        array(2) {
            [0]=>
                string(4) "WWW."
            [1]=>
                string(7) "HTTP://"
        }
    [2]=>
        array(2) {
            [0]=>
                string(1) ","
            [1]=>
                string(1) ","
        }
}

What preg_match_all stores in $matches is a multidimensional array with the following entries:

0: Full Matches

1: SubPattern 1 matches

2: SubPattern 2 matches
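To make that layout concrete, here is a small sketch that runs the original pattern and reads each index of $matches. The variable names follow the console session above. PREG_SET_ORDER is shown only as an alternative grouping; the original code does not use it.

```php
<?php
$resource = "THIS IS ABOUT WWW.JONAKCOMPUTERS.COM, HTTP://HIGHLOW.COM, AND TESTINGSERVER1.COM";
$pattern  = '#(www\.|https?:\/\/){1}[a-zA-Z0-9]{2,}\.[a-zA-Z0-9]{2,}(\S*)#i';

// PREG_PATTERN_ORDER (the default): one sub-array per capture group
$count = preg_match_all($pattern, $resource, $matches, PREG_PATTERN_ORDER);

foreach ($matches[0] as $i => $full) {
    // $matches[1][$i] is the prefix, $matches[2][$i] is whatever \S* consumed
    printf("%s | prefix=%s | tail=%s\n", $full, $matches[1][$i], $matches[2][$i]);
}

// PREG_SET_ORDER groups the same data by match instead of by sub-pattern
preg_match_all($pattern, $resource, $sets, PREG_SET_ORDER);
// $sets[0] is array("WWW.JONAKCOMPUTERS.COM,", "WWW.", ",")
```

The return value of preg_match_all is the number of full matches, which is 2 here because the bare domain is skipped.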

The only fix I see is that you'll need to adjust the regex slightly to account for URLs that lack the www. or http:// prefix. Just make the prefix group optional and use this pattern:

$pattern = '#(www\.|https?:\/\/)?[a-zA-Z0-9]{2,}\.[a-zA-Z0-9]{2,}(\S*)#i';

and your $matches should now contain all 3.
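Note that the full matches still carry their prefix and any trailing punctuation, because \S* runs up to the next whitespace. A minimal sketch of turning them into the $all_urls string from the question could look like this. It assumes the prefix is not wanted in the output, and the set of punctuation characters to trim is my own guess.

```php
<?php
$resource = "THIS IS ABOUT WWW.JONAKCOMPUTERS.COM, HTTP://HIGHLOW.COM, AND TESTINGSERVER1.COM";
$pattern  = '#(www\.|https?://)?[a-zA-Z0-9]{2,}\.[a-zA-Z0-9]{2,}(\S*)#i';

preg_match_all($pattern, $resource, $matches);

$hosts = array();
foreach ($matches[0] as $match) {
    // Drop the optional scheme and/or "www." prefix (assumption: not wanted in the output)
    $host = preg_replace('#^(https?://)?(www\.)?#i', '', $match);
    // \S* also swallows trailing punctuation such as the comma, so trim it
    $host = rtrim($host, '.,;:!?)');
    $hosts[] = strtoupper($host);
}

$all_urls = implode(' - ', $hosts);
echo $all_urls, "\n";
// prints: JONAKCOMPUTERS.COM - HIGHLOW.COM - TESTINGSERVER1.COM
```

No explicit "keep checking until it runs out" loop is needed, since preg_match_all already collects every match in one call.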

The regex you have relies completely on the initial www or http to find URLs. If you want to grab those incomplete URLs, you first need to define what you are looking for.

For example, are you only looking for things ending in .com or would you also need to get "jonakcomputers.br"?
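One way to pin that definition down is to list the endings you accept. The sketch below is only an illustration: the $tlds whitelist and the sample text are hypothetical, and the pattern is a stricter variant of my own, not the one from the question.

```php
<?php
// Hypothetical whitelist; extend it with whatever endings you need to accept
$tlds = array('com', 'net', 'org', 'br');

// Optional prefix, one or more dot-separated labels, then one of the listed endings
$pattern = '#\b(?:https?://|www\.)?(?:[a-z0-9-]+\.)+(?:' . implode('|', $tlds) . ')\b#i';

$text = "See jonakcomputers.br, HTTP://HIGHLOW.COM and file.txt for details.";
preg_match_all($pattern, $text, $matches);

print_r($matches[0]);
// $matches[0] holds "jonakcomputers.br" and "HTTP://HIGHLOW.COM"; "file.txt" is skipped
```

The trade-off is that anything outside the whitelist is silently ignored, so the list has to match the data you expect.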

I'd like to contribute to the discussion, since it helped me arrive at this solution. Others who run the same Google query as I did might have the same problem.

I needed a piece of regex code to go through every URL in a text, clean it up, and mark it with a CSS class (for the jQuery version of Embedly).

This function takes in text, finds every URL in it (using the regex posted by Kai), and wraps each one in a link:

function find_urls($text)
{
    // The regular expression filter; the scheme or "www." prefix is optional
    $pattern = '#(www\.|https?://)?[a-zA-Z0-9]{2,}\.[a-zA-Z0-9]{2,}(\S*)#i';

    // Replace each match in a single pass, so a URL that appears twice
    // (or inside a longer URL) is never wrapped more than once
    return preg_replace_callback($pattern, function ($m) {
        $match = $m[0];
        // Prepend a scheme only when the match has none (http or https, any case)
        $href = preg_match('#^https?://#i', $match) ? $match : 'http://' . $match;
        // Escape both values so quotes in the match cannot break the markup
        $href  = htmlspecialchars($href, ENT_QUOTES);
        $label = htmlspecialchars($match, ENT_QUOTES);
        return '<a class="embedly" target="_blank" href="' . $href . '">' . $label . '</a> ';
    }, $text);
}
