开发者

Regex to match Youtube URL's

I am trying to validate a Youtube URL using regex:

preg_match('~http://youtube.com/watch\?v=[a-zA-Z0-9-]+~', $videoLink)

It kind of works, but it can match URL's that are malformed. For example, this will match ok:

http://www.youtube.com/watch?v=Zu4WXiPRek

But so will this:

http://www.youtube.com/watch?v=Zu4WX£&P!ek

And this wont:

http://www.youtube.com/watch?v=!Zu4WX£开发者_Go百科&P4ek

I think it's because of the + operator. It's matching what seems to be the first character after v=, when it needs to try and match everything behind v= with [a-zA-Z0-9-]. Any help is appreciated, thanks.


To provide an alternative that is larger and much less elegant than a regex, but works with PHP's native URL parsing functions so it might be a bit more reliable in the long run:

 $url = "http://www.youtube.com/watch?v=Zu4WXiPRek";

 $query_string = parse_url($url, PHP_URL_QUERY); // v=Zu4WXiPRek

 $query_string_parsed = array();                        
 parse_str($query_string, $query_string_parsed); // an array with all GET params

 echo($query_string_parsed["v"]); // Will output Zu4WXiPRek that you can then
                                  // validate for [a-zA-Z0-9] using a regex


The problem is that you are not requiring any particular number of characters in the v= part of the URL. So, for instance, checking

http://www.youtube.com/watch?v=Zu4WX£&P!ek

will match

http://www.youtube.com/watch?v=Zu4WX

and therefore return true. You need to either specify the number of characters you need in the v= part:

preg_match('~http://youtube.com/watch\?v=[a-zA-Z0-9-]{10}~', $videoLink)

or specify that the group [a-zA-Z0-9-] must be the last part of the string:

preg_match('~http://youtube.com/watch\?v=[a-zA-Z0-9-]+$~', $videoLink)

Your other example

http://www.youtube.com/watch?v=!Zu4WX£&P4ek

does not match, because the + sign requires that at least one character must match [a-zA-Z0-9-].


Short answer:

preg_match('%(http://www.youtube.com/watch\?v=(?:[a-zA-Z0-9-])+)(?:[&"\'\s])%', $videoLink)

There are a few assumptions made here, so let me explain:

  • I added a capturing group ( ... ) around the entire http://www.youtube.com/watch?v=blah part of the link, so that we can say "I want get the whole validated link up to and including the ?v=movieHash"
  • I added the non-capturing group (?: ... ) around your character set [a-zA-Z0-9-] and left the + sign outside of that. This will allow us to match all allowable characters up to a certain point.
  • Most importantly, you need to tell it how you expect your link to terminate. I'm taking a guess for you with (?:[&"\'\s])

    ?) Will it be in html format (e.g. anchor tag) ? If so, the link in href will obviously end with a " or '.
    ?) Or maybe there's more to the query string, so there would be an & after the value of v.
    ?) Maybe there's a space or line break after the end of the link \s.

The important piece is that you can get much more accurate results if you know what's surrounding what you are searching for, as is the case with many regular expressions.

This non-capturing group (in which I'm making assumptions for you) will take a stab at finding and ignoring all the extra junk after what you care about (the ?v=awesomeMovieHash).

Results:

http://www.youtube.com/watch?v=Zu4WXiPRek
 - Group 1 contains the http://www.youtube.com/watch?v=Zu4WXiPRek

http://www.youtube.com/watch?v=Zu4WX&a=b
 - Group 1 contains http://www.youtube.com/watch?v=Zu4WX

http://www.youtube.com/watch?v=!Zu4WX£&P4ek
 - No match

a href="http://www.youtube.com/watch?v=Zu4WX&size=large"
 - Group 1 contains http://www.youtube.com/watch?v=Zu4WX

http://www.youtube.com/watch?v=Zu4WX£&P!ek
 - No match


The "v=..." blob is not guaranteed to be the first parameter in the query part of the URL. I'd recommend using PHP's parse_url() function to break the URL into its component parts. You can also reassemble a pristine URL if someone began the string with "https://" or simply used "youtube.com" instead of "www.youtube.com", etc.

function get_youtube_vidid ($url) {
    $vidid = false;
    $valid_schemes = array ('http', 'https');
    $valid_hosts = array ('www.youtube.com', 'youtube.com');
    $valid_paths = array ('/watch');

    $bits = parse_url ($url);
    if (! is_array ($bits)) {
        return false;
    }
    if (! (array_key_exists ('scheme', $bits)
            and array_key_exists ('host', $bits)
            and array_key_exists ('path', $bits)
            and array_key_exists ('query', $bits))) {
        return false;
    }
    if (! in_array ($bits['scheme'], $valid_schemes)) {
        return false;
    }
    if (! in_array ($bits['host'], $valid_hosts)) {
        return false;
    }
    if (! in_array ($bits['path'], $valid_paths)) {
        return false;
    }
    $querypairs = explode ('&', $bits['query']);
    if (count ($querypairs) < 1) {
        return false;
    }
    foreach ($querypairs as $querypair) {
        list ($key, $value) = explode ('=', $querypair);
        if ($key == 'v') {
            if (preg_match ('/^[a-zA-Z0-9\-_]+$/', $value)) {
                # Set the return value
                $vidid = $value;
            }
        }
    }

    return $vidid;
}


Following regex will match any youtube link:

$pattern='@(((http(s)?://(www\.)?)|(www\.)|\s)(youtu\.be|youtube\.com)/(embed/|v/|watch(\?v=|\?.+&v=|/))?([a-zA-Z0-9._\/~#&=;%+?-\!]+))@si';
0

上一篇:

下一篇:

精彩评论

暂无评论...
验证码 换一张
取 消

最新问答

问答排行榜