
YouTube API - Extract video ID

I am coding a feature that allows users to enter a YouTube video URL. I would like to extract the video ID from these URLs.

Does the YouTube API support some kind of function where I pass the link and it returns the video ID? Or do I have to parse the string myself?

I am using PHP. I would appreciate any pointers or code samples in this regard.

Thanks


Here is an example function that uses a regular expression to extract the YouTube ID from a URL:

/**
 * get youtube video ID from URL
 *
 * @param string $url
 * @return string|false YouTube video ID, or false if none is found.
 */
function youtube_id_from_url($url) {
    $pattern = 
        '%^# Match any youtube URL
        (?:https?://)?  # Optional scheme. Either http or https
        (?:www\.)?      # Optional www subdomain
        (?:             # Group host alternatives
          youtu\.be/    # Either youtu.be,
        | youtube\.com  # or youtube.com
          (?:           # Group path alternatives
            /embed/     # Either /embed/
          | /v/         # or /v/
          | /watch\?v=  # or /watch\?v=
          )             # End path alternatives.
        )               # End host alternatives.
        ([\w-]{10,12})  # Allow 10-12 for 11 char youtube id.
        $%x'
        ;
    $result = preg_match($pattern, $url, $matches);
    if ($result) {
        return $matches[1];
    }
    return false;
}

echo youtube_id_from_url('http://youtu.be/NLqAF9hrVbY'); # NLqAF9hrVbY

It's an adaptation of the answer to a similar question.


This is not exactly the API you're looking for, but it is probably helpful. YouTube has an oEmbed service:

$url = 'http://youtu.be/NLqAF9hrVbY';
var_dump(json_decode(file_get_contents(sprintf('http://www.youtube.com/oembed?url=%s&format=json', urlencode($url)))));

It returns some additional metadata about the URL:

object(stdClass)#1 (13) {
  ["provider_url"]=>
  string(23) "http://www.youtube.com/"
  ["title"]=>
  string(63) "Hang Gliding: 3 Flights in 8 Days at Northside Point of the Mtn"
  ["html"]=>
  string(411) "<object width="425" height="344"><param name="movie" value="http://www.youtube.com/v/NLqAF9hrVbY?version=3"></param><param name="allowFullScreen" value="true"></param><param name="allowscriptaccess" value="always"></param><embed src="http://www.youtube.com/v/NLqAF9hrVbY?version=3" type="application/x-shockwave-flash" width="425" height="344" allowscriptaccess="always" allowfullscreen="true"></embed></object>"
  ["author_name"]=>
  string(11) "widgewunner"
  ["height"]=>
  int(344)
  ["thumbnail_width"]=>
  int(480)
  ["width"]=>
  int(425)
  ["version"]=>
  string(3) "1.0"
  ["author_url"]=>
  string(39) "http://www.youtube.com/user/widgewunner"
  ["provider_name"]=>
  string(7) "YouTube"
  ["thumbnail_url"]=>
  string(48) "http://i3.ytimg.com/vi/NLqAF9hrVbY/hqdefault.jpg"
  ["type"]=>
  string(5) "video"
  ["thumbnail_height"]=>
  int(360)
}

The ID is not directly part of the response. However, the response may contain the information you're looking for, and it can be useful for validating a YouTube URL.
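Since the ID is not a field of its own, one option is to pull it out of thumbnail_url, which in the sample response above has the form http://i3.ytimg.com/vi/<id>/hqdefault.jpg. The sketch below assumes that format holds (it is only observed in this one sample, not a documented guarantee) and works on the JSON string, so it can be tried without a network call. The function name is hypothetical.

```php
<?php
/**
 * Hypothetical helper: extract the video ID from an oEmbed JSON response,
 * ASSUMING thumbnail_url looks like ".../vi/<id>/hqdefault.jpg" as in the sample.
 *
 * @return string|false
 */
function youtube_id_from_oembed_json($json) {
    $data = json_decode($json, true);
    if (!is_array($data) || !isset($data['thumbnail_url'])) {
        return false; // not valid JSON, or no thumbnail to inspect
    }
    // Capture the path segment that follows "/vi/"
    if (preg_match('%/vi/([\w-]+)/%', $data['thumbnail_url'], $m)) {
        return $m[1];
    }
    return false;
}

$json = '{"type":"video","thumbnail_url":"http://i3.ytimg.com/vi/NLqAF9hrVbY/hqdefault.jpg"}';
echo youtube_id_from_oembed_json($json), PHP_EOL; // NLqAF9hrVbY
```

If the response is not valid JSON or has no thumbnail_url, the helper returns false instead of guessing.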


The regular expression above works fine for YouTube short URLs (as used in the example above) and for simple video URLs where no other parameter follows the video code. However, it does not work for URLs like http://www.youtube.com/watch?v=B_izAKQ0WqQ&feature=related, because the video code is not the last parameter. Similarly, v={video_code} does not always come right after watch? (the above regular expression assumes it does). For example, if the user selects a language or location from the footer, such as English (UK), the URL becomes http://www.youtube.com/watch?feature=related&hl=en-GB&v=B_izAKQ0WqQ

So I have made some modifications to the regular expression above, but credit definitely goes to hakre for providing the base regular expression. Thanks @hakre:

function youtube_id_from_url($url) {
    $pattern =
        '%^# Match any youtube URL
        (?:https?://)?  # Optional scheme. Either http or https
        (?:www\.)?      # Optional www subdomain
        (?:             # Group host alternatives
          youtu\.be/    # Either youtu.be,
        | youtube\.com  # or youtube.com
          (?:           # Group path alternatives
            /embed/     # Either /embed/
          | /v/         # or /v/
          | .*v=        # or anything ending in v=, e.g. /watch?v= or /watch?hl=en-GB&v=
          )             # End path alternatives.
        )               # End host alternatives.
        ([\w-]{10,12})  # Allow 10-12 for 11 char youtube id.
        ($|&).*         # Allow additional parameters after the video id.
        $%x'
        ;
    $result = preg_match($pattern, $url, $matches);
    // preg_match returns 1 on a match, 0 on no match and false on error,
    // so test for a truthy result ("false !== $result" is also true for 0)
    if ($result) {
        return $matches[1];
    }
    return false;
}


You can use PHP's parse_url function to extract the host name, path, query string, and fragment, and then use PHP string functions to locate the video ID.

function getYouTubeVideoId($url)
{
    $video_id = false;
    $url = parse_url($url);
    if ($url === false || !isset($url['host']))
    {
        #### Not a parseable absolute URL (without a scheme, parse_url puts the host into 'path')
        return false;
    }
    $host = strtolower($url['host']);
    $path = isset($url['path']) ? $url['path'] : '';
    if ($host === 'youtu.be')
    {
        #### (dontcare)://youtu.be/<video id>
        $video_id = substr($path, 1);
    }
    elseif ($host === 'www.youtube.com' || $host === 'youtube.com')
    {
        if (isset($url['query']))
        {
            parse_str($url['query'], $query);
            if (isset($query['v']))
            {
                #### (dontcare)://www.youtube.com/(dontcare)?v=<video id>
                $video_id = $query['v'];
            }
        }
        if ($video_id === false)
        {
            $segments = explode('/', substr($path, 1));
            if (isset($segments[1]) && in_array($segments[0], array('e', 'embed', 'v')))
            {
                #### (dontcare)://www.youtube.com/(whitelist)/<video id>
                $video_id = $segments[1];
            }
        }
    }
    return $video_id;
}
$urls = array(
    'http://youtu.be/dQw4w9WgXcQ',
    'http://www.youtube.com/?v=dQw4w9WgXcQ',
    'http://www.youtube.com/?v=dQw4w9WgXcQ&feature=player_embedded',
    'http://www.youtube.com/watch?v=dQw4w9WgXcQ',
    'http://www.youtube.com/watch?v=dQw4w9WgXcQ&feature=player_embedded',
    'http://www.youtube.com/v/dQw4w9WgXcQ',
    'http://www.youtube.com/e/dQw4w9WgXcQ',
    'http://www.youtube.com/embed/dQw4w9WgXcQ'
);
foreach ($urls as $url)
{
    echo sprintf('%s -> %s' . PHP_EOL, $url, getYouTubeVideoId($url));
}
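To see what the function above works with, here is what parse_url and parse_str return for one of the sample URLs. This is only a standalone illustration of the two built-ins; it also shows why it pays to check that 'host' is set before using it.

```php
<?php
$parts = parse_url('http://www.youtube.com/watch?v=dQw4w9WgXcQ&feature=player_embedded');
// $parts = ['scheme' => 'http', 'host' => 'www.youtube.com',
//           'path' => '/watch', 'query' => 'v=dQw4w9WgXcQ&feature=player_embedded']

parse_str($parts['query'], $query);
// $query = ['v' => 'dQw4w9WgXcQ', 'feature' => 'player_embedded']

echo $parts['host'], ' ', $parts['path'], ' ', $query['v'], PHP_EOL;
// www.youtube.com /watch dQw4w9WgXcQ

// Without a scheme, parse_url does not detect a host at all:
// the whole "www.youtube.com/watch" part ends up in 'path'.
$noScheme = parse_url('www.youtube.com/watch?v=dQw4w9WgXcQ');
var_dump(isset($noScheme['host'])); // bool(false)
```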


As simple as: return substr(strstr($url, 'v='), 2, 11);
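The one-liner only works when the URL contains v= and the ID is the 11 characters that follow it; for a youtu.be or /embed/ link, strstr finds nothing and you get an empty result instead of an ID. It also picks up the first "v=" anywhere in the string, so a parameter such as rev= placed before v= would confuse it. A slightly more defensive sketch (the function name is hypothetical) keeps the same idea but returns false explicitly:

```php
<?php
// Hypothetical wrapper around the strstr/substr one-liner.
// Still ASSUMES the ID directly follows the first "v=" and is 11 characters long.
function youtube_id_from_v_param($url) {
    $rest = strstr($url, 'v=');   // "v=..." or false if not present
    if ($rest === false) {
        return false;             // e.g. youtu.be/... or /embed/... links
    }
    $id = substr($rest, 2, 11);
    // Reject anything that is not exactly 11 ID characters
    return preg_match('/^[\w-]{11}$/', $id) ? $id : false;
}

echo youtube_id_from_v_param('http://www.youtube.com/watch?v=dQw4w9WgXcQ&feature=related'), PHP_EOL; // dQw4w9WgXcQ
var_dump(youtube_id_from_v_param('http://youtu.be/dQw4w9WgXcQ')); // bool(false)
```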


I know this is a very late answer, but I found this thread while searching for the topic, so I want to suggest a more elegant approach using oEmbed:

echo get_embed('youtube', 'https://www.youtube.com/watch?v=IdxKPCv0bSs');

function get_embed($provider, $url, $max_width = '', $max_height = ''){
    $providers = array(
        'youtube' => 'http://www.youtube.com/oembed'
        /* you can add support for more providers here */
    );

    if(!isset($providers[$provider])){
        return 'Invalid provider!';
    }

    $movie_data_json = @file_get_contents(
        $providers[$provider] . '?url=' . urlencode($url) . 
        "&maxwidth={$max_width}&maxheight={$max_height}&format=json"
    );

    if(!$movie_data_json){
        $error = error_get_last();
        /* remove the PHP stuff from the error and show only the HTTP error message */
        $error_message = preg_replace('/.*: (.*)/', '$1', $error['message']);
        return $error_message;
    }else{
        $movie_data = json_decode($movie_data_json, true);
        return $movie_data['html'];
    }
}

With oEmbed you can embed content from other sites too: just add their oEmbed API endpoint to the $providers array in the code above.
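The get_embed function above always appends maxwidth and maxheight, even when they are empty. If you prefer to send only the parameters that are actually set, http_build_query can assemble the request URL and handle the encoding. This hypothetical helper only builds the URL (no network call), so the fetch itself would stay as in get_embed.

```php
<?php
// Hypothetical helper: build an oEmbed request URL, skipping empty size limits.
function build_oembed_url($endpoint, $url, $max_width = '', $max_height = '') {
    $params = array('url' => $url, 'format' => 'json');
    if ($max_width !== '') {
        $params['maxwidth'] = $max_width;
    }
    if ($max_height !== '') {
        $params['maxheight'] = $max_height;
    }
    // http_build_query URL-encodes each key and value and joins them with "&"
    return $endpoint . '?' . http_build_query($params);
}

echo build_oembed_url('http://www.youtube.com/oembed', 'https://www.youtube.com/watch?v=IdxKPCv0bSs'), PHP_EOL;
// http://www.youtube.com/oembed?url=https%3A%2F%2Fwww.youtube.com%2Fwatch%3Fv%3DIdxKPCv0bSs&format=json
```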


Here is a simple solution that has worked for me.

The video ID is the longest word in any type of YouTube URL. It consists of alphanumerics plus "-" (and "_", which \w also covers), is longer than 8 characters, and is surrounded by non-word characters. So you can search the URL for the regex below, and its first capture group is your answer. It has to be the first match because some YouTube parameters, such as enablejsapi, are also longer than 8 characters, but they always come after the video ID.

Regex: "\W([\w-]{9,})(\W|$)"

Here is the working Java code:

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class YoutubeIdFinder {
    public static void main(String[] args) {
        String[] youtubeUrls = {
            "https://www.youtube.com/watch?v=UzRtrjyDwx0",
            "https://youtu.be/6butf1tEVKs?t=22s",
            "https://youtu.be/R46-XgqXkzE?t=2m52s",
            "http://youtu.be/dQw4w9WgXcQ",
            "http://www.youtube.com/?v=dQw4w9WgXcQ",
            "http://www.youtube.com/?v=dQw4w9WgXcQ&feature=player_embedded",
            "http://www.youtube.com/watch?v=dQw4w9WgXcQ",
            "http://www.youtube.com/watch?v=dQw4w9WgXcQ&feature=player_embedded",
            "http://www.youtube.com/v/dQw4w9WgXcQ",
            "http://www.youtube.com/e/dQw4w9WgXcQ",
            "http://www.youtube.com/embed/dQw4w9WgXcQ"
        };

        // A non-word char, then 9+ word chars or dashes (captured as group 1),
        // then a non-word char or the end of the input
        Pattern pattern = Pattern.compile("\\W([\\w-]{9,})(\\W|$)");

        for (String url : youtubeUrls) {
            Matcher matcher = pattern.matcher(url);
            if (matcher.find()) {
                System.out.println(matcher.group(1));
            } else {
                System.out.println("Not found");
            }
        }
    }
}
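Since the question is about PHP, here is a sketch of the same heuristic with preg_match. It carries the same assumption as the Java version (the first run of 9+ word characters or dashes after a non-word character is the ID); the function name is hypothetical. Note that a host like www.youtube-nocookie.com would be picked up instead of the ID, because "youtube-nocookie" is itself a long run of word characters and dashes.

```php
<?php
// Hypothetical PHP port of the "first long word" heuristic above.
function youtube_id_first_long_word($url) {
    // \W, then 9+ chars from [A-Za-z0-9_-] (captured), then \W or end of string
    if (preg_match('/\W([\w-]{9,})(\W|$)/', $url, $m)) {
        return $m[1];
    }
    return false;
}

$urls = array(
    'https://youtu.be/R46-XgqXkzE?t=2m52s',
    'http://www.youtube.com/watch?v=dQw4w9WgXcQ&feature=player_embedded',
    'http://www.youtube.com/embed/dQw4w9WgXcQ',
);
foreach ($urls as $url) {
    echo $url, ' -> ', youtube_id_first_long_word($url), PHP_EOL;
}
```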


As mentioned in a comment below the accepted answer, we use it like this, and it works very well:

function youtube_id_from_url($url) {
    // Remove the old "#!/" hash-bang prefix, e.g. youtube.com/#!/watch?v=...
    $url = str_replace('#!/', '', trim($url));
    // Drop trailing parameters: everything from the first "&", plus any
    // "?query" that is not the "/watch?v=" part the pattern needs
    // (cutting at the first "?" would also cut off "?v=<id>")
    $url = explode('&', $url, 2)[0];
    $url = preg_replace('%(?<!/watch)\?.*$%', '', $url);

    $pattern = 
        '%^# Match any youtube URL
        (?:https?://)?  # Optional scheme. Either http or https
        (?:www\.)?      # Optional www subdomain
        (?:             # Group host alternatives
          youtu\.be/    # Either youtu.be,
        | youtube\.com  # or youtube.com
          (?:           # Group path alternatives
            /embed/     # Either /embed/
          | /v/         # or /v/
          | /watch\?v=  # or /watch\?v=
          )             # End path alternatives.
        )               # End host alternatives.
        ([\w-]{10,12})  # Allow 10-12 for 11 char youtube id.
        $%x'
        ;
    $result = preg_match($pattern, $url, $matches);
    if ($result) {
        return $matches[1];
    }
    return false;
}


How about this one:

function getVideoId() {
    // Method of a class that stores the URL to inspect in $this->url
    $query = parse_url($this->url, PHP_URL_QUERY);
    if (!$query) {
        return false; // no query string at all
    }

    // Split "v=ID&index=9&list=..." into "key=value" pairs first, so that
    // "v" is found no matter where it appears in the query string
    foreach (explode('&', $query) as $pair) {
        $parts = explode('=', $pair, 2);
        if ($parts[0] === 'v' && isset($parts[1]) && $parts[1] !== '') {
            return $parts[1];
        }
    }
    return false;
}

No regex, and it supports multiple query parameters, e.g. https://www.youtube.com/watch?v=PEQxWg92Ux4&index=9&list=RDMMom0RGEnWIEk also works.
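If you would rather let PHP do the query-string splitting, parse_str handles the parameters in any order, including v appearing after other parameters (as in the hl=en-GB example earlier in this thread). Here is a standalone sketch of that variant; it is not a method, so it takes the URL as an argument, and the name is hypothetical.

```php
<?php
// Hypothetical standalone variant: read "v" from the query string with parse_str.
function youtube_id_from_query($url) {
    $query = parse_url($url, PHP_URL_QUERY); // null when there is no query string
    if (!is_string($query) || $query === '') {
        return false; // e.g. youtu.be/<id> links carry the ID in the path instead
    }
    parse_str($query, $params); // "v=a&list=b" -> ['v' => 'a', 'list' => 'b']
    return (isset($params['v']) && is_string($params['v'])) ? $params['v'] : false;
}

echo youtube_id_from_query('https://www.youtube.com/watch?v=PEQxWg92Ux4&index=9&list=RDMMom0RGEnWIEk'), PHP_EOL;
echo youtube_id_from_query('http://www.youtube.com/watch?feature=related&hl=en-GB&v=B_izAKQ0WqQ'), PHP_EOL;
```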


For Java developers

This works for me, and it also supports youtube-nocookie URLs:

    private static final Pattern youtubeId = Pattern.compile("^(?:https?\\:\\/\\/)?.*(?:youtu.be\\/|vi?\\/|vi?=|u\\/\\w\\/|embed\\/|(watch)?vi?=)([^#&?]*).*$");


    @VisibleForTesting
    String getVideoId(final String url) {
        final Matcher matcher = youtubeId.matcher(url);
        if(matcher.find()){
            return matcher.group(2);
        }
        return "";
    }

Some tests to check YouTube URLs:

    @ParameterizedTest
    @MethodSource("youtubeTestUrls")
    void videoIdFromUrlTest(final String url, final String videoId) {

        final String matchedVidID = this.youtubeService.getVideoId(url);

        assertEquals(videoId, matchedVidID);
    }

    private static Stream<Arguments> youtubeTestUrls() {
        return Stream.of(
                Arguments.of("www.youtube-nocookie.com/embed/dQw4-9W_XcQ?rel=0", "dQw4-9W_XcQ"),
                Arguments.of("http://www.youtube.com/user/Scobleizer#p/u/1/dQw4-9W_XcQ", "dQw4-9W_XcQ"),
                Arguments.of("http://www.youtube.com/watch?v=dQw4-9W_XcQ&feature=channel", "dQw4-9W_XcQ"),
                Arguments.of("http://www.youtube.com/watch?v=dQw4-9W_XcQ&playnext_from=TL&videos=osPknwzXEas&feature=sub", "dQw4-9W_XcQ"),
                Arguments.of("http://www.youtube.com/ytscreeningroom?v=dQw4-9W_XcQ", "dQw4-9W_XcQ"),
                Arguments.of("http://www.youtube.com/user/SilkRoadTheatre#p/a/u/2/dQw4-9W_XcQ", "dQw4-9W_XcQ"),
                Arguments.of("http://youtu.be/dQw4-9W_XcQ", "dQw4-9W_XcQ"),
                Arguments.of("http://www.youtube.com/watch?v=dQw4-9W_XcQ&feature=youtu.be", "dQw4-9W_XcQ"),
                Arguments.of("http://youtu.be/dQw4-9W_XcQ", "dQw4-9W_XcQ"),
                Arguments.of("https://www.youtube.com/user/Scobleizer#p/u/1/dQw4-9W_XcQ?rel=0", "dQw4-9W_XcQ"),
                Arguments.of("http://www.youtube.com/watch?v=dQw4-9W_XcQ&playnext_from=TL&videos=dQw4-9W_XcQ&feature=sub", "dQw4-9W_XcQ"),
                Arguments.of("http://www.youtube.com/ytscreeningroom?v=dQw4-9W_XcQ", "dQw4-9W_XcQ"),
                Arguments.of("http://www.youtube.com/embed/dQw4-9W_XcQ?rel=0", "dQw4-9W_XcQ"),
                Arguments.of("https://www.youtube.com/watch?v=dQw4-9W_XcQ", "dQw4-9W_XcQ"),
                Arguments.of("http://youtube.com/v/dQw4-9W_XcQ?feature=youtube_gdata_player", "dQw4-9W_XcQ"),
                Arguments.of("http://youtube.com/vi/dQw4-9W_XcQ?feature=youtube_gdata_player", "dQw4-9W_XcQ"),
                Arguments.of("http://youtube.com/?v=dQw4-9W_XcQ&feature=youtube_gdata_player", "dQw4-9W_XcQ"),
                Arguments.of("http://www.youtube.com/watch?v=dQw4-9W_XcQ&feature=youtube_gdata_player", "dQw4-9W_XcQ"),
                Arguments.of("http://youtube.com/?vi=dQw4-9W_XcQ&feature=youtube_gdata_player", "dQw4-9W_XcQ"),
                Arguments.of("https://youtube.com/watch?v=dQw4-9W_XcQ&feature=youtube_gdata_player", "dQw4-9W_XcQ"),
                Arguments.of("http://youtube.com/watch?vi=dQw4-9W_XcQ&feature=youtube_gdata_player", "dQw4-9W_XcQ"),
                Arguments.of("http://youtu.be/dQw4-9W_XcQ?feature=youtube_gdata_player", "dQw4-9W_XcQ"),
                Arguments.of("https://www.youtube.com/watch?v=yYw2Q141thM&list=PLOwEeBApnYoUFioRitjwz-DREzFGOSgiE&index=2", "yYw2Q141thM"),
                Arguments.of("https://www.youtube.com/watch?", "")
        );
    }
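For PHP users, here is a sketch of the same pattern with preg_match, checked against a few of the URLs from the test list above. It is a straight port with one change: the dot in youtu.be is escaped so it only matches a literal ".". Because the leading .* is greedy, the match settles on the rightmost prefix (youtu.be/, v=, embed/ and so on) that appears in the URL, and group 2 holds the ID. The function and constant names are hypothetical.

```php
<?php
// Hypothetical PHP port of the Java pattern above (dot in youtu.be escaped).
// Group 1 is the optional "watch", group 2 is the video ID.
const YOUTUBE_ID_PATTERN =
    '~^(?:https?://)?.*(?:youtu\.be/|vi?/|vi?=|u/\w/|embed/|(watch)?vi?=)([^#&?]*).*$~';

function youtube_id_java_port($url) {
    if (preg_match(YOUTUBE_ID_PATTERN, $url, $m)) {
        return $m[2];
    }
    return ''; // same "not found" value as the Java getVideoId
}

$cases = array(
    'www.youtube-nocookie.com/embed/dQw4-9W_XcQ?rel=0' => 'dQw4-9W_XcQ',
    'http://www.youtube.com/user/Scobleizer#p/u/1/dQw4-9W_XcQ' => 'dQw4-9W_XcQ',
    'http://youtube.com/?vi=dQw4-9W_XcQ&feature=youtube_gdata_player' => 'dQw4-9W_XcQ',
    'https://www.youtube.com/watch?v=yYw2Q141thM&list=PLOwEeBApnYoUFioRitjwz-DREzFGOSgiE&index=2' => 'yYw2Q141thM',
    'https://www.youtube.com/watch?' => '',
);
foreach ($cases as $url => $expected) {
    $actual = youtube_id_java_port($url);
    echo ($actual === $expected ? 'ok   ' : 'FAIL ') . $url . ' -> ' . $actual . PHP_EOL;
}
```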