How can I extract video ID from YouTube's link in Python?
I know this can be done easily using PHP's parse_url and parse_str functions:
$subject = "http://www.youtube.com/watch?v=z_AbfPXTKms&NR=1";
$url = parse_url($subject);
parse_str($url['query'], $query);
var_dump($query);
But how can I achieve this in Python? I can call urlparse, but what next?
I've created a YouTube ID parser without regexp (Python 2, where the module is named urlparse):
import urlparse

def video_id(value):
    """
    Examples:
    - http://youtu.be/SA2iWivDJiE
    - http://www.youtube.com/watch?v=_oPAwA_Udwc&feature=feedu
    - http://www.youtube.com/embed/SA2iWivDJiE
    - http://www.youtube.com/v/SA2iWivDJiE?version=3&hl=en_US
    """
    query = urlparse.urlparse(value)
    if query.hostname == 'youtu.be':
        return query.path[1:]
    if query.hostname in ('www.youtube.com', 'youtube.com'):
        if query.path == '/watch':
            p = urlparse.parse_qs(query.query)
            return p['v'][0]
        if query.path[:7] == '/embed/':
            return query.path.split('/')[2]
        if query.path[:3] == '/v/':
            return query.path.split('/')[2]
    # fail?
    return None
Python has a library for parsing URLs.
import urlparse
url_data = urlparse.urlparse("http://www.youtube.com/watch?v=z_AbfPXTKms&NR=1")
query = urlparse.parse_qs(url_data.query)
video = query["v"][0]
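In Python 3 the urlparse module was folded into urllib.parse, so a rough equivalent of the snippet above might look like the following. This is only a sketch; the helper name query_params is made up for illustration and is not part of any library.

```python
from urllib.parse import urlparse, parse_qs

def query_params(url):
    """Hypothetical helper: return the query string of `url` as a dict of lists."""
    return parse_qs(urlparse(url).query)

params = query_params("http://www.youtube.com/watch?v=z_AbfPXTKms&NR=1")
# parse_qs maps every parameter name to a list of values, hence the [0]
video = params["v"][0]
print(video)
```

Indexing with params["v"] raises KeyError when the URL has no v parameter, so real code would probably want to handle that case.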
This is the Python 3 version of Mikhail Kashkin's solution with added scenarios.
from urllib.parse import urlparse, parse_qs
from contextlib import suppress


# noinspection PyTypeChecker
def get_yt_id(url, ignore_playlist=False):
    # Examples:
    # - http://youtu.be/SA2iWivDJiE
    # - http://www.youtube.com/watch?v=_oPAwA_Udwc&feature=feedu
    # - http://www.youtube.com/embed/SA2iWivDJiE
    # - http://www.youtube.com/v/SA2iWivDJiE?version=3&hl=en_US
    query = urlparse(url)
    if query.hostname == 'youtu.be': return query.path[1:]
    if query.hostname in {'www.youtube.com', 'youtube.com', 'music.youtube.com'}:
        if not ignore_playlist:
            # use case: get playlist id not current video in playlist
            with suppress(KeyError):
                return parse_qs(query.query)['list'][0]
        if query.path == '/watch': return parse_qs(query.query)['v'][0]
        # '/watch/<id>'.split('/') is ['', 'watch', '<id>'], so the id is at index 2
        if query.path[:7] == '/watch/': return query.path.split('/')[2]
        if query.path[:7] == '/embed/': return query.path.split('/')[2]
        if query.path[:3] == '/v/': return query.path.split('/')[2]
    # returns None for invalid YouTube url
Here is a RegExp that covers these cases:
((?<=(v|V)/)|(?<=be/)|(?<=(\?|\&)v=)|(?<=embed/))([\w-]+)
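To make the pattern above easier to try out, here is a minimal sketch of how it could be applied with Python's re module. The helper name find_id is hypothetical. Because every lookbehind is zero-width, the whole match (group 0) is the ID itself, which avoids counting the nested capture groups.

```python
import re

# Pattern copied verbatim from the answer above.
YT_ID = re.compile(r"((?<=(v|V)/)|(?<=be/)|(?<=(\?|\&)v=)|(?<=embed/))([\w-]+)")

def find_id(url):
    """Hypothetical helper: return the first ID-like match, or None."""
    match = YT_ID.search(url)
    # The lookbehinds consume nothing, so group(0) is just the [\w-]+ part.
    return match.group(0) if match else None

for url in (
    "http://youtu.be/SA2iWivDJiE",
    "http://www.youtube.com/watch?v=_oPAwA_Udwc&feature=feedu",
    "http://www.youtube.com/embed/SA2iWivDJiE",
    "http://www.youtube.com/v/SA2iWivDJiE?version=3&hl=en_US",
):
    print(find_id(url))
```

Note that the pattern does not check the hostname, so it will also match ID-like segments in URLs that have nothing to do with YouTube.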
import re

match = re.search(r"youtube\.com/.*v=([^&]*)", "http://www.youtube.com/watch?v=z_AbfPXTKms&test=123")
if match:
    result = match.group(1)
else:
    result = ""
Untested.
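You can use the following, although it only works when v is the first and only query parameter:
from urllib.parse import urlparse
url_data = urlparse("https://www.youtube.com/watch?v=RG9TMn1FJzc")
print(url_data.query[2:])  # drops the leading "v="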
I use the pytube package, which handles this well:
$ pip install pytube
# Examples
url1 = 'http://youtu.be/SA2iWivDJiE'
url2 = 'http://www.youtube.com/watch?v=_oPAwA_Udwc&feature=feedu'
url3 = 'http://www.youtube.com/embed/SA2iWivDJiE'
url4 = 'http://www.youtube.com/v/SA2iWivDJiE?version=3&hl=en_US'
url5 = 'https://www.youtube.com/watch?v=rTHlyTphWP0&index=6&list=PLjeDyYvG6-40qawYNR4juzvSOg-ezZ2a6'
url6 = 'youtube.com/watch?v=_lOT2p_FCvA'
url7 = 'youtu.be/watch?v=_lOT2p_FCvA'
url8 = 'https://www.youtube.com/watch?time_continue=9&v=n0g-Y0oo5Qs&feature=emb_logo'
urls = [url1, url2, url3, url4, url5, url6, url7, url8]

# Get the YouTube id
from pytube import extract
for url in urls:
    vid = extract.video_id(url)
    print(vid)
Output
SA2iWivDJiE
_oPAwA_Udwc
SA2iWivDJiE
SA2iWivDJiE
rTHlyTphWP0
_lOT2p_FCvA
_lOT2p_FCvA
n0g-Y0oo5Qs
Here is something you could try, using a regex for the YouTube video ID:
import re
url = "http://www.youtube.com/watch?v=z_AbfPXTKms&NR=1"
# regex for the YouTube ID: "^[^v]+v=(.{11}).*"
result = re.match('^[^v]+v=(.{11}).*', url)
print(result.group(1))
No need for regex. Split on ?, take the second, split on =, take the second, split on &, take the first.
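Taken literally, those steps could be sketched as below. The function name id_by_splitting is made up for illustration, and the sketch assumes the URL has a query string whose first parameter is v; otherwise it returns the wrong value or raises IndexError.

```python
def id_by_splitting(url):
    """Hypothetical helper that follows the split steps literally."""
    after_question = url.split("?")[1]           # "v=z_AbfPXTKms&NR=1"
    after_equals = after_question.split("=")[1]  # "z_AbfPXTKms&NR"
    return after_equals.split("&")[0]            # "z_AbfPXTKms"

print(id_by_splitting("http://www.youtube.com/watch?v=z_AbfPXTKms&NR=1"))
```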
Splitting strings is a really bad idea when those parameters could come in any order. Stick with urlparse:
from urllib.parse import parse_qs, urlparse
vid = parse_qs(urlparse(url).query).get('v')
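One detail worth spelling out: parse_qs maps each parameter to a list of values, so .get('v') returns a list (or None when v is absent) rather than a plain string. A small sketch of unwrapping it, with a hypothetical helper name first_v:

```python
from urllib.parse import parse_qs, urlparse

def first_v(url):
    """Hypothetical helper: first value of the `v` parameter, or None."""
    values = parse_qs(urlparse(url).query).get("v")
    return values[0] if values else None

# v is not the first parameter here, which parse_qs handles without trouble
print(first_v("https://www.youtube.com/watch?time_continue=9&v=n0g-Y0oo5Qs&feature=emb_logo"))
# no query string at all, so the result is None
print(first_v("http://www.youtube.com/embed/SA2iWivDJiE"))
```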
This takes a search query rather than a URL, but it gives you the id:
from youtube_search import YoutubeSearch
results = YoutubeSearch('search terms', max_results=10).to_json()
print(results)
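url = "http://www.youtube.com/watch?v=z_AbfPXTKms&NR=1"
parsed = url.split("?")
# parsed[1] is the whole query string ("v=z_AbfPXTKms&NR=1"),
# so keep only the value of the first parameter
videoId = parsed[1].split("&")[0].split("=")[1]
print(videoId)
This works for watch links where v is the first query parameter.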
I am very late, but I use this snippet to get the video id.
import re


def video_id(url: str) -> str:
    """Extract the ``video_id`` from a YouTube url.

    This function supports the following patterns:

    - :samp:`https://youtube.com/watch?v={video_id}`
    - :samp:`https://youtube.com/embed/{video_id}`
    - :samp:`https://youtu.be/{video_id}`

    :param str url:
        A YouTube url containing a video id.
    :rtype: str
    :returns:
        YouTube video id.
    """
    return regex_search(r"(?:v=|\/)([0-9A-Za-z_-]{11}).*", url, group=1)


def regex_search(pattern: str, string: str, group: int):
    """Shortcut method to search a string for a given pattern.

    :param str pattern:
        A regular expression pattern.
    :param str string:
        A target string to search.
    :param int group:
        Index of group to return.
    :rtype:
        str or tuple
    :returns:
        Substring pattern matches.
    """
    regex = re.compile(pattern)
    results = regex.search(string)
    if not results:
        return False
    return results.group(group)
I use this:
def getId(videourl):
    vidid = videourl.find('watch?v=')
    Id = videourl[vidid+8:vidid+19]
    if vidid == -1:
        vidid = videourl.find('be/')
        Id = videourl[vidid+3:]
    return Id