What is the regex to validate that a string is a url to a youtube or vimeo video? [duplicate]

This question already has answers here: Closed 11 years ago.

Possible Duplicates:

How can I alter this regex to get the Youtube video id from a Youtube URL that doesn't specify the v parameter?

What regex can I use to get the domain name from a url in Ruby?

Improving regex for parsing YouTube / Vimeo URLs

What is the regex to validate that a string is a URL to a YouTube or Vimeo video? I'm not very good with regular expressions. This is for a Rails application.


For YouTube:

# \A anchors at the start of the string; ^ would match at the start of any line
yt_regexp = /\Ahttps?:\/\/www\.youtube\.com\/watch\?v=([a-zA-Z0-9_-]+)/

The capture group also gives you the id of the video:

>> yt_regexp.match("http://www.youtube.com/watch?v=foo")[1]
=> "foo"

For Vimeo:

vimeo_regexp = /\Ahttps?:\/\/www\.vimeo\.com\/(\d+)/

You can extract the id the same way, from the first capture group.

If you want to make the scheme and "www." optional, you can use:

yt_regexp = /\A(?:https?:\/\/)?(?:www\.)?youtube\.com\/watch\?v=([a-zA-Z0-9_-]+)/
vimeo_regexp = /\A(?:https?:\/\/)?(?:www\.)?vimeo\.com\/(\d+)/
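
As a rough sketch of how the two patterns might be wrapped up for use in a Rails model or helper, the hypothetical video_id_for method below returns the provider and the id, or nil when neither pattern matches. The method name, the constant names, and the return shape are assumptions for illustration only. The patterns use \A instead of ^ (which in Ruby matches at the start of any line) and also accept https.

```ruby
# Patterns for the two providers; \A anchors at the start of the string.
YT_REGEXP    = /\A(?:https?:\/\/)?(?:www\.)?youtube\.com\/watch\?v=([a-zA-Z0-9_-]+)/
VIMEO_REGEXP = /\A(?:https?:\/\/)?(?:www\.)?vimeo\.com\/(\d+)/

# Hypothetical helper: returns [provider, id], or nil if the URL matches neither.
def video_id_for(url)
  if (m = YT_REGEXP.match(url))
    [:youtube, m[1]]
  elsif (m = VIMEO_REGEXP.match(url))
    [:vimeo, m[1]]
  end
end

video_id_for("http://www.youtube.com/watch?v=foo")  # => [:youtube, "foo"]
video_id_for("vimeo.com/12345")                     # => [:vimeo, "12345"]
video_id_for("http://example.com/watch?v=foo")      # => nil
```

In a Rails app, a custom validation could call a helper like this and add an error when it returns nil.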


A regex is one way to get there, but not what I'd use. I prefer a URL parser, such as Ruby's built-in URI or Addressable::URI from the addressable gem. URLs can get messy: there are multiple ways to write a URL that resolve and connect to a particular host, yet fail the usual "check for the host name" test.

require 'uri'
url = 'http://www.youtube.com/watch?v=_NaiiBkqOxE&feature=feedu'

uri = URI.parse(url)
uri.host # => "www.youtube.com"

A couple of ways to check the host:

uri.host['youtube.com']         # => "youtube.com"
uri.host =~ /youtube\.com/      # => 4
!!uri.host['youtube.com']       # => true
!!(uri.host =~ /youtube\.com/)  # => true
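
Note that all four checks are substring matches, so a host such as notyoutube.com or youtube.com.example.org would pass too. A minimal sketch of a stricter check, assuming you want to accept only the bare domain or one of its subdomains; the youtube_host? name is hypothetical:

```ruby
require 'uri'

# Hypothetical helper: true only for youtube.com or a subdomain of it.
def youtube_host?(url)
  host = URI.parse(url).host
  return false if host.nil?          # e.g. a string with no scheme has no host

  host = host.downcase
  host == 'youtube.com' || host.end_with?('.youtube.com')
rescue URI::InvalidURIError
  false                              # unparseable input is not a YouTube URL
end

youtube_host?('http://www.youtube.com/watch?v=_NaiiBkqOxE')  # => true
youtube_host?('http://notyoutube.com/watch?v=_NaiiBkqOxE')   # => false
```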

Usually our needs are more sophisticated, and we want to know what parameters are embedded in the URL, or what the path to the resource is. URI.split breaks the URL into its component pieces:

URI.split(url) # => ["http", nil, "www.youtube.com", nil, nil, "/watch", nil, "v=_NaiiBkqOxE&feature=feedu", nil]

Each of the pieces has a defined name, so it's common to break the URL down into elements in a hash. You can create a hash of all the parts for fast lookup:

parts = [:scheme, :userinfo, :host, :port, :registry, :path, :opaque, :query, :fragment].zip(URI.split(url)).to_h
parts # => {:scheme=>"http", :userinfo=>nil, :host=>"www.youtube.com", :port=>nil, :registry=>nil, :path=>"/watch", :opaque=>nil, :query=>"v=_NaiiBkqOxE&feature=feedu", :fragment=>nil}
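
Once the query string is isolated, the video id can be pulled out with the standard library's URI.decode_www_form, which splits a query into key/value pairs. A short sketch using the same example URL; the params and video_id variable names are just for illustration:

```ruby
require 'uri'

url = 'http://www.youtube.com/watch?v=_NaiiBkqOxE&feature=feedu'
uri = URI.parse(url)

# decode_www_form returns [["v", "_NaiiBkqOxE"], ["feature", "feedu"]];
# to_h turns the pairs into a hash keyed by parameter name.
# The `|| ''` guards against URLs that have no query string at all.
params = URI.decode_www_form(uri.query || '').to_h
video_id = params['v']  # => "_NaiiBkqOxE"
```

This avoids depending on the order of the query parameters, which a regex anchored on "watch?v=" does.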

Using Addressable::URI to do the same things:

require 'addressable/uri'
uri = Addressable::URI.parse('http://www.youtube.com/watch?v=_NaiiBkqOxE&feature=feedu')
uri.host # => "www.youtube.com"

parts = uri.to_hash
parts # => {:scheme=>"http", :user=>nil, :password=>nil, :host=>"www.youtube.com", :port=>nil, :path=>"/watch", :query=>"v=_NaiiBkqOxE&feature=feedu", :fragment=>nil}

Wikipedia's page on URL normalization shows a lot of examples of how URLs can vary, yet still point to the same resource. So, if you only need to match the main domain for a site, then yes, a simple regex or even a substring search will do. Beyond that, you need to be more sophisticated in how you take the URL apart.


I'm not familiar with Vimeo, but YouTube would be:

"http://www.youtube.com/watch?v=".+

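Note the quote marks. They are meant to tell the regex engine to match the text between them exactly as written; otherwise you will be surprised by characters like the periods and the question mark, which have special meaning in a regex. The trailing .+ then matches whatever string finishes off the URL. Be aware that in Ruby quote marks have no special meaning inside a regex, so there the literal part has to be escaped instead, with backslashes or Regexp.escape.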
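
In Ruby specifically, one way to get the same "match this part literally" effect is Regexp.escape, which backslash-escapes the periods and the question mark for you. A minimal sketch; the prefix and yt_literal names are only for illustration:

```ruby
prefix = 'http://www.youtube.com/watch?v='

# Regexp.escape escapes the regex metacharacters in the literal prefix;
# .+ then requires at least one character after it.
yt_literal = /\A#{Regexp.escape(prefix)}.+/

yt_literal.match?('http://www.youtube.com/watch?v=foo')  # => true
yt_literal.match?('http://wwwXyoutube.com/watch?v=foo')  # => false
yt_literal.match?('http://www.youtube.com/watch?v=')     # => false
```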
