Getting the domain of a URL with Regular Expressions
I'm trying to get the domain of a given URL. For example, http://www.facebook.com/someuser/ should return facebook.com. The given URL can be in any of these formats:
- https://www.facebook.com/someuser (www. is optional, but should be ignored)
- www.facebook.com/someuser (http:// is not required)
- facebook.com/someuser
- http://someuser.tumblr.com -> this has to return tumblr.com only
I wrote this regex:
/(?: \.|\/{2})(?: www\.)?([^\/]*)/i
But it does not work as I expect.
I can do this in parts:
- Remove http:// and https://, if present in the string, with string.sub(/https?:\/\//i, '').
- Remove www. with string.sub(/www\./i, '').
- Get the domain with match and /(\w+\.\w+)+/i.
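A minimal sketch of those three steps in Ruby, assuming the goal is just to reproduce the question's approach (the method name domain_in_parts is hypothetical). It uses String#sub because String#delete treats its argument as a set of characters to remove, not as a regex. It also shows the subdomain problem: "sub.tumblr.com" yields "sub.tumblr".

```ruby
# Sketch of the question's three-step approach (hypothetical helper name).
def domain_in_parts(url)
  s = url.sub(/\Ahttps?:\/\//i, '')  # step 1: drop an "http://" or "https://" prefix
  s = s.sub(/\Awww\./i, '')          # step 2: drop a leading "www."
  s[/(\w+\.\w+)+/i]                  # step 3: first "word.word" run, or nil
end

puts domain_in_parts('https://www.facebook.com/username')
puts domain_in_parts('sub.tumblr.com')
```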
But this won't work with subdomains. Strings for testing:
https://www.facebook.com/username
http://last.fm/user/username
www.google.com
facebook.com/username
http://sub.tumblr.com/
sub.tumblr.com
I need this to work with as little memory and processing cost as possible.
Any ideas?
Why don't you just use the URI class to do this?
require 'uri'
URI.parse(your_uri).host
And you're done.
Just one thing, if there's no "http://" or "https://" at the beginning of the url, you'll have to add one, or the parse method is not going to give you a host (it's going to be nil).
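A minimal sketch of that approach, assuming scheme-less inputs from the question should be treated as http:// URLs and that a leading "www." should be dropped afterwards (the method name host_without_www is hypothetical). Note that URI#host keeps subdomains, so http://sub.tumblr.com/ yields "sub.tumblr.com", not "tumblr.com".

```ruby
require 'uri'

# Prepend "http://" when no scheme is present (otherwise URI.parse
# returns a nil host), then strip a leading "www.".
def host_without_www(url)
  url = "http://#{url}" unless url.match?(%r{\A[a-z][a-z0-9+.-]*://}i)
  host = URI.parse(url).host
  host&.sub(/\Awww\./i, '')
end

puts host_without_www('https://www.facebook.com/username')
puts host_without_www('www.google.com')
puts host_without_www('http://sub.tumblr.com/')
```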
This works for me: /^h?t?t?p?s?:?\/?\/?w?w?w?\.?(.*\.[A-Z]{2,})+[A-Z\/]/i
It is intended to return only the domain part.
You can try it out on Rubular:
http://rubular.com/r/0hudnJSgVT
To use it, create a method like this. I put it in my helpers so I have access to it in the views.
def website_url(website_url)
  if website_url[/^h?t?t?p?s?:?\/?\/?w?w?w?\.?(.*\.[A-Z\/]{2,})$/i]
    website_id = $1
  end
  %Q{http://#{ website_id }}
end
Does it have to be a regex? You could also do this:
require 'uri'
your_url = URI.parse('https://www.facebook.com/username')
puts your_url.host # prints www.facebook.com
You could use this regex:
/(\w+\.\w{2,6})(?:\/|$)/
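A short sketch of applying that regex with String#[] and a capture-group index (the constant DOMAIN_RE and the method domain_of are hypothetical names). It returns the first match's capture group, or nil when nothing matches.

```ruby
# The regex from the answer: a word, a dot, and a 2-6 character word,
# followed by "/" or the end of the line.
DOMAIN_RE = /(\w+\.\w{2,6})(?:\/|$)/

# Returns capture group 1 of the first match, or nil.
def domain_of(url)
  url[DOMAIN_RE, 1]
end

%w[
  https://www.facebook.com/username
  http://last.fm/user/username
  www.google.com
  http://sub.tumblr.com/
].each { |u| puts "#{u} -> #{domain_of(u).inspect}" }
```

Because \w does not include "-", a host such as my-site.com comes back as "site.com", and two-label suffixes such as co.uk are not handled.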
If you really wanted to use a regex, you could try something along the lines of:
test_string.scan(/\w+\.\w+(?=\/|\s|$)/) { |match| do_stuff_with(match) }
This wouldn't account for domain names such as something.co.uk but it would match everything in your test string.
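A sketch of running that scan over the question's test strings at once. Called without a block, String#scan returns an array of all matches (here, one per line).

```ruby
test_string = <<~URLS
  https://www.facebook.com/username
  http://last.fm/user/username
  www.google.com
  facebook.com/username
  http://sub.tumblr.com/
  sub.tumblr.com
URLS

# "word.word" followed by "/", whitespace, or end of line.
domains = test_string.scan(/\w+\.\w+(?=\/|\s|$)/)
p domains
# => ["facebook.com", "last.fm", "google.com", "facebook.com", "tumblr.com", "tumblr.com"]
```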
I created a method on the String class (by reopening it, i.e. Ruby's open classes) for my purpose:
require 'uri'

class String
  # Returns the domain of a URL ("https://www.facebook.com/x" -> "facebook.com")
  # or of an e-mail address ("you@domain.com" -> "domain.com").
  def to_dn
    return '' if strip.empty?
    return split('@').last if include?('@')
    link = self
    link = "http://#{link}" unless link.start_with?('http://', 'https://')
    host = begin
      URI.parse(link).host
    rescue URI::InvalidURIError
      nil
    end
    link = host.nil? || host.empty? ? link.strip : host
    domain_name = link.sub(/.*?www\./, '')
    # Keep only the last two labels, e.g. "dc.ads.linkedin.com" -> "linkedin.com".
    domain_name[/[A-Z]+\.[A-Z]{2,4}\z/i] || domain_name
  end
end
Example:
1. "https://www.facebook.com/someuser".to_dn # => "facebook.com"
2. "www.facebook.com/someuser".to_dn # => "facebook.com"
3. "facebook.com/someuser".to_dn # => "facebook.com"
4. "http://someuser.tumblr.com".to_dn # => "tumblr.com"
5. "dc.ads.linkedin.com".to_dn # => "linkedin.com"
6. 'your_name@domain.com'.to_dn # => "domain.com"
It also works for e-mail addresses, which I needed for my purpose. I hope it is useful to others. Correct me if you find anything incorrect :)
Note: it does not work for 'www.domainname.co.in' (it returns "co.in"). I am working on it :)
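One possible way around the co.in / co.uk limitation (also mentioned in an earlier answer) is to check the last two labels against a list of multi-label public suffixes. This sketch assumes a deliberately tiny, hard-coded list; MULTI_LABEL_SUFFIXES and registrable_domain are hypothetical names, and a real implementation would load the full Public Suffix List instead, at some memory cost. It expects a host name (for example the result of URI.parse(...).host), not a full URL.

```ruby
require 'set'

# Hypothetical, incomplete list of multi-label suffixes for illustration only.
MULTI_LABEL_SUFFIXES = Set['co.uk', 'co.in', 'com.au'].freeze

# Returns the last two labels of +host+, or the last three when the last
# two form a known multi-label suffix. A leading "www." is ignored.
def registrable_domain(host)
  labels = host.downcase.sub(/\Awww\./, '').split('.')
  return labels.join('.') if labels.length <= 2
  keep = MULTI_LABEL_SUFFIXES.include?(labels.last(2).join('.')) ? 3 : 2
  labels.last(keep).join('.')
end

puts registrable_domain('www.domainname.co.in')
puts registrable_domain('dc.ads.linkedin.com')
```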