Rails: What's a good way to validate links (URLs)?
I was wondering how I would best validate URLs in Rails. I was thinking of using开发者_高级运维 a regular expression, but am not sure if this is the best practice.
And, if I were to use a regex, could someone suggest one to me? I am still new to Regex.
Validating an URL is a tricky job. It's also a very broad request.
What do you want to do, exactly? Do you want to validate the format of the URL, the existence, or what? There are several possibilities, depending on what you want to do.
A regular expression can validate the format of the URL. But even a complex regular expression cannot ensure you are dealing with a valid URL.
For instance, if you take a simple regular expression, it will probably reject the following host
http://invalid##host.com
but it will allow
http://invalid-host.foo
that is a valid host, but not a valid domain if you consider the existing TLDs. Indeed, the solution would work if you want to validate the hostname, not the domain because the following one is a valid hostname
http://host.foo
as well the following one
http://localhost
Now, let me give you some solutions.
If you want to validate a domain, then you need to forget about regular expressions. The best solution available at the moment is the Public Suffix List, a list maintained by Mozilla. I created a Ruby library to parse and validate domains against the Public Suffix List, and it's called PublicSuffix.
If you want to validate the format of an URI/URL, then you might want to use regular expressions. Instead of searching for one, use the built-in Ruby URI.parse
method.
require 'uri'
def valid_url?(uri)
uri = URI.parse(uri) && uri.host
rescue URI::InvalidURIError
false
end
You can even decide to make it more restrictive. For instance, if you want the URL to be an HTTP/HTTPS URL, then you can make the validation more accurate.
require 'uri'
def valid_url?(url)
uri = URI.parse(url)
uri.is_a?(URI::HTTP) && !uri.host.nil?
rescue URI::InvalidURIError
false
end
Of course, there are tons of improvements you can apply to this method, including checking for a path or a scheme.
Last but not least, you can also package this code into a validator:
class HttpUrlValidator < ActiveModel::EachValidator
def self.compliant?(value)
uri = URI.parse(value)
uri.is_a?(URI::HTTP) && !uri.host.nil?
rescue URI::InvalidURIError
false
end
def validate_each(record, attribute, value)
unless value.present? && self.class.compliant?(value)
record.errors.add(attribute, "is not a valid HTTP URL")
end
end
end
# in the model
validates :example_attribute, http_url: true
I use a one liner inside my models:
validates :url, format: URI::DEFAULT_PARSER.make_regexp(%w[http https])
I think is good enough and simple to use. Moreover it should be theoretically equivalent to the Simone's method, as it use the very same regexp internally.
Following Simone's idea, you can easily create you own validator.
class UrlValidator < ActiveModel::EachValidator
def validate_each(record, attribute, value)
return if value.blank?
begin
uri = URI.parse(value)
resp = uri.kind_of?(URI::HTTP)
rescue URI::InvalidURIError
resp = false
end
unless resp == true
record.errors[attribute] << (options[:message] || "is not an url")
end
end
end
and then use
validates :url, :presence => true, :url => true
in your model.
There is also validate_url gem (which is just a nice wrapper for Addressable::URI.parse
solution).
Just add
gem 'validate_url'
to your Gemfile
, and then in models you can
validates :click_through_url, url: true
This question is already answered, but what the heck, I propose the solution I'm using.
The regexp works fine with all urls I've met. The setter method is to take care if no protocol is mentioned (let's assume http://).
And finally, we make a try to fetch the page. Maybe I should accept redirects and not only HTTP 200 OK.
# app/models/my_model.rb
validates :website, :allow_blank => true, :uri => { :format => /(^$)|(^(http|https):\/\/[a-z0-9]+([\-\.]{1}[a-z0-9]+)*\.[a-z]{2,5}(([0-9]{1,5})?\/.*)?$)/ix }
def website= url_str
unless url_str.blank?
unless url_str.split(':')[0] == 'http' || url_str.split(':')[0] == 'https'
url_str = "http://" + url_str
end
end
write_attribute :website, url_str
end
and...
# app/validators/uri_vaidator.rb
require 'net/http'
# Thanks Ilya! http://www.igvita.com/2006/09/07/validating-url-in-ruby-on-rails/
# Original credits: http://blog.inquirylabs.com/2006/04/13/simple-uri-validation/
# HTTP Codes: http://www.ruby-doc.org/stdlib/libdoc/net/http/rdoc/classes/Net/HTTPResponse.html
class UriValidator < ActiveModel::EachValidator
def validate_each(object, attribute, value)
raise(ArgumentError, "A regular expression must be supplied as the :format option of the options hash") unless options[:format].nil? or options[:format].is_a?(Regexp)
configuration = { :message => I18n.t('errors.events.invalid_url'), :format => URI::regexp(%w(http https)) }
configuration.update(options)
if value =~ configuration[:format]
begin # check header response
case Net::HTTP.get_response(URI.parse(value))
when Net::HTTPSuccess then true
else object.errors.add(attribute, configuration[:message]) and false
end
rescue # Recover on DNS failures..
object.errors.add(attribute, configuration[:message]) and false
end
else
object.errors.add(attribute, configuration[:message]) and false
end
end
end
The solution that worked for me was:
validates_format_of :url, :with => /\A(https?:\/\/)?([\da-z\.-]+)\.([a-z\.]{2,6})([\/\w\.-]*)*\/?\Z/i
I did try to use some of the example that you attached but I'm supporting url like so:
Notice the use of A and Z because if you use ^ and $ you will see this warning security from Rails validators.
Valid ones:
'www.crowdint.com'
'crowdint.com'
'http://crowdint.com'
'http://www.crowdint.com'
Invalid ones:
'http://www.crowdint. com'
'http://fake'
'http:fake'
You can also try valid_url gem which allows URLs without the scheme, checks domain zone and ip-hostnames.
Add it to your Gemfile:
gem 'valid_url'
And then in model:
class WebSite < ActiveRecord::Base
validates :url, :url => true
end
Just my 2 cents:
before_validation :format_website
validate :website_validator
private
def format_website
self.website = "http://#{self.website}" unless self.website[/^https?/]
end
def website_validator
errors[:website] << I18n.t("activerecord.errors.messages.invalid") unless website_valid?
end
def website_valid?
!!website.match(/^(https?:\/\/)?([\da-z\.-]+)\.([a-z\.]{2,6})([\/\w \.-=\?]*)*\/?$/)
end
EDIT: changed regex to match parameter urls.
I ran into the same problem lately (I needed to validate urls in a Rails app) but I had to cope with the additional requirement of unicode urls (e.g. http://кц.рф
)...
I researched a couple of solutions and came across the following:
- The first and most suggested thing is using
URI.parse
. Check the answer by Simone Carletti for details. This works ok, but not for unicode urls. - The second method I saw was the one by Ilya Grigorik: http://www.igvita.com/2006/09/07/validating-url-in-ruby-on-rails/ Basically, he tries to make a request to the url; if it works, it is valid...
- The third method I found (and the one I prefer) is an approach similar to
URI.parse
but using theaddressable
gem instead of theURI
stdlib. This approach is detailed here: http://rawsyntax.com/blog/url-validation-in-rails-3-and-ruby-in-general/
Here is an updated version of the validator posted by David James. It has been published by Benjamin Fleischer. Meanwhile, I pushed an updated fork which can be found here.
require 'addressable/uri'
# Source: http://gist.github.com/bf4/5320847
# Accepts options[:message] and options[:allowed_protocols]
# spec/validators/uri_validator_spec.rb
class UriValidator < ActiveModel::EachValidator
def validate_each(record, attribute, value)
uri = parse_uri(value)
if !uri
record.errors[attribute] << generic_failure_message
elsif !allowed_protocols.include?(uri.scheme)
record.errors[attribute] << "must begin with #{allowed_protocols_humanized}"
end
end
private
def generic_failure_message
options[:message] || "is an invalid URL"
end
def allowed_protocols_humanized
allowed_protocols.to_sentence(:two_words_connector => ' or ')
end
def allowed_protocols
@allowed_protocols ||= [(options[:allowed_protocols] || ['http', 'https'])].flatten
end
def parse_uri(value)
uri = Addressable::URI.parse(value)
uri.scheme && uri.host && uri
rescue URI::InvalidURIError, Addressable::URI::InvalidURIError, TypeError
end
end
...
require 'spec_helper'
# Source: http://gist.github.com/bf4/5320847
# spec/validators/uri_validator_spec.rb
describe UriValidator do
subject do
Class.new do
include ActiveModel::Validations
attr_accessor :url
validates :url, uri: true
end.new
end
it "should be valid for a valid http url" do
subject.url = 'http://www.google.com'
subject.valid?
subject.errors.full_messages.should == []
end
['http://google', 'http://.com', 'http://ftp://ftp.google.com', 'http://ssh://google.com'].each do |invalid_url|
it "#{invalid_url.inspect} is a invalid http url" do
subject.url = invalid_url
subject.valid?
subject.errors.full_messages.should == []
end
end
['http:/www.google.com','<>hi'].each do |invalid_url|
it "#{invalid_url.inspect} is an invalid url" do
subject.url = invalid_url
subject.valid?
subject.errors.should have_key(:url)
subject.errors[:url].should include("is an invalid URL")
end
end
['www.google.com','google.com'].each do |invalid_url|
it "#{invalid_url.inspect} is an invalid url" do
subject.url = invalid_url
subject.valid?
subject.errors.should have_key(:url)
subject.errors[:url].should include("is an invalid URL")
end
end
['ftp://ftp.google.com','ssh://google.com'].each do |invalid_url|
it "#{invalid_url.inspect} is an invalid url" do
subject.url = invalid_url
subject.valid?
subject.errors.should have_key(:url)
subject.errors[:url].should include("must begin with http or https")
end
end
end
Please notice that there are still strange HTTP URIs that are parsed as valid addresses.
http://google
http://.com
http://ftp://ftp.google.com
http://ssh://google.com
Here is a issue for the addressable
gem which covers the examples.
I use a slight variation on lafeber solution above.
It disallows consecutive dots in the hostname (such as for instance in www.many...dots.com
):
%r"\A(https?://)?[a-z\d\-]+(\.[a-z\d\-]+)*\.[a-z]{2,6}(/.*)?\Z"i
URI.parse
seems to mandate scheme prefixing, which in some cases is not what you may want (e.g. if you want to allow your users to quickly spell URLs in forms such as twitter.com/username
)
I have been using the 'activevalidators' gem and it's works pretty well (not just for urls validation)
you can find it here
It's all documented but basically once the gem added you'll want to add the following few lines in an initializer say : /config/environments/initializers/active_validators_activation.rb
# Activate all the validators
ActiveValidators.activate(:all)
(Note : you can replace :all by :url or :whatever if you just want to validate specific types of values)
And then back in your model something like this
class Url < ActiveRecord::Base
validates :url, :presence => true, :url => true
end
Now Restart the server and that should be it
If you want simple validation and a custom error message:
validates :some_field_expecting_url_value,
format: {
with: URI.regexp(%w[http https]),
message: 'is not a valid URL'
}
I liked to monkeypatch the URI module to add the valid? method
inside config/initializers/uri.rb
module URI
def self.valid?(url)
uri = URI.parse(url)
uri.is_a?(URI::HTTP) && !uri.host.nil?
rescue URI::InvalidURIError
false
end
end
You can validate multiple urls using something like:
validates_format_of [:field1, :field2], with: URI.regexp(['http', 'https']), allow_nil: true
https://github.com/perfectline/validates_url is a nice and simple gem that will do pretty much everything for you
Recently I had this same issue and I found a work around for valid urls.
validates_format_of :url, :with => URI::regexp(%w(http https))
validate :validate_url
def validate_url
unless self.url.blank?
begin
source = URI.parse(self.url)
resp = Net::HTTP.get_response(source)
rescue URI::InvalidURIError
errors.add(:url,'is Invalid')
rescue SocketError
errors.add(:url,'is Invalid')
end
end
The first part of the validate_url method is enough to validate url format. The second part will make sure the url exists by sending a request.
And as a module
module UrlValidator
extend ActiveSupport::Concern
included do
validates :url, presence: true, uniqueness: true
validate :url_format
end
def url_format
begin
errors.add(:url, "Invalid url") unless URI(self.url).is_a?(URI::HTTP)
rescue URI::InvalidURIError
errors.add(:url, "Invalid url")
end
end
end
And then just include UrlValidator
in any model that you want to validate url's for. Just including for options.
URL validation cannot be handled simply by using a Regular Expression as the number of websites keep growing and new domain naming schemes keep coming up.
In my case, I simply write a custom validator that checks for a successful response.
class UrlValidator < ActiveModel::Validator
def validate(record)
begin
url = URI.parse(record.path)
response = Net::HTTP.get(url)
true if response.is_a?(Net::HTTPSuccess)
rescue StandardError => error
record.errors[:path] << 'Web address is invalid'
false
end
end
end
I am validating the path
attribute of my model by using record.path
. I am also pushing the error to the respective attribute name by using record.errors[:path]
.
You can simply replace this with any attribute name.
Then on, I simply call the custom validator in my model.
class Url < ApplicationRecord
# validations
validates_presence_of :path
validates_with UrlValidator
end
You could use regex for this, for me works good this one:
(^|[\s.:;?\-\]<\(])(ftp|https?:\/\/[-\w;\/?:@&=+$\|\_.!~*\|'()\[\]%#,]+[\w\/#](\(\))?)(?=$|[\s',\|\(\).:;?\-\[\]>\)])
URI::regexp(%w[http https])
is obsolete and should not be used.
Instead, use URI::DEFAULT_PARSER.make_regexp(%w[http https])
Keep it simple:
validates :url, format: %r{http(s)://.+}
If you want to validate HTTPS you can use:
require "uri"
class HttpsUrlValidator < ActiveModel::EachValidator
def validate_each(record, attribute, value)
unless valid_url?(value)
record.errors[attribute] << "is not a valid URL"
end
end
private
def valid_url?(url)
uri = URI.parse(url)
uri.is_a?(URI::HTTPS) && !uri.host.nil?
rescue URI::InvalidURIError
false
end
end
Usage in the model like this:
validates :website_url, presence: true, https_url: true
精彩评论