Valid URL characters: how to validate a URL string in Java
A string like 'www.test.com' is good. A string like 'www.888.com' is good. A string like 'stackoverflow.com' is good. A string like 'GOoGle.Com' is good.
Why? Because those are valid URLs. It does not necessarily matter whether they have been registered or not.
Now, bad strings are:
'goog*d\x' 'manydots...com'
Why? Because you can't register those URLs.
If I have a string in Java which is supposed to be a good URL, what's the best way to validate it?
Thanks a lot.
Use UrlValidator from the Apache Commons Validator library. Binary package: http://www.mirrorservice.org/sites/ftp.apache.org/commons/validator/binaries/commons-validator-1.3.1.zip (the zip contains .jar files)
Example of usage (construct a UrlValidator with the valid schemes "http" and "https"):
String[] schemes = {"http", "https"};
UrlValidator urlValidator = new UrlValidator(schemes);
if (urlValidator.isValid("ftp://foo.bar.com/")) {
    System.out.println("url is valid");
} else {
    System.out.println("url is invalid");
}
This prints "url is invalid", because "ftp" is not in the list of allowed schemes.
If the default constructor is used instead:
UrlValidator urlValidator = new UrlValidator();
if (urlValidator.isValid("ftp://foo.bar.com/")) {
    System.out.println("url is valid");
} else {
    System.out.println("url is invalid");
}
This prints "url is valid".
Those examples are hostnames. They're not valid URLs in themselves.
Hostnames are made of '.'-separated ‘labels’. Each label must be up to 63 characters of letters, digits and hyphens, but a hyphen must not be the first or last character. It is optional to follow the whole hostname with another dot.
You can match this with a pattern like (assuming case-insensitive):
([a-z0-9]|[a-z0-9][a-z0-9\-]{0,61}[a-z0-9])(\.([a-z0-9]|[a-z0-9][a-z0-9\-]{0,61}[a-z0-9]))*\.?
However, this also matches strings like 1.2.3.4, which, although they technically could be host/domain names, will actually act as direct IP addresses. You may want to allow that. If you do, you may also want to allow IPv6 addresses, which are colon-separated hex; when embedded in a URL, they also have square brackets around them.
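The label rules above can be sketched in Java with java.util.regex. This is only a minimal illustration, not a complete validator: the class name HostnameCheck and the method isValidHostname are made up for this example, and it deliberately does not treat dotted IPv4 strings, IPv6 literals or IDNA names specially.

```java
import java.util.regex.Pattern;

// Hypothetical helper for illustration; the names are not from any library.
final class HostnameCheck {
    // One label: 1 to 63 letters, digits or hyphens, no hyphen at either end.
    private static final String LABEL =
            "(?:[a-z0-9]|[a-z0-9][a-z0-9-]{0,61}[a-z0-9])";

    // Labels joined by single dots, with an optional trailing dot.
    private static final Pattern HOSTNAME = Pattern.compile(
            LABEL + "(?:\\." + LABEL + ")*\\.?",
            Pattern.CASE_INSENSITIVE);

    static boolean isValidHostname(String s) {
        // matches() requires the whole string to fit the pattern.
        return s != null && HOSTNAME.matcher(s).matches();
    }
}
```

With this sketch, the "good" strings from the question are accepted and the two "bad" ones are rejected. A dotted string such as 1.2.3.4 is accepted as well, as noted above.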
And then of course there's IDNA. Nowadays, 例え.テスト is a valid IDNA domain name, corresponding to xn--r8jz45g.xn--zckzah. If you want to allow those you'll need some Unicode support.
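For the Unicode side, the JDK ships java.net.IDN, which converts a Unicode domain name to its ASCII ("xn--") form. A minimal sketch follows; the wrapper class IdnExample and its method name are made up for illustration, and whether you then run the ASCII result through a hostname check is a design choice of your own.

```java
import java.net.IDN;

// Hypothetical wrapper for illustration; only IDN.toASCII is a JDK call.
final class IdnExample {
    static String toAsciiHostname(String unicodeHostname) {
        // IDN.toASCII throws IllegalArgumentException when the input
        // cannot be converted to an ASCII-compatible form.
        return IDN.toASCII(unicodeHostname);
    }
}
```

Passing 例え.テスト to this method should give the xn--r8jz45g.xn--zckzah form mentioned above, while a plain ASCII name such as www.test.com comes back unchanged.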
Summary: it's quite a bit more difficult than you might think. And that's just hostnames. ‘Validating’ a whole URL is even more work. A simple regex isn't going to hack it. Use a pre-existing library.
I think that new URL(yourString) will do the trick: it is supposed to throw MalformedURLException if the URL is not compliant (actually, the Java API docs only say "If the string specifies an unknown protocol", but you can try it anyway):
try {
    new URL(string);
} catch (MalformedURLException e) {
    // do whatever
}
I also believe you can use the URL class in java.net:
URL url = new URL("www.google.com");
The API says:
public URL(String spec) throws MalformedURLException
Parameters:
spec - the String to parse as a URL.
Throws:
MalformedURLException - If the string specifies an unknown protocol.
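So an exception is thrown if the URL is invalid. Note that a bare hostname such as "www.google.com" specifies no protocol at all, so it also throws MalformedURLException; the string has to include a scheme such as http://.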
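To see what the java.net.URL constructor actually accepts, here is a small sketch. The class UrlParseCheck and the method parses are made-up names for this example. It only reports whether the constructor throws, which is a much weaker check than validating the hostname itself.

```java
import java.net.MalformedURLException;
import java.net.URL;

// Hypothetical helper for illustration; the names are not from any library.
final class UrlParseCheck {
    static boolean parses(String spec) {
        try {
            new URL(spec); // throws if the protocol is missing or unknown
            return true;
        } catch (MalformedURLException e) {
            return false;
        }
    }
}
```

Here parses("http://www.google.com") returns true, while parses("www.google.com") returns false because the string carries no protocol. So the strings from the question would need a scheme prepended before this check says anything useful about them.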
You can do this kind of "URL validation" with regular expressions.
And here is where you can get some good URL regexes (so you don't have to write your own).