How to check for a valid URL in Java?
What is the best way to check if a URL is valid in Java?
I've tried calling new URL(urlString) and catching a MalformedURLException, but it seems to be happy with anything that begins with http://.
I'm not concerned about establishing a connection, just validity. Is there a method for this? An annotation in Hibernate Validator? Should I use a regex?
Edit: Some examples of accepted URLs are http://*** and http://my favorite site!.
Consider using the Apache Commons UrlValidator class
UrlValidator urlValidator = new UrlValidator();
urlValidator.isValid("http://my favorite site!");
There are several properties that you can set to control how this class behaves; by default, http, https, and ftp are accepted.
Here is a way I tried and found useful:
URL u = new URL(name); // this would check for the protocol
u.toURI(); // does the extra checking required for validation of URI
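A self-contained version of that two-step check might look like the sketch below. The class and method names (UrlCheck, isValid) are hypothetical. The idea is that new URL(...) rejects missing or unknown protocols, and toURI() then applies the stricter URI syntax rules, so a string with spaces like the question's example fails at the second step even if the URL constructor accepts it.

```java
import java.net.MalformedURLException;
import java.net.URISyntaxException;
import java.net.URL;

public class UrlCheck {
    // Hypothetical helper combining both checks from the answer above.
    @SuppressWarnings("deprecation") // URL(String) is deprecated since Java 20
    public static boolean isValid(String name) {
        try {
            URL u = new URL(name); // fails on a missing or unknown protocol
            u.toURI();             // fails on characters illegal in a URI, e.g. spaces
            return true;
        } catch (MalformedURLException | URISyntaxException e) {
            return false;
        }
    }
}
```

With this, isValid("http://my favorite site!") returns false and isValid("no protocol here") returns false. Strings such as http://*** may still pass both steps, because * is a legal character in a registry-based URI authority; checking that getHost() on the resulting URI is non-null would reject that case as well.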
I'd love to post this as a comment to Tendayi Mawushe's answer, but I'm afraid there is not enough space ;)
This is the relevant part from the Apache Commons UrlValidator source:
/**
* This expression derived/taken from the BNF for URI (RFC2396).
*/
private static final String URL_PATTERN =
"/^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\\?([^#]*))?(#(.*))?/";
// 12 3 4 5 6 7 8 9
/**
* Schema/Protocol (ie. http:, ftp:, file:, etc).
*/
private static final int PARSE_URL_SCHEME = 2;
/**
* Includes hostname/ip and port number.
*/
private static final int PARSE_URL_AUTHORITY = 4;
private static final int PARSE_URL_PATH = 5;
private static final int PARSE_URL_QUERY = 7;
private static final int PARSE_URL_FRAGMENT = 9;
You can easily build your own validator from there.
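A minimal sketch of building on that pattern with java.util.regex follows. The leading "/^" and trailing "/" in the quoted string look like Perl-style delimiters; java.util.regex would treat those slashes as literal characters, so this version drops them. Every component of the RFC 2396 pattern is optional, so it matches almost any string, and the actual validation has to happen on the captured groups. The allowed schemes and the host/port pattern here are assumptions chosen for illustration (no user info, no IPv6 literals), and Set.of needs Java 9 or later.

```java
import java.util.Locale;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Rfc2396UrlCheck {
    // RFC 2396 pattern without the Perl-style "/.../" delimiters.
    private static final Pattern URL_PATTERN = Pattern.compile(
            "^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\\?([^#]*))?(#(.*))?");

    private static final int PARSE_URL_SCHEME = 2;
    private static final int PARSE_URL_AUTHORITY = 4;

    // Hypothetical policy: these schemes, and a host name with an optional port.
    private static final Set<String> SCHEMES = Set.of("http", "https", "ftp");
    private static final Pattern AUTHORITY = Pattern.compile(
            "[A-Za-z0-9]([A-Za-z0-9.-]*[A-Za-z0-9])?(:\\d{1,5})?");

    public static boolean isValid(String url) {
        Matcher m = URL_PATTERN.matcher(url);
        if (!m.matches()) {
            return false; // defensive: only edge cases fail, e.g. a line break after '#'
        }
        String scheme = m.group(PARSE_URL_SCHEME);
        String authority = m.group(PARSE_URL_AUTHORITY);
        return scheme != null
                && SCHEMES.contains(scheme.toLowerCase(Locale.ROOT))
                && authority != null
                && AUTHORITY.matcher(authority).matches();
    }
}
```

Because the scheme and authority are checked separately, both of the question's examples (http://*** and http://my favorite site!) are rejected by the authority pattern.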
The most "foolproof" way is to check whether the URL is actually reachable:
public boolean isURL(String url) {
    try {
        // opens (and immediately closes) a real connection to the resource
        (new java.net.URL(url)).openStream().close();
        return true;
    } catch (Exception ex) { }
    return false;
}
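A variant of the reachability check, sketched under a few assumptions: only http and https are handled, the server is assumed to answer a HEAD request (some servers reject HEAD), and any 2xx or 3xx status counts as reachable. It sets explicit timeouts, because URLConnection waits indefinitely by default, and it parses with URI first so syntax errors are rejected without any network access. The class and method names (UrlReachability, isReachable) are hypothetical.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URI;
import java.net.URISyntaxException;
import java.net.URL;
import java.net.URLConnection;

public class UrlReachability {
    // Hypothetical helper: true if an http(s) URL answers HEAD with 2xx/3xx in time.
    public static boolean isReachable(String address, int timeoutMillis) {
        HttpURLConnection conn = null;
        try {
            URL url = new URI(address).toURL(); // syntax check, no network yet
            URLConnection raw = url.openConnection(); // does not connect yet
            if (!(raw instanceof HttpURLConnection)) {
                return false; // file:, ftp:, mailto: etc. are out of scope here
            }
            conn = (HttpURLConnection) raw;
            conn.setRequestMethod("HEAD");
            conn.setConnectTimeout(timeoutMillis);
            conn.setReadTimeout(timeoutMillis);
            int status = conn.getResponseCode(); // performs the request
            return status >= 200 && status < 400;
        } catch (URISyntaxException | IllegalArgumentException | IOException e) {
            // bad syntax, relative URI, unknown host, timeout, refused connection...
            return false;
        } finally {
            if (conn != null) {
                conn.disconnect();
            }
        }
    }
}
```

Keep in mind that any reachability check answers "is it up right now?", not "is it well-formed?", and it is far slower than a purely syntactic check.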
My favorite approach, without external libraries:
try {
    URI uri = new URI(name);
    // perform checks for scheme, authority, host, etc., based on your requirements
    if ("mailto".equals(uri.getScheme())) {/*Code*/}
    if (uri.getHost() == null) {/*Code*/}
} catch (URISyntaxException e) {
    // not a syntactically valid URI
}
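Filled in for one concrete requirement, the placeholders could look like this. The requirement itself (an absolute http or https URI with a server-based host) and the names WebUrlCheck and isWebUrl are assumptions for illustration. URI.getHost() returns null when the authority is not a valid host name, which is what catches inputs like http://***.

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.util.Locale;

public class WebUrlCheck {
    // Hypothetical requirement: absolute http(s) URI with a real host name.
    public static boolean isWebUrl(String name) {
        try {
            URI uri = new URI(name);
            String scheme = uri.getScheme();
            if (scheme == null) {
                return false; // relative reference such as "example.com/page"
            }
            scheme = scheme.toLowerCase(Locale.ROOT);
            if (!scheme.equals("http") && !scheme.equals("https")) {
                return false; // e.g. "mailto:" or "file:" are rejected here
            }
            // null when the authority is not server-based, e.g. "http://***"
            return uri.getHost() != null;
        } catch (URISyntaxException e) {
            return false; // illegal characters such as spaces
        }
    }
}
```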
I didn't like any of the implementations (because they use a regex, which is an expensive operation, or a library, which is overkill if you only need one method), so I ended up using the java.net.URI class with some extra checks and limiting the protocols to: http, https, file, ftp, mailto, news, urn.
And yes, catching exceptions can be an expensive operation, but probably not as bad as regular expressions:
// needs: java.net.URI, java.util.Arrays, java.util.HashSet, java.util.Locale, java.util.Set
static final Set<String> protocols, protocolsWithHost;

static {
    protocolsWithHost = new HashSet<String>(
        Arrays.asList(new String[]{ "file", "ftp", "http", "https" })
    );
    protocols = new HashSet<String>(
        Arrays.asList(new String[]{ "mailto", "news", "urn" })
    );
    protocols.addAll(protocolsWithHost);
}

public static boolean isURI(String str) {
    int colon = str.indexOf(':');
    if (colon < 3) return false;
    // Locale.ROOT avoids locale-specific lowercasing (e.g. "FILE" under a Turkish locale)
    String proto = str.substring(0, colon).toLowerCase(Locale.ROOT);
    if (!protocols.contains(proto)) return false;
    try {
        URI uri = new URI(str);
        if (protocolsWithHost.contains(proto)) {
            if (uri.getHost() == null) return false;
            // reject paths containing any of these characters
            String path = uri.getPath();
            if (path != null) {
                for (int i = path.length() - 1; i >= 0; i--) {
                    if ("?<>:*|\"".indexOf(path.charAt(i)) > -1)
                        return false;
                }
            }
        }
        return true;
    } catch (Exception ex) {
        // URISyntaxException: not a valid URI
    }
    return false;
}
Judging by the source code for URL, the public URL(URL context, String spec, URLStreamHandler handler) constructor does more validation than the other constructors. You might try that one, but YMMV.
Validator package:
There seems to be a nice package by Yonatan Matalon called UrlUtil. Quoting its API:
isValidWebPageAddress(java.lang.String address, boolean validateSyntax, boolean validateExistance)
Checks if the given address is a valid web page address.
Sun's approach - check the network address
Sun's Java site offers a connection attempt as a solution for validating URLs.
Other regex code snippets:
There are regex validation attempts at Oracle's site and weberdev.com.