开发者

How to check for a valid URL in Java?

What is the best way to check if a URL is valid in Java?

If tried to call new URL(urlString) and catch a MalformedURLException开发者_如何转开发, but it seems to be happy with anything that begins with http://.

I'm not concerned about establishing a connection, just validity. Is there a method for this? An annotation in Hibernate Validator? Should I use a regex?

Edit: Some examples of accepted URLs are http://*** and http://my favorite site!.


Consider using the Apache Commons UrlValidator class

UrlValidator urlValidator = new UrlValidator();
urlValidator.isValid("http://my favorite site!");

There are several properties that you can set to control how this class behaves, by default http, https, and ftp are accepted.


Here is way I tried and found useful,

URL u = new URL(name); // this would check for the protocol
u.toURI(); // does the extra checking required for validation of URI 


I'd love to post this as a comment to Tendayi Mawushe's answer, but I'm afraid there is not enough space ;)

This is the relevant part from the Apache Commons UrlValidator source:

/**
 * This expression derived/taken from the BNF for URI (RFC2396).
 */
private static final String URL_PATTERN =
        "/^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\\?([^#]*))?(#(.*))?/";
//         12            3  4          5       6   7        8 9

/**
 * Schema/Protocol (ie. http:, ftp:, file:, etc).
 */
private static final int PARSE_URL_SCHEME = 2;

/**
 * Includes hostname/ip and port number.
 */
private static final int PARSE_URL_AUTHORITY = 4;

private static final int PARSE_URL_PATH = 5;

private static final int PARSE_URL_QUERY = 7;

private static final int PARSE_URL_FRAGMENT = 9;

You can easily build your own validator from there.


The most "foolproof" way is to check for the availability of URL:

public boolean isURL(String url) {
  try {
     (new java.net.URL(url)).openStream().close();
     return true;
  } catch (Exception ex) { }
  return false;
}


My favorite approach, without external libraries:

try {
    URI uri = new URI(name);

    // perform checks for scheme, authority, host, etc., based on your requirements

    if ("mailto".equals(uri.getScheme()) {/*Code*/}
    if (uri.getHost() == null) {/*Code*/}

} catch (URISyntaxException e) {
}


I didn't like any of the implementations (because they use a Regex which is an expensive operation, or a library which is an overkill if you only need one method), so I ended up using the java.net.URI class with some extra checks, and limiting the protocols to: http, https, file, ftp, mailto, news, urn.

And yes, catching exceptions can be an expensive operation, but probably not as bad as Regular Expressions:

final static Set<String> protocols, protocolsWithHost;

static {
  protocolsWithHost = new HashSet<String>( 
      Arrays.asList( new String[]{ "file", "ftp", "http", "https" } ) 
  );
  protocols = new HashSet<String>( 
      Arrays.asList( new String[]{ "mailto", "news", "urn" } ) 
  );
  protocols.addAll(protocolsWithHost);
}

public static boolean isURI(String str) {
  int colon = str.indexOf(':');
  if (colon < 3)                      return false;

  String proto = str.substring(0, colon).toLowerCase();
  if (!protocols.contains(proto))     return false;

  try {
    URI uri = new URI(str);
    if (protocolsWithHost.contains(proto)) {
      if (uri.getHost() == null)      return false;

      String path = uri.getPath();
      if (path != null) {
        for (int i=path.length()-1; i >= 0; i--) {
          if ("?<>:*|\"".indexOf( path.charAt(i) ) > -1)
            return false;
        }
      }
    }

    return true;
  } catch ( Exception ex ) {}

  return false;
}


Judging by the source code for URI, the

public URL(URL context, String spec, URLStreamHandler handler)

constructor does more validation than the other constructors. You might try that one, but YMMV.


validator package:

There seems to be a nice package by Yonatan Matalon called UrlUtil. Quoting its API:

isValidWebPageAddress(java.lang.String address, boolean validateSyntax, 
                      boolean validateExistance) 
Checks if the given address is a valid web page address.

Sun's approach - check the network address

Sun's Java site offers connect attempt as a solution for validating URLs.

Other regex code snippets:

There are regex validation attempts at Oracle's site and weberdev.com.

0

上一篇:

下一篇:

精彩评论

暂无评论...
验证码 换一张
取 消

最新问答

问答排行榜