How to verify that a URL is valid in Java 1.6?
My application processes URLs entered manually by users. I have discovered that some malformed URLs (like 'http:/not-valid') cause a NullPointerException to be thrown when the connection is opened. As I learned from this Java bug report, the issue is known and will not be fixed. The suggestion is to use java.net.URI, which is "more RFC 2396-conformant".
The question is: how do I use URI to work around the problem? The only thing I can do with a URI is parse the string and generate a URL from it. I have prepared the following program:
import java.net.*;
public class Test
{
    public static void main(String[] args)
    {
        try {
            URI uri = URI.create(args[0]);
            Object o = uri.toURL().getContent(); // try to get content
        }
        catch(Throwable e) {
            e.printStackTrace();
        }
    }
}
Here are the results of my tests (with Java 1.6.0_20), which are not much different from what I get with java.net.URL:
sh-3.2$ java Test url-not-valid
java.lang.IllegalArgumentException: URI is not absolute
    at java.net.URI.toURL(URI.java:1080)
    at Test.main(Test.java:9)
sh-3.2$ java Test http:/url-not-valid
java.lang.NullPointerException
    at sun.net.www.ParseUtil.toURI(ParseUtil.java:261)
    at sun.net.www.protocol.http.HttpURLConnection.plainConnect(HttpURLConnection.java:795)
    at sun.net.www.protocol.http.HttpURLConnection.connect(HttpURLConnection.java:726)
    at sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1049)
    at java.net.URLConnection.getContent(URLConnection.java:688)
    at java.net.URL.getContent(URL.java:1024)
    at Test.main(Test.java:9)
sh-3.2$ java Test http:///url-not-valid
java.lang.IllegalArgumentException: protocol = http host = null
    at sun.net.spi.DefaultProxySelector.select(DefaultProxySelector.java:151)
    at sun.net.www.protocol.http.HttpURLConnection.plainConnect(HttpURLConnection.java:796)
    at sun.net.www.protocol.http.HttpURLConnection.connect(HttpURLConnection.java:726)
    at sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1049)
    at java.net.URLConnection.getContent(URLConnection.java:688)
    at java.net.URL.getContent(URL.java:1024)
    at Test.main(Test.java:9)
sh-3.2$ java Test http:////url-not-valid
java.lang.NullPointerException
    at sun.net.www.ParseUtil.toURI(ParseUtil.java:261)
    at sun.net.www.protocol.http.HttpURLConnection.plainConnect(HttpURLConnection.java:795)
    at sun.net.www.protocol.http.HttpURLConnection.connect(HttpURLConnection.java:726)
    at sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1049)
    at java.net.URLConnection.getContent(URLConnection.java:688)
    at java.net.URL.getContent(URL.java:1024)
    at Test.main(Test.java:9)
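You can use Apache Commons Validator:
import org.apache.commons.validator.UrlValidator; // package as of Validator 1.3.1
UrlValidator urlValidator = new UrlValidator();
boolean valid = urlValidator.isValid("http://google.com"); // true if the URL passes validation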
http://commons.apache.org/validator/
http://commons.apache.org/validator/api-1.3.1/
If I run your code with the type of malformed URI in the bug report, URI.create throws an IllegalArgumentException caused by a URISyntaxException. So the suggested fix does fix the reported error.
$ java -cp bin UriTest http:\\\\www.google.com\\
java.lang.IllegalArgumentException
    at java.net.URI.create(URI.java:842)
    at UriTest.main(UriTest.java:8)
Caused by: java.net.URISyntaxException: Illegal character in opaque part at index 5: http:\\www.google.com\
    at java.net.URI$Parser.fail(URI.java:2809)
    at java.net.URI$Parser.checkChars(URI.java:2982)
    at java.net.URI$Parser.parse(URI.java:3019)
    at java.net.URI.<init>(URI.java:578)
    at java.net.URI.create(URI.java:840)
Your type of malformed URI is different, and does not appear to be a syntax error, so URI will not reject it.
Instead, catch the NullPointerException and recover with a suitable message.
You could also try to be friendly: check whether the URI starts with "http:/" followed by only a single slash and point that out to the user. Alternatively, check whether the hostname of the URL is non-empty:
import java.net.*;
public class UriTest
{
    public static void main ( String[] args )
    {
        try {
            URI uri = URI.create ( args[0] );
            // avoid null pointer exception
            if ( uri.getHost() == null )
                throw new MalformedURLException ( "no hostname" );
            URL url = uri.toURL();
            URLConnection s = url.openConnection();
            s.getInputStream();
        } catch ( Throwable e ) {
            e.printStackTrace();
        }
    }
}
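The same idea can be pulled into a small helper that runs before any connection is opened, so the user gets a message rather than a stack trace. This is only a sketch: the class name UrlCheck and the method describeProblem are made up for illustration, and it assumes that only http and https URLs with a hostname should be accepted.

```java
import java.net.URI;
import java.net.URISyntaxException;

public class UrlCheck
{
    // Hypothetical helper: returns null if the input looks usable,
    // otherwise a short message that can be shown to the user.
    public static String describeProblem ( String input )
    {
        URI uri;
        try {
            // new URI(...) throws a checked URISyntaxException,
            // whereas URI.create wraps it in IllegalArgumentException
            uri = new URI ( input );
        } catch ( URISyntaxException e ) {
            return "syntax error: " + e.getMessage();
        }
        if ( !uri.isAbsolute() )
            return "missing scheme";
        String scheme = uri.getScheme();
        // assumption: only http and https are wanted
        if ( !"http".equalsIgnoreCase ( scheme ) && !"https".equalsIgnoreCase ( scheme ) )
            return "unsupported scheme: " + scheme;
        // the inputs that end in a NullPointerException later have no host
        if ( uri.getHost() == null )
            return "no hostname";
        return null;
    }

    public static void main ( String[] args )
    {
        for ( String arg : args )
        {
            String problem = describeProblem ( arg );
            System.out.println ( arg + " -> " + ( problem == null ? "ok" : problem ) );
        }
    }
}
```

Run against the inputs from the question, 'url-not-valid' is reported as missing a scheme, and 'http:/url-not-valid', 'http:///url-not-valid' and 'http:////url-not-valid' are all reported as having no hostname, without any network access.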
Note that even with the approaches proposed in the other answers, you wouldn't get validation right, since java.net.URI adheres to RFC 2396, which is notably outdated. By using java.net.URI, you'll get exceptions for URLs that today are valid for all web browsers.
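To illustrate the kind of rejection meant here, a minimal sketch using only the JDK follows. The class name StrictUriDemo and the sample URLs are made up for illustration. The first two samples contain a raw space and a raw '|', which browsers tolerate but java.net.URI refuses; the third is the percent-encoded form, which it accepts.

```java
import java.net.URI;
import java.net.URISyntaxException;

public class StrictUriDemo
{
    // Returns true if java.net.URI accepts the string as-is.
    public static boolean parses ( String input )
    {
        try {
            new URI ( input );
            return true;
        } catch ( URISyntaxException e ) {
            return false;
        }
    }

    public static void main ( String[] args )
    {
        String[] samples = {
            "http://example.com/a b",     // raw space in the path
            "http://example.com/?q=a|b",  // raw '|' in the query
            "http://example.com/a%20b"    // percent-encoded space
        };
        for ( String s : samples )
            System.out.println ( s + " -> " + parses ( s ) );
    }
}
```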
In order to solve these issues, I wrote a library for URL parsing in Java: galimatias. It performs URL parsing the same way web browsers do (adhering to the WHATWG URL Specification).
In your case, you can write:
try {
    // urlString is the string entered by the user
    URL url = io.mola.galimatias.URL.parse(urlString).toJavaURL();
} catch (GalimatiasParseException e) {
    // If this exception is thrown, the given URL contains an unrecoverable error. That is, it's completely invalid.
}
As a nice side-effect, you get a lot of sanitization that you won't get with java.net.URI. For example, http:/example.com will be correctly parsed as http://example.com/.