
How to verify that a URL is valid in Java 1.6?

My application processes URLs entered manually by users. I have discovered that some malformed URLs (like 'http:/not-valid') cause a NullPointerException to be thrown when the connection is opened. As I learned from this Java bug report, the issue is known and will not be fixed. The suggestion is to use java.net.URI, which is "more RFC 2396-conformant".

The question is: how do I use URI to work around the problem? The only thing I can do with a URI is parse the string and generate a URL from it. I have prepared the following program:

import java.net.*;

public class Test
{
    public static void main(String[] args)
    {
       try {
           URI uri = URI.create(args[0]);
           Object o = uri.toURL().getContent(); // try to get content
       }
       catch(Throwable e) {
           e.printStackTrace();
       }
    }
}

Here are the results of my tests (with Java 1.6.0_20), which are not much different from what I get with java.net.URL:

sh-3.2$ java Test url-not-valid
java.lang.IllegalArgumentException: URI is not absolute
        at java.net.URI.toURL(URI.java:1080)
        at Test.main(Test.java:9)
sh-3.2$ java Test http:/url-not-valid
java.lang.NullPointerException
        at sun.net.www.ParseUtil.toURI(ParseUtil.java:261)
        at sun.net.www.protocol.http.HttpURLConnection.plainConnect(HttpURLConnection.java:795)
        at sun.net.www.protocol.http.HttpURLConnection.connect(HttpURLConnection.java:726)
        at sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1049)
        at java.net.URLConnection.getContent(URLConnection.java:688)
        at java.net.URL.getContent(URL.java:1024)
        at Test.main(Test.java:9)
sh-3.2$ java Test http:///url-not-valid
java.lang.IllegalArgumentException: protocol = http host = null
        at sun.net.spi.DefaultProxySelector.select(DefaultProxySelector.java:151)
        at sun.net.www.protocol.http.HttpURLConnection.plainConnect(HttpURLConnection.java:796)
        at sun.net.www.protocol.http.HttpURLConnection.connect(HttpURLConnection.java:726)
        at sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1049)
        at java.net.URLConnection.getContent(URLConnection.java:688)
        at java.net.URL.getContent(URL.java:1024)
        at Test.main(Test.java:9)
sh-3.2$ java Test http:////url-not-valid
java.lang.NullPointerException
        at sun.net.www.ParseUtil.toURI(ParseUtil.java:261)
        at sun.net.www.protocol.http.HttpURLConnection.plainConnect(HttpURLConnection.java:795)
        at sun.net.www.protocol.http.HttpURLConnection.connect(HttpURLConnection.java:726)
        at sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1049)
        at java.net.URLConnection.getContent(URLConnection.java:688)
        at java.net.URL.getContent(URL.java:1024)
        at Test.main(Test.java:9)


You can use Apache Commons Validator:

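// UrlValidator lives in org.apache.commons.validator in Validator 1.3.1
// and in org.apache.commons.validator.routines from 1.4 onward.
UrlValidator urlValidator = new UrlValidator();
boolean valid = urlValidator.isValid("http://google.com");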

http://commons.apache.org/validator/

http://commons.apache.org/validator/api-1.3.1/


If I run your code with the type of malformed URI described in the bug report, URI.create throws an IllegalArgumentException caused by a URISyntaxException. So the suggested fix does address the reported error.

$ java -cp bin UriTest http:\\\\www.google.com\\
java.lang.IllegalArgumentException
    at java.net.URI.create(URI.java:842)
    at UriTest.main(UriTest.java:8)
Caused by: java.net.URISyntaxException: Illegal character in opaque part at index 5: http:\\www.google.com\
    at java.net.URI$Parser.fail(URI.java:2809)
    at java.net.URI$Parser.checkChars(URI.java:2982)
    at java.net.URI$Parser.parse(URI.java:3019)
    at java.net.URI.<init>(URI.java:578)
    at java.net.URI.create(URI.java:840)

Your malformed URI is a different case: java.net.URI accepts it as syntactically valid, so no syntax error is reported.

Instead, catch the NullPointerException and recover with a suitable message.

You could also be friendly and check whether the input starts with a single-slash "http:/" and suggest the correction to the user, or you can check whether the host name of the URI is null before connecting:

import java.net.*;

public class UriTest
{
    public static void main ( String[] args )
    {
        try {
            URI uri = URI.create ( args[0] );

            // avoid null pointer exception
            if ( uri.getHost() == null )
                throw new MalformedURLException ( "no hostname" );

            URL url = uri.toURL();
            URLConnection s = url.openConnection();

            s.getInputStream();
        } catch ( Throwable e ) {
            e.printStackTrace();
        }
    }
}
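
The two checks described above (host name present, and a friendly hint for a single slash) can also be done without opening a connection at all. The following is only a sketch: the class name UrlCheck and both method names are made up for illustration, and it assumes that only http and https URLs are acceptable input. It avoids language features newer than Java 1.6.

```java
import java.net.URI;
import java.net.URISyntaxException;

// Hypothetical helper for illustration; not part of the JDK.
final class UrlCheck {

    private UrlCheck() {}

    /** True if the input parses as an absolute http(s) URI that has a host. */
    static boolean isUsableHttpUrl(String input) {
        if (input == null) {
            return false;
        }
        URI uri;
        try {
            // new URI(...) throws the checked URISyntaxException directly,
            // unlike URI.create, which wraps it in IllegalArgumentException.
            uri = new URI(input.trim());
        } catch (URISyntaxException e) {
            return false;
        }
        String scheme = uri.getScheme();
        if (scheme == null) {
            return false; // relative URI such as "url-not-valid"
        }
        boolean http = scheme.equalsIgnoreCase("http")
                || scheme.equalsIgnoreCase("https");
        // getHost() is null for "http:/x", "http:///x" and "http:////x",
        // the inputs that fail later inside HttpURLConnection.
        return http && uri.getHost() != null;
    }

    /** Returns a corrected string for "http:/x" style input, or null if no fix applies. */
    static String suggestFix(String input) {
        if (input == null) {
            return null;
        }
        String trimmed = input.trim();
        // Scheme followed by exactly one slash: insert the missing slash.
        String fixed = trimmed.replaceFirst("^(?i)(https?):/(?!/)", "$1://");
        return fixed.equals(trimmed) ? null : fixed;
    }
}
```

A caller would first try isUsableHttpUrl(input) and, if that fails, show the result of suggestFix(input) to the user when it is not null. Note that java.net.URI also reports a null host for some names that browsers accept (for example host names containing an underscore), so this check is stricter than a browser would be.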


Note that even with the approaches proposed in the other answers, you wouldn't get validation right, since java.net.URI adheres to RFC 2396, which is outdated (it was superseded by RFC 3986). By using java.net.URI, you'll get exceptions for URLs that all web browsers accept today.

In order to solve these issues, I wrote a library for URL parsing in Java: galimatias. It performs URL parsing the same way web browsers do (adhering to the WHATWG URL Specification).

In your case, you can write:

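try {
    // urlString is the user-supplied input; the original snippet reused the
    // name "url" for both input and result, which does not compile.
    java.net.URL url = io.mola.galimatias.URL.parse(urlString).toJavaURL();
} catch (GalimatiasParseException e) {
    // If this exception is thrown, the given URL contains an unrecoverable error. That is, it's completely invalid.
}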

As a nice side-effect, you get a lot of sanitization that you won't get with java.net.URI. For example, http:/example.com will be correctly parsed as http://example.com/.
