URL equals and checking Internet access

On http://java.sun.com/j2se/1.5.0/docs/api/java/net/URL.html it states that:

Compares this URL for equality with another object.

If the given object is not a URL then this method immediately returns false.

Two URL objects are equal if they have the same protocol, reference equivalent hosts, have the same port number on the host, and the same file and fragment of the file.

Two hosts are considered equivalent if both host names can be resolved into the same IP addresses; else if either host name can't be resolved, the host names must be equal without regard to case; or both host names equal to null.

Since hosts comparison requires name resolution, this operation is a blocking operation.

Note: The defined behavior for equals is known to be inconsistent with virtual hosting in HTTP.

According to this, equals will only work if name resolution is possible. Since I can't be sure that a computer has internet access at a given time, should I just use Strings to store addresses instead? Also, how do I go about testing if access is available when requested?


You should use the java.net.URI class. Its equals method compares the URI components directly and does not perform name resolution, so it works without network access.
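A minimal sketch of how URI.equals behaves, assuming placeholder example.com addresses; the class name UriEqualsDemo and the helper sameUri are hypothetical and only there for illustration:

```java
import java.net.URI;

public class UriEqualsDemo {
    // Compares two address strings with URI.equals, which performs no DNS lookups.
    public static boolean sameUri(String a, String b) {
        return URI.create(a).equals(URI.create(b));
    }

    public static void main(String[] args) {
        // Scheme and host are compared without regard to case.
        System.out.println(sameUri("http://example.com/a", "HTTP://EXAMPLE.COM/a"));
        // The path is compared case-sensitively.
        System.out.println(sameUri("http://example.com/a", "http://example.com/A"));
        // An explicit default port is not treated as equal to an omitted port.
        System.out.println(sameUri("http://example.com/a", "http://example.com:80/a"));
    }
}
```

Note that URI.create throws IllegalArgumentException for a malformed string, so input from users may need validating first.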

As for testing the connection, I'd say you need to try to open a socket connection to the given address and port.
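One way this could look, as a rough sketch: the class ReachabilityCheck, the method canConnect, and the host, port and timeout values are all assumptions made for the example. The timeout bounds the connection attempt only; the host name lookup that happens when the address is created can take longer.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

public class ReachabilityCheck {
    // Returns true if a TCP connection to host:port succeeds within timeoutMillis.
    public static boolean canConnect(String host, int port, int timeoutMillis) {
        try (Socket socket = new Socket()) {
            // Creating the InetSocketAddress attempts to resolve the host name;
            // an unresolved name makes connect() throw UnknownHostException.
            socket.connect(new InetSocketAddress(host, port), timeoutMillis);
            return true;
        } catch (IOException e) {
            // Unresolvable host, refused connection, or timeout.
            return false;
        }
    }

    public static void main(String[] args) {
        // Placeholder target; substitute the server the application actually needs.
        System.out.println(canConnect("example.com", 80, 2000));
    }
}
```

A successful connection only shows that this particular host and port were reachable at that moment, so it is probably best run right before the request that depends on it.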


I just ran a quick test comparing two URLs with my network unplugged. The URL.equals() method gives the right answer, but without the network connected it took more than two seconds to do the comparison. With the network plugged in it was much faster (~40x). So the performance of the comparison on a computer with no network connection might be an issue, but the comparison itself should work.

(And of course the timing may be affected by specific hardware/OS)


The simple answer is to use java.net.URI to compare URLs rather than java.net.URL. But that will still only give you a more-or-less literal comparison of URIs.

If you want to do a more intelligent job of comparing URIs, there is a whole section of the URI spec on this topic. Things that may need to be taken into account include:

  • capitalization of the <scheme> and <domain>
  • capitalization of escape sequences
  • default port numbers
  • unnecessary escape sequences
  • non-canonical <path> components
  • significance of the URL <fragment> and <user-info>.
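A few of the points in the list above can be sketched in code. This is only an illustration under stated assumptions: the class UriCanonicalizer and its methods are hypothetical, only http and https default ports are known to it, and it deals with scheme and host capitalization, default ports and "." / ".." path segments, nothing else.

```java
import java.net.URI;
import java.util.Locale;

public class UriCanonicalizer {
    // Default ports for the schemes this sketch knows about; -1 means "unknown".
    static int defaultPort(String scheme) {
        switch (scheme) {
            case "http": return 80;
            case "https": return 443;
            default: return -1;
        }
    }

    // Lower-cases scheme and host, drops a default port, and normalizes the path.
    public static URI canonicalize(URI uri) {
        URI n = uri.normalize(); // removes "." segments and resolves ".." where possible
        if (n.getScheme() == null || n.getHost() == null) {
            return n; // relative or opaque URI: leave it alone
        }
        String scheme = n.getScheme().toLowerCase(Locale.ROOT);
        String host = n.getHost().toLowerCase(Locale.ROOT);
        int port = n.getPort();
        if (port == defaultPort(scheme)) {
            port = -1; // treat an explicit default port as omitted
        }
        // Rebuild from the raw (still percent-encoded) components so escapes are not re-quoted.
        StringBuilder sb = new StringBuilder();
        sb.append(scheme).append("://");
        if (n.getRawUserInfo() != null) sb.append(n.getRawUserInfo()).append('@');
        sb.append(host);
        if (port != -1) sb.append(':').append(port);
        sb.append(n.getRawPath());
        if (n.getRawQuery() != null) sb.append('?').append(n.getRawQuery());
        if (n.getRawFragment() != null) sb.append('#').append(n.getRawFragment());
        return URI.create(sb.toString());
    }

    public static void main(String[] args) {
        System.out.println(canonicalize(URI.create("HTTP://Example.COM:80/a/./b/../c")));
    }
}
```

Two addresses can then be compared with canonicalize(a).equals(canonicalize(b)). Whether the fragment and user-info should take part in the comparison depends on the application, so this sketch leaves them in.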

Then there is the issue that different URLs may resolve to the same resource, via a permanent or temporary redirect, or simply by returning identical content. And a single URL may resolve to different resources, depending on HTTP request headers, source IP address, phase of the moon, etc.
