
Java URI changes in JDK 1.4 vs. JDK 1.5

import java.net.*;

public class TestURI {
    public static void main(String[] args) throws URISyntaxException
    {
        String first = "foo";
        String second = "bar";
        String third = "[space or another space]";

        URI temp = new URI(first, second, third);
        System.out.println(temp.getFragment());

    }
}

When I run the above code on JDK 1.4, I get:

[space or another space]

When I run the same code in JDK 1.5/1.6, I get the following:

[space%20or%20another%20space]

Could somebody tell me what changed?

Thanks, Raj

Edit:

If I do something like the following, it works:

import java.net.*;

public class TestURI {
    public static void main(String[] args) throws URISyntaxException
    {
        String first = "foo";
        String second = "bar";
        // Swap the square brackets for placeholder tokens before building the URI
        String third = "[space or another space]".replace("[", "leftSB").replace("]", "rightSB");

        URI temp = new URI(first, second, third);
        // Put the brackets back after getFragment() has decoded the %20 escapes
        System.out.println(temp.getFragment().replace("leftSB", "[").replace("rightSB", "]"));

    }
}


It looks like the spaces got URI-encoded.

%20 is the hexadecimal formatting of the ASCII space character.

I suppose spaces are illegal in the fragment identifier (RFC 2396 excludes the space character from URIs entirely), which the Java 1.4 implementation did not take into account.
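To see the legality point in code, compare the single-argument constructor, which parses its input strictly and quotes nothing, with the three-argument one, which quotes the space for you. This is a minimal sketch: the class and method names are made up for illustration, and foo/bar are just the question's placeholder scheme and scheme-specific part.

```java
import java.net.URI;
import java.net.URISyntaxException;

public class FragmentLegality {

    // Returns true if the single-argument constructor accepts the string as-is.
    // That constructor requires illegal characters to be quoted already.
    static boolean parsesAsIs(String uriString) {
        try {
            new URI(uriString);
            return true;
        } catch (URISyntaxException e) {
            return false;
        }
    }

    // Builds foo:bar#<fragment> with the three-argument (scheme, ssp, fragment)
    // constructor, which quotes illegal characters, and returns the escaped fragment.
    static String quotedFragment(String fragment) {
        try {
            return new URI("foo", "bar", fragment).getRawFragment();
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(parsesAsIs("foo:bar#a b"));   // false: raw space is rejected
        System.out.println(parsesAsIs("foo:bar#a%20b")); // true: space already escaped
        System.out.println(quotedFragment("a b"));       // a%20b
    }
}
```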

From the java.net.URI class documentation:

RFC 2396 allows escaped octets to appear in the user-info, path, query, and fragment components. Escaping serves two purposes in URIs:

  • To encode non-US-ASCII characters when a URI is required to conform strictly to RFC 2396 by not containing any other characters.

  • To quote characters that are otherwise illegal in a component. The user-info, path, query, and fragment components differ slightly in terms of which characters are considered legal and illegal.

These purposes are served in this class by three related operations:

  • A character is encoded by replacing it with the sequence of escaped octets that represent that character in the UTF-8 character set. [...]
  • An illegal character is quoted simply by encoding it. The space character, for example, is quoted by replacing it with "%20". [...]
  • A sequence of escaped octets is decoded by replacing it with the sequence of characters that it represents in the UTF-8 character set. [...]
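The encode and decode operations in this excerpt can be sketched by hand. This only illustrates the idea and is not how java.net.URI implements it: the hypothetical encode below escapes every character (URI quotes only illegal ones), and decode assumes that any '%' followed by two characters is a well-formed hex escape.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

public class OctetEscaping {

    // Encode: replace each character with the %XX escapes of its UTF-8 bytes.
    // Illustrative only: every character is escaped, legal or not.
    static String encode(String s) {
        StringBuilder sb = new StringBuilder();
        for (byte b : s.getBytes(StandardCharsets.UTF_8)) {
            sb.append('%').append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    // Decode: turn each %XX escape back into a byte, copy other characters as
    // their UTF-8 bytes, then read the whole byte sequence as UTF-8.
    static String decode(String s) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        int i = 0;
        while (i < s.length()) {
            if (s.charAt(i) == '%' && i + 2 < s.length()) {
                bytes.write(Integer.parseInt(s.substring(i + 1, i + 3), 16));
                i += 3;
            } else {
                int cp = s.codePointAt(i);
                byte[] utf8 = new String(Character.toChars(cp)).getBytes(StandardCharsets.UTF_8);
                bytes.write(utf8, 0, utf8.length);
                i += Character.charCount(cp);
            }
        }
        return new String(bytes.toByteArray(), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        System.out.println(encode(" "));                            // %20
        System.out.println(encode("\u00e9"));                       // %C3%A9
        System.out.println(decode("space%20or%20another%20space")); // space or another space
    }
}
```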

These operations are exposed in the constructors and methods of this class as follows:

  • The single-argument constructor [...]

  • The multi-argument constructors quote illegal characters as required by the components in which they appear. The percent character ('%') is always quoted by these constructors. Any other characters are preserved.

  • ...
  • The getUserInfo, getPath, getQuery, getFragment, getAuthority, and getSchemeSpecificPart methods decode any escaped octets in their corresponding components. The strings returned by these methods may contain both other characters and illegal characters, and will not contain any escaped octets.
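The constructor and getter rules quoted above can be checked directly by comparing getRawFragment() (still escaped) with getFragment() (decoded). A minimal sketch using the question's foo/bar placeholders; withFragment is a hypothetical helper. It deliberately uses fragments without square brackets, to stay away from the question's exact case.

```java
import java.net.URI;
import java.net.URISyntaxException;

public class MultiArgQuoting {

    // Builds foo:bar#<fragment> with the three-argument constructor.
    static URI withFragment(String fragment) {
        try {
            return new URI("foo", "bar", fragment);
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        URI spaces = withFragment("space or another space");
        System.out.println(spaces.getRawFragment()); // space%20or%20another%20space
        System.out.println(spaces.getFragment());    // space or another space

        URI percent = withFragment("100%");
        System.out.println(percent.getRawFragment()); // 100%25 ('%' is always quoted)
        System.out.println(percent.getFragment());    // 100%

        // Non-ASCII "other" characters are preserved by the constructor;
        // toASCIIString() is what encodes them as UTF-8 escaped octets.
        URI accented = withFragment("caf\u00e9");
        System.out.println(accented.toASCIIString()); // foo:bar#caf%C3%A9
    }
}
```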

You are using the three-argument constructor and then calling getFragment. According to the last point above, getFragment should decode the spaces again, but it does not. This could be a bug, but the Sun Bug Database seems to be offline right now, so I can't check.
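One possible workaround, not from the original answer and only a sketch: skip getFragment(), take the still-escaped text from getRawFragment(), and decode it yourself. decodeRaw is a hypothetical helper built on java.net.URLDecoder, which is designed for HTML form data and turns '+' into a space, so literal plus signs are protected as %2B first.

```java
import java.io.UnsupportedEncodingException;
import java.net.URI;
import java.net.URISyntaxException;
import java.net.URLDecoder;

public class RawFragmentDecode {

    // Decodes %XX escapes in a raw URI component. URLDecoder would turn '+'
    // into a space, so '+' is escaped as %2B first and comes back unchanged.
    static String decodeRaw(String raw) {
        if (raw == null) {
            return null;
        }
        try {
            return URLDecoder.decode(raw.replace("+", "%2B"), "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 is always supported", e);
        }
    }

    public static void main(String[] args) throws URISyntaxException {
        URI temp = new URI("foo", "bar", "[space or another space]");
        System.out.println(temp.getRawFragment());            // [space%20or%20another%20space]
        System.out.println(temp.getFragment());               // whatever this JDK's getFragment() returns
        System.out.println(decodeRaw(temp.getRawFragment())); // [space or another space]
    }
}
```

Compared with the placeholder replacement in the question, this does not depend on which characters appear in the fragment.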
