Java URI changes in JDK 1.4 vs JDK 1.5

2023-03-23 14:03 Q&A author:

import java.net.*;

public class TestURI {
     public static void main(String args[]) throws URISyntaxException
     {
        String first = new String("foo");
        String second = new String("bar");
        String third = new String("[space or another space]");

        URI temp = new URI(first, second, third);
        System.out.println(temp.getFragment());

     }
}

When I run the above code in JDK 1.4, I get

[space or another space]

When I run the same code in JDK 1.5/1.6, I get the following:

[space%20or%20another%20space]

Could somebody tell me what changed?

Thanks, Raj

Edit:

If I do something like the following, it works:

import java.net.*;

public class TestURI {
   public static void main(String args[]) throws URISyntaxException
   {
      String first = new String("foo");
      String second = new String("bar");
      String third = new String("[space or another space]").replaceAll("\\[", "leftSB").replaceAll("\\]", "rightSB");

      URI temp = new URI(first, second, third);
      System.out.println(temp.getFragment().replaceAll("leftSB", "\\[").replaceAll("rightSB", "\\]"));

   }
}

It looks like the spaces got URI-encoded.

%20 is the percent-encoded form of the ASCII space character: a percent sign followed by the character's hexadecimal code, 0x20.

I suppose spaces are illegal in the fragment identifier, and the implementation in Java 1.4 did not account for that.
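To make the %20 part concrete, here is a minimal sketch of how a single ASCII character maps to its escaped form. The class and method names (PercentEncodeDemo, percentEncode) are made up for illustration and are not part of java.net.URI; the sketch only covers single-byte ASCII characters, not the UTF-8 multi-byte case the documentation describes.

```java
class PercentEncodeDemo {
    // Hypothetical helper: '%' followed by the two-digit uppercase hex code
    // of an ASCII character. Not how java.net.URI is implemented internally.
    static String percentEncode(char c) {
        return String.format("%%%02X", (int) c);
    }

    public static void main(String[] args) {
        System.out.println(percentEncode(' ')); // prints %20
        System.out.println(percentEncode('%')); // prints %25
    }
}
```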

From the java.net.URI class documentation:

RFC 2396 allows escaped octets to appear in the user-info, path, query, and fragment components. Escaping serves two purposes in URIs:

To encode non-US-ASCII characters when a URI is required to conform strictly to RFC 2396 by not containing any other characters.

To quote characters that are otherwise illegal in a component. The user-info, path, query, and fragment components differ slightly in terms of which characters are considered legal and illegal.

These purposes are served in this class by three related operations:

A character is encoded by replacing it with the sequence of escaped octets that represent that character in the UTF-8 character set. [...]

An illegal character is quoted simply by encoding it. The space character, for example, is quoted by replacing it with "%20". [...]

A sequence of escaped octets is decoded by replacing it with the sequence of characters that it represents in the UTF-8 character set. [...]

These operations are exposed in the constructors and methods of this class as follows:

The single-argument constructor [...]

The multi-argument constructors quote illegal characters as required by the components in which they appear. The percent character ('%') is always quoted by these constructors. Any other characters are preserved.

...

The getUserInfo, getPath, getQuery, getFragment, getAuthority, and getSchemeSpecificPart methods decode any escaped octets in their corresponding components. The strings returned by these methods may contain both other characters and illegal characters, and will not contain any escaped octets.

You are using the three-argument constructor and then calling the getFragment method. According to the documentation, getFragment should decode the spaces again, but it does not. This could be a bug, but the Sun bug database seems to be offline right now, so I can't check.
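One way to narrow this down is to compare getRawFragment (the quoted form) with getFragment (the decoded form), once without the square brackets and once with them. This is only a sketch: the class and helper names (FragmentDemo, rawFragment, decodedFragment) are mine, and the bracketed case is printed without an expected value because, as the question shows, the result depends on the JDK release. The asker's edit suggests the square brackets are what makes the difference.

```java
import java.net.URI;
import java.net.URISyntaxException;

class FragmentDemo {
    // Builds foo:bar#<fragment> with the three-argument constructor,
    // which quotes illegal characters such as spaces.
    static URI build(String fragment) {
        try {
            return new URI("foo", "bar", fragment);
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException(e);
        }
    }

    // The fragment as stored in the URI, with escaped octets left in place.
    static String rawFragment(String fragment) {
        return build(fragment).getRawFragment();
    }

    // The fragment after the class decodes escaped octets.
    static String decodedFragment(String fragment) {
        return build(fragment).getFragment();
    }

    public static void main(String[] args) {
        String plain = "space or another space";
        System.out.println(rawFragment(plain));     // space%20or%20another%20space
        System.out.println(decodedFragment(plain)); // space or another space

        // With brackets: output varies by JDK version, so none is claimed here.
        String bracketed = "[space or another space]";
        System.out.println(rawFragment(bracketed));
        System.out.println(decodedFragment(bracketed));
    }
}
```

If the bracket-free fragment round-trips on your JDK and the bracketed one does not, that points at how brackets are handled during decoding rather than at the quoting done by the constructor.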
