HttpClient problem with URLs which include curly braces

Posted: 2023-03-21 19:49

I am using HttpClient in my Android application. At some point I have to fetch data from remote locations. Below is a snippet showing how I use HttpClient to get the response.

String url_s = "https://mydomain.com/abc/{5D/{B0blhahblah-blah}I1.jpg"; // my URL string
DefaultHttpClient httpClient = new DefaultHttpClient();
response = httpClient.execute(new HttpGet(url_s));

It works fine in most cases, but not when the URL string contains curly braces. The stack trace reports the index of the curly brace with the message "Invalid character". So I tried to create a URI from a URL object instead.

URL url = new URL(url_s);
URI uri = url.toURI();
response = httpClient.execute(new HttpGet(uri));

After doing so, I didn't get any result from the remote location at all. I worked around the problem by replacing the curly braces:

"{" with "%7B"
"}" with "%7D"

But I am not totally satisfied with my solution. Are there any better solutions? Something neat and not hard-coded like mine?

The strict answer is that you should never have raw (unencoded) curly braces in your URL.

A full description of valid URLs can be found in RFC 1738.

The pertinent part for this answer is as follows:

Unsafe:

Characters can be unsafe for a number of reasons. The space
character is unsafe because significant spaces may disappear and
insignificant spaces may be introduced when URLs are transcribed or
typeset or subjected to the treatment of word-processing programs.
The characters "<" and ">" are unsafe because they are used as the
delimiters around URLs in free text; the quote mark (""") is used to
delimit URLs in some systems. The character "#" is unsafe and should
always be encoded because it is used in World Wide Web and in other
systems to delimit a URL from a fragment/anchor identifier that might
follow it. The character "%" is unsafe because it is used for
encodings of other characters. Other characters are unsafe because
gateways and other transport agents are known to sometimes modify
such characters. These characters are "{", "}", "|", "\", "^", "~",
"[", "]", and "`".

All unsafe characters must always be encoded within a URL. For
example, the character "#" must be encoded within URLs even in
systems that do not normally deal with fragment or anchor
identifiers, so that if the URL is copied into another system that
does use them, it will not be necessary to change the URL encoding.

To get around the problem you have been experiencing, you must encode your URL.

The "host may not be null" error occurs when the entire URL is encoded, including the https://mydomain.com/ part, so the client can no longer find the scheme and host. You only want to encode the last part of the URL, the path.
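To illustrate why encoding the whole string backfires, here is a minimal sketch using only plain java.net classes. The class and method names (WholeUrlEncodingDemo, encodeEverything) are made up for this example, and it assumes the whole-string encoding was done with something like URLEncoder, which is meant for form data and therefore also escapes ":" and "/":

```java
import java.io.UnsupportedEncodingException;
import java.net.URI;
import java.net.URLEncoder;

// Hypothetical demo class, not taken from the question.
final class WholeUrlEncodingDemo {

    // Encodes the complete URL string, scheme and host included.
    static String encodeEverything(String url) {
        try {
            return URLEncoder.encode(url, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is always available on the JVM, so this should not happen.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String encoded = encodeEverything(
                "https://mydomain.com/abc/{5D/{B0blhahblah-blah}I1.jpg");
        System.out.println(encoded);

        // The encoded string has no "://" left, so it parses as a bare
        // relative path and the host component is missing.
        URI uri = URI.create(encoded);
        System.out.println("host = " + uri.getHost());
    }
}
```

Because the separators are escaped along with the braces, a parser has nothing left to recognise as a scheme or host, which is consistent with the error described above.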

The solution is to use the Uri.Builder class to build your URI from its individual parts; its path() method leaves "/" characters intact and encodes other characters as necessary.

You will find a detailed description in the Android SDK Uri.Builder reference documentation.

Some trivial examples using your values are:

Uri.Builder b = Uri.parse("https://mydomain.com").buildUpon();
b.path("/abc/{5D/{B0blhahblah-blah}I1.jpg");
Uri u = b.build();

Or you can use chaining:

    Uri u = Uri.parse("https://mydomain.com").buildUpon().path("/abc/{5D/{B0blhahblah-blah}I1.jpg").build();
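If you are not on Android, or prefer to stay with java.net, a similar effect can be sketched with the multi-argument java.net.URI constructor, which quotes characters that are illegal in the path component. The class and method names here (PathEncodingSketch, build) are hypothetical, and the sketch assumes the path you pass in is raw, that is, not already percent-encoded:

```java
import java.net.URI;
import java.net.URISyntaxException;

// Hypothetical helper, shown only to illustrate component-wise construction.
final class PathEncodingSketch {

    // Builds a URI from separate parts; only the path is subject to quoting.
    static URI build(String scheme, String host, String rawPath) {
        try {
            // The fragment is left out by passing null.
            return new URI(scheme, host, rawPath, null);
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException("Could not build URI", e);
        }
    }

    public static void main(String[] args) {
        URI uri = build("https", "mydomain.com",
                "/abc/{5D/{B0blhahblah-blah}I1.jpg");
        // Scheme and host are untouched; the braces in the path are escaped.
        System.out.println(uri);
    }
}
```

With the values from the question this yields https://mydomain.com/abc/%7B5D/%7BB0blhahblah-blah%7DI1.jpg. One caveat: this constructor always quotes "%", so feeding it a path that is already encoded would encode it a second time.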

Except that RFC 1738 has been obsolete for over a decade and has been superseded by RFC 3986, and there is no indication in

https://www.rfc-editor.org/rfc/rfc3986

that curly braces are unsafe (in fact, the RFC does not contain a single curly brace character anywhere). Furthermore, I've tried URIs containing curly braces in browsers, and they work fine.

Also note that the OP is using a class called URI, which should definitely be following RFC 3986 at the very least, if not RFC 3987.

However, oddly, the IRI specification at

https://www.rfc-editor.org/rfc/rfc3987

has this note:

Systems accepting IRIs MAY also deal with the printable characters in
US-ASCII that are not allowed in URIs, namely "<", ">", '"', space,
"{", "}", "|", "\", "^", and "`", in step 2 above. If these characters
are found but are not converted, then the conversion SHOULD fail.
Please note that the number sign ("#"), the percent sign ("%"), and
the square bracket characters ("[", "]") are not part of the above
list and MUST NOT be converted.

In other words, it looks like the RFCs themselves have some issues.
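Whatever the RFCs intend, the java.net.URI class used in the question is documented as following RFC 2396 (as amended by RFC 2732) rather than RFC 3986, and in practice it refuses raw curly braces. The following is a small sketch of that behaviour; the class and method names (BraceRejectionDemo, firstIllegalIndex) are invented for illustration:

```java
import java.net.URI;
import java.net.URISyntaxException;

// Hypothetical demo class for checking how java.net.URI treats a string.
final class BraceRejectionDemo {

    // Returns the index reported by the parser for the offending character,
    // or -1 if the string was accepted as a URI.
    static int firstIllegalIndex(String candidate) {
        try {
            new URI(candidate);
            return -1;
        } catch (URISyntaxException e) {
            return e.getIndex();
        }
    }

    public static void main(String[] args) {
        String raw = "https://mydomain.com/abc/{5D/{B0blhahblah-blah}I1.jpg";
        String escaped = "https://mydomain.com/abc/%7B5D/%7BB0blhahblah-blah%7DI1.jpg";

        System.out.println("raw:     " + firstIllegalIndex(raw));
        System.out.println("escaped: " + firstIllegalIndex(escaped));
    }
}
```

This matches the stack trace described in the question, where the error points at the position of the curly brace, and it suggests that browsers accepting such URLs are simply more lenient than this parser.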
