How to decode a reserved escape character in a request URI on a web server?

2023-03-02 15:38 问答作者：

It is pretty clear that a web server has to decode any escaped unreserved character (such as alphanums, etc.) to do the URI comparison. For example, http://www.example.com/~user/index.htm shall be identical to http://www.example.com/%7Euser/index.htm.

My question is, what are we gonna do with the escaped reserved characters?

An example would be %2F, or /. If there is an %2F in the request URI, should the parser of web server replace it with开发者_StackOverflow中文版 a /? In the above example, it would mean that http://www.example.com/~user%2Findex.htm would be the same as http://www.example.com/~user/index.htm? Although I tried it on an Apache server (2.2.17 Unix) and it looks like it gives a "404 Not Found" error.

So does that mean %2F and other escaped reserved characters shall be left alone (at least before the URI comparison)?

Background information:

There are two places in RFC 2616 (HTTP 1.1) mentioning the escape decoding issue:

The Request-URI is transmitted in the format specified in section 3.2.1. If the Request-URI is encoded using the “% HEX HEX” encoding [42], the origin server MUST decode the Request-URI in order to properly interpret the request. Servers SHOULD respond to invalid Request-URIs with an appropriate status code.

and

Characters other than those in the “reserved” and “unsafe” sets (see RFC 2396 [42]) are equivalent to their “"%" HEX HEX” encoding.

(according to http://trac.tools.ietf.org/wg/httpbis/trac/ticket/2 "unsafe" is a mistake and shall be removed from the spec. So we are only looking at "reserved" here.)

FYI, the definition of such characters in RFC 2396:

reserved = ";" | "/" | "?" | ":" | "@" | "&" | "=" | "+" | "$" | ","

unreserved = alphanum | mark

mark = "-" | "_" | "." | "!" | "˜" | "*" | "’" | "(" | ")"

tl;dr:

Decode percent-encoded unreserved characters,
keep percent-encoded reserved characters.

The URI standard is STD 66, which currently is RFC 3986.

Section 6 is about Normalization and Comparison, where section 6.2.2.2 explains what to do with percent-encoded octets:

These URIs should be normalized by decoding any percent-encoded octet that corresponds to an unreserved character […]

As explicitly stated in section 2 (bold emphasis mine):

Unreserved characters:

URIs that differ in the replacement of an unreserved character with its corresponding percent-encoded US-ASCII octet are equivalent
Reserved characters:

URIs that differ in the replacement of a reserved character with its corresponding percent-encoded octet are not equivalent.

继续阅读：escaping parsing percent-encoding uri

How to decode a reserved escape character in a request URI on a web server?

Background information:

更多精彩内容

精彩评论

最新问答

央视是哪个频道？

请问买过的朋友，舒提啦旅行箱实际使用体验如何？？

检查不孕不育需要的费用？

海信ULED电视画质有什么不同的地方?？

钉子可以挂的住画框幕布吗？

问答排行榜

河神2九牛入海钓河妖是第几集河妖什么来历可活吞牛？

性激素六项检查的最佳时间是多久？多少钱？？

Easiest way to get words of one line from istream into a vector?

《梦在燃烧 (《三国演义》动画片主题曲)》MP3歌词-汤子星？

抽烟只抽炫赫门？

Background information:

更多精彩内容

精彩评论

最新问答

央视是哪个频道？

请问买过的朋友，舒提啦旅行箱实际使用体验如何？？

检查不孕不育需要的费用？

海信ULED电视画质有什么不同的地方?？

钉子可以挂的住画框幕布吗？

问答排行榜

河神2九牛入海钓河妖是第几集 河妖什么来历可活吞牛？

性激素六项检查的最佳时间是多久？多少钱？？

Easiest way to get words of one line from istream into a vector?

《梦在燃烧 (《三国演义》动画片主题曲)》MP3歌词-汤子星？

抽烟只抽炫赫门？

河神2九牛入海钓河妖是第几集河妖什么来历可活吞牛？