What is the rationale behind particular reserved characters in a URL?
I notice these characters are all illegal
#%<>?\/*+|:"
I notice these are encoded (%NN, where NN is the hex value) but can be replaced without problem
$,;=& @
(note the space, which is typically encoded as + but may be %20)
#%?/+
I understand. But what do the following characters do? <>\*|":
Note: I understand what : does in the domain part (it's the port), just as @ is a login, but after the first / why is : illegal? (@ isn't.)
RFC 2396 (Uniform Resource Identifiers (URI): Generic Syntax) says:
Many URI include components consisting of or delimited by, certain special characters. These characters are called "reserved", since their usage within the URI component is limited to their reserved purpose.
reserved = ";" | "/" | "?" | ":" | "@" | "&" | "=" | "+" |
"$" | ","
2.4.3. Excluded US-ASCII Characters
The angle-bracket "<" and ">" and double-quote (") characters are excluded because they are often used as the delimiters around URI in text documents and protocol fields. The character "#" is excluded because it is used to delimit a URI from a fragment identifier in URI references (Section 4). The percent character "%" is excluded because it is used for the encoding of escaped characters.
delims = "<" | ">" | "#" | "%" | <">
Other characters are excluded because gateways and other transport agents are known to sometimes modify such characters, or they are used as delimiters.
unwise = "{" | "}" | "|" | "\" | "^" | "[" | "]" | "`"
I think that covers all that you mentioned. The star "*" is not reserved and may be used. Paste this in a browser: http://en.wikipedia.org/wiki/*
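To check the question's characters against the three sets quoted from RFC 2396 above, here is a minimal Python sketch. The set contents are copied from the quoted grammar; the names RESERVED, DELIMS, UNWISE and classify are made up for this illustration and are not part of any library.

```python
# Character classes copied from the RFC 2396 grammar quoted above.
# The variable and function names are hypothetical, chosen for this sketch.
RESERVED = set(';/?:@&=+$,')
DELIMS = set('<>#%"')
UNWISE = set('{}|\\^[]`')


def classify(ch: str) -> str:
    """Return which of the three quoted sets a single character belongs to."""
    if ch in RESERVED:
        return "reserved"
    if ch in DELIMS:
        return "delims"
    if ch in UNWISE:
        return "unwise"
    return "not listed"  # in none of the three sets quoted above


# The characters the question called "illegal".
for ch in '#%<>?\\/*+|:"':
    print(repr(ch), classify(ch))
```

Of the characters in the question, only the star comes back as "not listed", which matches the remark that it may be used as-is.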
I'm not sure about this, but could those be reserved so that if you try typing URLs into a shell environment, the URL isn't split up into different pieces unnecessarily? For example, imagine I try executing
curl http://www.stackoverflow.com/this>that > myFile.txt
This might trip up the command prompt by having it try to get the incorrect URL http://www.stackoverflow.com/this, then writing it to a file called that, and then tripping up the interpreter when it hits the second >. This explanation does account for all of the characters you listed (they all mean something in a shell environment), but it's just my first guess as to why it could be.
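As a rough illustration of the shell concern, here is a small Python sketch using the standard-library shlex module. It only approximates POSIX shell tokenization and does not run curl. It shows that quoting the URL keeps the > inside it from being read as a redirection.

```python
import shlex

url = "http://www.stackoverflow.com/this>that"

# shlex.quote wraps the string in single quotes when it contains
# characters a POSIX shell treats specially, such as '>'.
quoted = shlex.quote(url)
command = f"curl {quoted} > myFile.txt"
print(command)

# Splitting with shell-like rules keeps the quoted URL as one token;
# the unquoted '>' that follows it stays a separate token.
tokens = shlex.split(command)
print(tokens)
```

With the quotes in place the URL survives as a single argument, so only the final > would act as a redirection in a real shell.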