Are Latin encoded characters considered URL safe?
Having read this post, I'm aware that web safe characters are outlined in this document. The specs do not make clear, however, whether Latin encoded characters, for example ç and õ, are part of the unreserved list.
I don't see why those characters would not be included in the unreserved list. That said, I have yet to see any URL that contains such characters.
A related question: assuming I can use such characters in my URLs, should I?
My URLs will be generated from user input. Should I keep such characters in titles, or substitute them? For example, ç becomes c, and so on.
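If substitution is the route taken, a minimal Python sketch of the ç → c replacement could look like the following. It assumes that simply dropping accents is acceptable for slugs; `slugify` is a hypothetical helper name, not a standard library function. It decomposes each character with Unicode NFKD normalization and then discards the combining marks.

```python
import re
import unicodedata


def slugify(title: str) -> str:
    """Turn a user-supplied title into an ASCII-only, hyphen-separated slug.

    Hypothetical helper: accents are stripped by decomposing characters (NFKD)
    and dropping everything that is not ASCII, so "ç" -> "c" and "õ" -> "o".
    """
    # Split accented letters into base letter + combining mark, e.g. "ç" -> "c" + U+0327.
    decomposed = unicodedata.normalize("NFKD", title)
    # Drop the combining marks (and any other non-ASCII characters).
    ascii_only = decomposed.encode("ascii", "ignore").decode("ascii")
    # Replace each run of characters outside [a-z0-9] with a single hyphen.
    slug = re.sub(r"[^a-z0-9]+", "-", ascii_only.lower())
    # Remove hyphens left over from leading/trailing punctuation or spaces.
    return slug.strip("-")


print(slugify("Ação e Coração"))  # acao-e-coracao
```

Note that characters with no decomposition (for example ß) are dropped rather than transliterated, so a real slug generator may need an explicit replacement table.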
My readers' native language is Portuguese, but I'm not sure whether they will care about these characters in the page's friendly URL.
The RFC you linked specifically mentions ASCII as the character set for URIs:
The ABNF notation defines its terminal values to be non-negative integers (codepoints) based on the US-ASCII coded character set [ASCII].
That makes characters outside of ASCII unsafe, as far as the RFC is concerned.
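To make that concrete: the unreserved set in RFC 3986 (section 2.3) is ALPHA / DIGIT / "-" / "." / "_" / "~", where ALPHA means only the ASCII letters A-Z and a-z. A small Python sketch (`is_unreserved` is a hypothetical name used for illustration) shows that ç and õ fall outside it.

```python
import string

# RFC 3986, section 2.3: unreserved = ALPHA / DIGIT / "-" / "." / "_" / "~"
# ALPHA and DIGIT are ASCII-only, so accented letters are not included.
UNRESERVED = frozenset(string.ascii_letters + string.digits + "-._~")


def is_unreserved(ch: str) -> bool:
    """Return True if ch is in the RFC 3986 unreserved character set."""
    return ch in UNRESERVED


for ch in ["c", "ç", "o", "õ", "~"]:
    print(repr(ch), is_unreserved(ch))
```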
Of course, the RFC only covers plain ASCII URIs; internationalized domain names (IDN) are handled separately. There is an RFC that specifies how conversions between ASCII and Unicode in URLs should occur.
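IDN applies to the host name only; the path still relies on percent-encoding. As a sketch, assuming Python's built-in `idna` codec and a made-up host `pão.example`, the conversion to the ASCII ("xn--") form and back looks like this. The two helper functions are hypothetical names for illustration.

```python
def to_ascii_host(host: str) -> str:
    """Convert a Unicode host name to its ASCII ("xn--" Punycode) form via the idna codec."""
    return host.encode("idna").decode("ascii")


def to_unicode_host(ascii_host: str) -> str:
    """Convert an ASCII ("xn--") host name back to its Unicode form."""
    return ascii_host.encode("ascii").decode("idna")


host = "pão.example"  # hypothetical host name for illustration
ascii_host = to_ascii_host(host)
print(ascii_host)  # the "pão" label becomes an "xn--..." label
print(to_unicode_host(ascii_host))
```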
You can use any characters you want: any character outside the ASCII range is converted to octets and percent-encoded, which keeps the URI transportable.
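A minimal Python sketch of that percent-encoding, assuming UTF-8 (the default encoding used by `urllib.parse.quote`) and a made-up path `/artigos/ação`:

```python
from urllib.parse import quote, unquote

# quote() encodes the string as UTF-8 by default, then percent-encodes every
# byte except ASCII letters, digits, "_.-~" and the characters in `safe`
# (which defaults to "/").
path = "/artigos/ação"  # hypothetical path for illustration
encoded = quote(path)
print(encoded)  # /artigos/a%C3%A7%C3%A3o
print(unquote(encoded))  # /artigos/ação
```

Each accented letter becomes two UTF-8 octets (ç is C3 A7), so the URL on the wire stays pure ASCII while browsers can still display the readable form.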