JavaScript: What characters are not encoded by encodeURIComponent?
I'm writing my own function in a different language, and I want it to provide identical results if possible.
You can find information in the MDC documentation:
encodeURIComponent
escapes all characters except the following:
alphabetic, decimal digits, - _ . ! ~ * ' ( )
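If you want to double-check that list against your own engine, a quick sketch like the following should do it. It simply asks encodeURIComponent about every ASCII character and keeps the ones that come back unchanged; the variable name `unescaped` is just for illustration.

```javascript
// Collect every ASCII character that encodeURIComponent leaves untouched.
let unescaped = "";
for (let code = 0; code < 128; code++) {
  const ch = String.fromCharCode(code);
  if (encodeURIComponent(ch) === ch) {
    unescaped += ch;
  }
}

console.log(unescaped);
// !'()*-.0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ_abcdefghijklmnopqrstuvwxyz~
console.log(unescaped.length); // 71
```

That is 26 + 26 letters, 10 digits, and the 9 marks, in code point order.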
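Short answer: the regex below matches every UTF-16 code unit that encodeURIComponent would encode:
/[^a-zA-Z0-9\-_.!~*'()]/g
Note, though, that per the spec a supplementary code point (a surrogate pair) is encoded as a single 4-byte UTF-8 sequence, so the two code units of a pair have to be handled together rather than one at a time.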
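To see the difference between code units and code points, here is a small sketch. The sample string and the variable names are made up for the demonstration; the regex is the one given above.

```javascript
// The regex works per UTF-16 code unit (no "u" flag).
const wouldEncode = /[^a-zA-Z0-9\-_.!~*'()]/g;

// "a", space, "b", U+00E9 (e with acute), U+1F600 (outside the BMP).
const sample = "a b\u00E9\u{1F600}";

// Space, U+00E9, and BOTH halves of the surrogate pair match.
const matches = sample.match(wouldEncode);
console.log(matches.length); // 4

// The surrogate pair is encoded once, as a 4-byte UTF-8 sequence.
const encoded = encodeURIComponent(sample);
console.log(encoded); // a%20b%C3%A9%F0%9F%98%80

// A lone surrogate cannot be encoded and raises URIError.
let threw = false;
try {
  encodeURIComponent("\uD800");
} catch (e) {
  threw = e instanceof URIError;
}
console.log(threw); // true
```

So a port that replaces each regex match independently would get the non-BMP case wrong; the pair needs to be combined into one code point first.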
Long answer: ECMA-262 says
15.1.3.4 encodeURIComponent (uriComponent)
The encodeURIComponent function computes a new version of a URI in which each instance of certain characters is replaced by one, two, three, or four escape sequences representing the UTF-8 encoding of the character. When the encodeURIComponent function is called with one argument uriComponent, the following steps are taken:
Let componentString be ToString(uriComponent).
Let unescapedURIComponentSet be a String containing one instance of each character valid in uriUnescaped.
Return the result of calling Encode(componentString, unescapedURIComponentSet)
And uriUnescaped is defined thus
uriUnescaped ::: uriAlpha | DecimalDigit | uriMark
where
uriAlpha ::: one of a b c d e f g h i j k l m n o p q r s t u v w x y z A B C D E F G H I J K L M N O P Q R S T U V W X Y Z
uriMark ::: one of - _ . ! ~ * ' ( )
DecimalDigit ::: one of 0 1 2 3 4 5 6 7 8 9
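Since the goal is a port to another language, here is a rough sketch of what the Encode step amounts to, written in JavaScript so it can be compared against the built-in. This is my reading of the behavior, not the spec's text; the names `encodeComponentSketch`, `utf8Bytes`, and `UNESCAPED` are invented for the example.

```javascript
// Characters left as-is: uriAlpha, DecimalDigit, uriMark.
const UNESCAPED =
  "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-_.!~*'()";

// UTF-8 encode one code point into an array of byte values.
function utf8Bytes(cp) {
  if (cp < 0x80) return [cp];
  if (cp < 0x800) return [0xC0 | (cp >> 6), 0x80 | (cp & 0x3F)];
  if (cp < 0x10000) {
    return [0xE0 | (cp >> 12), 0x80 | ((cp >> 6) & 0x3F), 0x80 | (cp & 0x3F)];
  }
  return [
    0xF0 | (cp >> 18),
    0x80 | ((cp >> 12) & 0x3F),
    0x80 | ((cp >> 6) & 0x3F),
    0x80 | (cp & 0x3F),
  ];
}

function encodeComponentSketch(str) {
  let out = "";
  for (let i = 0; i < str.length; i++) {
    const ch = str[i];
    if (UNESCAPED.includes(ch)) {
      out += ch;
      continue;
    }
    let cp = str.charCodeAt(i);
    // A low surrogate with no preceding high surrogate is an error.
    if (cp >= 0xDC00 && cp <= 0xDFFF) throw new URIError("lone low surrogate");
    if (cp >= 0xD800 && cp <= 0xDBFF) {
      // High surrogate: it must be followed by a low surrogate.
      i++;
      if (i >= str.length) throw new URIError("lone high surrogate");
      const lo = str.charCodeAt(i);
      if (lo < 0xDC00 || lo > 0xDFFF) throw new URIError("lone high surrogate");
      cp = (cp - 0xD800) * 0x400 + (lo - 0xDC00) + 0x10000;
    }
    // One %XX escape (uppercase hex) per UTF-8 byte.
    for (const b of utf8Bytes(cp)) {
      out += "%" + b.toString(16).toUpperCase().padStart(2, "0");
    }
  }
  return out;
}

console.log(encodeComponentSketch("a b&c=d/\u00E9\u20AC\u{1F600}"));
```

Comparing its output with encodeURIComponent on your own test strings is probably the easiest way to validate a port.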