PHP json_encode with Cyrillic characters
To avoid reinventing the wheel, I'll refer to the existing question about Cyrillic characters in PHP's json_encode.
The question is: what are those sequences, such as \u0435 and \u0434, and what do they mean? I guess they have nothing to do with the number of bytes. Is each one just a serial number in UTF-8 that corresponds to the Cyrillic symbols "е", "д" and so on, respectively?
These are Unicode escape sequences that reference characters in the Unicode character set by denoting their code points in hexadecimal.
From the JSON specification:
Any character may be escaped. If the character is in the Basic Multilingual Plane (U+0000 through U+FFFF), then it may be represented as a six-character sequence: a reverse solidus, followed by the lowercase letter u, followed by four hexadecimal digits that encode the character's code point. The hexadecimal letters A through F can be upper or lowercase. So, for example, a string containing only a single reverse solidus character may be represented as "\u005C".
Although these characters do not need to be escaped (see the unescaped rule), json_encode escapes every character that is not in US-ASCII (see the source of json.c) to avoid encoding issues with US-ASCII-based protocols.
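A minimal sketch of this behaviour, for illustration only. It assumes PHP 5.4 or later, where the JSON_UNESCAPED_UNICODE flag is available (the flag is not mentioned above), and a source file saved as UTF-8. The variable names are made up for the example.

```php
<?php
// Sample string containing the two Cyrillic letters from the question.
// Assumes this file is saved as UTF-8.
$text = "ед";

// Default behaviour: every non-ASCII character becomes a \uXXXX escape.
$escaped = json_encode($text);
echo $escaped, "\n"; // "\u0435\u0434"

// With JSON_UNESCAPED_UNICODE the characters are emitted as raw UTF-8.
$raw = json_encode($text, JSON_UNESCAPED_UNICODE);
echo $raw, "\n"; // "ед"

// Both forms decode back to the same PHP string.
var_dump(json_decode($escaped) === json_decode($raw)); // bool(true)
```

Either output is valid JSON. The escaped form is simply safer to pass through channels that only handle ASCII.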
So inside a JSON string, \u0435 references the character at U+0435, which is CYRILLIC SMALL LETTER IE (е), and \u0434 references the character at U+0434, which is CYRILLIC SMALL LETTER DE (д).
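To make the difference between a code point and its UTF-8 bytes concrete, here is a small sketch. It decodes the escape with json_decode and then recovers the code point by hand from the two UTF-8 bytes. The manual bit arithmetic is an illustration that only covers two-byte sequences, not a general decoder, and the variable names are hypothetical.

```php
<?php
// Decode the JSON escape into an actual PHP string (UTF-8 encoded).
$char = json_decode('"\u0435"');

// In UTF-8 this character occupies two bytes: D0 B5.
echo strlen($char), "\n";              // 2
echo strtoupper(bin2hex($char)), "\n"; // D0B5

// Recover the code point from the two-byte UTF-8 form (110xxxxx 10xxxxxx).
// Illustration only: this formula is valid for two-byte sequences only.
$bytes = unpack('C*', $char); // unpack() returns a 1-based array
$codePoint = (($bytes[1] & 0x1F) << 6) | ($bytes[2] & 0x3F);
printf("U+%04X\n", $codePoint);        // U+0435
```

So the four hex digits in the escape are the Unicode code point itself, not the UTF-8 byte values and not a byte count.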