Is there a double-byte character in which one of the bytes is '<' or '>'?

Is there, in any double-byte encoding, a character one of whose bytes has the same value as the ASCII character '<' or '>'? I can't seem to find one, but I have to make sure there are no such cases, since such double-byte characters might cause errors in HTML parsers.


In any encoding? Almost certainly yes. In fact, there are hundreds of characters that have 0x3c or 0x3e (the values of < and > in ASCII) as one of the bytes of their UTF-16 encodings. For instance "☼" (U+263C): its UTF-16LE representation is the two bytes 0x3c 0x26, which read as the ASCII text <&.
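To check this claim yourself, here is a minimal Python sketch. The helper name chars_with_markup_bytes and the restriction to the Basic Multilingual Plane are my own assumptions for illustration; the codec name is passed straight to Python's built-in str.encode, so the same function can be pointed at any other codec Python ships with.

```python
MARKUP_BYTES = {0x3C, 0x3E}  # byte values of ASCII '<' and '>'


def chars_with_markup_bytes(codec, max_code_point=0xFFFF):
    """Return characters whose two-byte encoding in `codec` contains 0x3c or 0x3e.

    Hypothetical helper: it scans code points up to max_code_point and skips
    anything the codec cannot encode (including lone surrogates).
    """
    hits = []
    for code_point in range(max_code_point + 1):
        try:
            encoded = chr(code_point).encode(codec)
        except UnicodeEncodeError:
            continue  # not representable in this codec
        if len(encoded) == 2 and MARKUP_BYTES & set(encoded):
            hits.append(chr(code_point))
    return hits


sun = "\u263c"  # the "☼" character from the answer
print(sun.encode("utf-16-le"))  # b'<&'

utf16_hits = chars_with_markup_bytes("utf-16-le")
print(len(utf16_hits), sun in utf16_hits)
```

Note that the result for UTF-16 also includes '<' and '>' themselves, since their UTF-16LE encodings are 0x3c 0x00 and 0x3e 0x00.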

But it's not appropriate to deliver HTML in some random character set without also specifying out-of-band (for instance via HTTP headers) what encoding it's using, and possibly using other signals such as a BOM (which HTML5 parsers honor) or an XML encoding declaration (which is required for XHTML in some cases, as dictated by the XML standard).

And if your encoding is specified properly then there should be no problem, because the characters < and > are special in HTML, not the bytes 0x3c and 0x3e. Any "parser" that thinks differently is broken.
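The difference between bytes and characters can be seen in a small Python sketch. The function name count_tag_delimiters and the sample markup are made up for illustration; the point is only that decoding with the declared encoding comes first, and the search for < and > happens on the decoded text.

```python
def count_tag_delimiters(raw, encoding):
    """Decode with the declared encoding, then count '<' and '>' characters."""
    text = raw.decode(encoding)
    return sum(ch in "<>" for ch in text)


# Sample document: a paragraph containing the "☼" character, sent as UTF-16LE.
raw = "<p>\u263c</p>".encode("utf-16-le")

# Broken approach: treating the bytes 0x3c and 0x3e as if they were characters.
naive = raw.count(b"<") + raw.count(b">")

# Correct approach: decode first, then look at characters.
correct = count_tag_delimiters(raw, "utf-16-le")

print(naive, correct)  # 5 4
```

The byte-level count is off by one because the first byte of "☼" in UTF-16LE is 0x3c, even though the document contains only four real delimiters.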
