Two encodings in the HTML document representation?

I'm reading a chapter from the W3C HTML Document Representation.

Section 5.1 says this:

User agents must also know the specific character encoding that was used to transform the document character stream into a byte stream.

Then section 5.2 says this:

The "charset" parameter identifies a character encoding, which is a method of converting a sequence of bytes into a sequence of characters.

5.1: characters -> bytes

5.2: bytes -> characters

So am I wrong, or are there two encodings between the representations?


A "character encoding" such as UTF-8 is, strictly speaking, a specification for representing characters as a sequence of bytes. But the encodings are always reversible, so we can speak of a (single) character encoding as going both ways.
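To make the "both ways" point concrete, here is a minimal Python sketch. The sample string `"café"` is my own choice, not something from the W3C text; the same encoding name, `"utf-8"`, is used for both directions.

```python
# Sample text chosen for illustration only (assumption, not from the spec).
text = "café"

# Characters -> bytes: the direction section 5.1 describes.
data = text.encode("utf-8")

# Bytes -> characters: the direction section 5.2 describes.
restored = data.decode("utf-8")

print(data)              # the byte sequence produced by UTF-8
print(restored == text)  # True: one encoding, two directions
```

Encoding followed by decoding returns the original text, which is why the two sections can describe the same encoding from opposite ends.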

Other character encodings used in practice are UTF-16 and UTF-32.

Each of these is a specification under which you can encode text as bytes and decode bytes into characters. Encoding and decoding are two parts of the same specification.
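As a rough illustration, the sketch below encodes one character under three encodings and decodes each result back. The character `"€"` is an arbitrary example, and I use Python's little-endian codec names (`utf-16-le`, `utf-32-le`) so that no byte order mark is added; both choices are assumptions made for the demo.

```python
# Arbitrary sample character (U+20AC), chosen for illustration only.
text = "€"

# Little-endian variants are used so Python does not prepend a byte order mark.
names = ("utf-8", "utf-16-le", "utf-32-le")

# Encode: the same character becomes a different byte sequence per encoding.
encoded = {name: text.encode(name) for name in names}

# Decode: each byte sequence maps back to the same character,
# provided the decoder uses the encoding that produced it.
decoded = {name: data.decode(name) for name, data in encoded.items()}

for name in names:
    print(name, encoded[name].hex(), len(encoded[name]), decoded[name] == text)
```

The bytes differ between encodings, which is why a user agent has to be told (for example through the "charset" parameter) which one was used before it can decode the stream.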
