Is 0xF8 a valid byte in a UTF-8 encoded XML document?

2023-02-06 17:11

I am receiving a document that claims to be UTF-8 (<?xml version="1.0" encoding="UTF-8"?>). I've had some problems in the past where the encoding declaration from the sender has not been all that reliable (i.e. documents are declared to have a given encoding when in fact they do not), so I try to check using http://utf8checker.codeplex.com/. According to this tool, a 0xF8 byte means that this document is not UTF-8 encoded.

However, to the contrary, this page lists the Norwegian character 'ø' as being represented in UTF-8 as 0xF8. (The page is in Norwegian; however, the data I am referring to stems from the table at the bottom of the page.)

Can anyone help me sort this out? I'm feeling rather confused here.

Thanks!

ø is U+00F8 and since it is not in ASCII it cannot be a single UTF-8 code unit. It is represented by 0xC3 0xB8 in UTF-8. Therefore, if you have 0xF8 standing alone in a document somewhere, yes, it is invalid UTF-8.
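To see this for yourself, here is a minimal Python sketch. The helper name is_valid_utf8 and the sample byte strings are made up for illustration; the only library calls used are the built-in str.encode and bytes.decode.

```python
# Encode U+00F8 (ø) and show its UTF-8 bytes.
encoded = "\u00f8".encode("utf-8")
print(encoded.hex())  # c3b8


def is_valid_utf8(data: bytes) -> bool:
    """Hypothetical helper: True if data decodes as strict UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


# Made-up samples: the same word with ø as C3 B8 and as a lone F8.
print(is_valid_utf8(b"bl\xc3\xb8t"))  # True
print(is_valid_utf8(b"bl\xf8t"))      # False
```

A strict decode like this is essentially what a UTF-8 checker does, so a lone 0xF8 will always be flagged.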

It seems that the document uses either Latin-1 or the Windows code page 1252.
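If that guess is right, decoding the same byte as Latin-1 or cp1252 should give 'ø'. The sketch below checks that and shows one possible fallback strategy; decode_with_fallback and the sample bytes are hypothetical, and whether cp1252 is the right fallback for your sender is an assumption you would need to confirm.

```python
# 0xF8 maps to ø in both Latin-1 and Windows-1252.
print(b"\xf8".decode("latin-1"))  # ø
print(b"\xf8".decode("cp1252"))   # ø


def decode_with_fallback(data: bytes):
    """Hypothetical helper: try strict UTF-8 first, then assume cp1252.

    Returns a (text, encoding_used) pair.
    """
    try:
        return data.decode("utf-8"), "utf-8"
    except UnicodeDecodeError:
        # cp1252 leaves a few byte values undefined, so replace those
        # instead of raising a second error.
        return data.decode("cp1252", errors="replace"), "cp1252"


text, used = decode_with_fallback(b"bl\xf8t")  # made-up sample
print(text, used)  # bløt cp1252
```

Guessing the encoding like this is only a workaround. A document that declares UTF-8 but contains a lone 0xF8 is malformed, so getting the sender to fix the declaration or the encoding is the cleaner solution.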

I don't think that page is very reliable; it also says "UTF-8 = UCS-1".

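Checking Wikipedia, F8 could only be used as the first byte of a 5-byte sequence in the original UTF-8 design, and no Unicode characters exist which would require a 5-byte encoding. Since RFC 3629 restricted UTF-8 to code points up to U+10FFFF, which need at most 4 bytes, 5-byte sequences are not valid at all. So no, 0xF8 can never appear in valid UTF-8.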
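A quick way to spot bytes like this is to scan for values that can never occur anywhere in UTF-8 as restricted by RFC 3629: 0xC0, 0xC1 and 0xF5 through 0xFF. This is a rough sketch with made-up names (NEVER_VALID, find_impossible_bytes), and it is only a necessary check, not a full validator, since a file can avoid all of these bytes and still be malformed.

```python
# Byte values that cannot appear in any valid UTF-8 sequence (RFC 3629).
NEVER_VALID = frozenset([0xC0, 0xC1]) | frozenset(range(0xF5, 0x100))


def find_impossible_bytes(data: bytes):
    """Hypothetical helper: list (offset, byte) pairs that rule out UTF-8."""
    return [(i, b) for i, b in enumerate(data) if b in NEVER_VALID]


print(0xF8 in NEVER_VALID)                # True
print(find_impossible_bytes(b"bl\xf8t"))  # [(2, 248)]
```

The offsets make it easy to locate the offending characters in the received file.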

The utf8checker tool is right and the page you are referring to is wrong. The UTF-8 representation of 'ø' is 0xC3 0xB8 (two bytes).

http://www.fileformat.info/info/unicode/char/f8/index.htm
