How important is it that libraries treat UTF-16xx and UTF-32xx as equal peers to UTF-8?

Asked 2023-02-16 21:26

Does any significant interchange take place in formats other than ASCII/UTF-8? Are there any fields where UTF-16xx and UTF-32xx are used heavily? I ask as a writer of multiple libraries that work on Unicode text, and the burden of supporting all five major variants (UTF-8, UTF-16LE/BE, UTF-32LE/BE) is quite high compared to the perceived utility.

Windows and Java both treat Unicode as UTF-16 internally, and Python before 3.3 used UTF-16 or UTF-32 depending on how it was built (since 3.3 it picks a per-string representation). So more than just UTF-8 matters for these. These are just the cases I'm most familiar with; I'm sure there are others.

So, in my opinion, if you have a Unicode library, you should support UTF-16 and UTF-32. (I can't believe UTF-32 is too difficult, since there's no special processing involved besides byte ordering. Then again, I'm not a Unicode library author :) )
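To illustrate the point about UTF-32 needing little beyond byte-order handling, here is a minimal Python sketch. The function name `decode_utf32` and its `default_order` parameter are made up for this example, and the choice to fall back to big-endian when no BOM is present is an assumption, not something the thread prescribes.

```python
import struct


def decode_utf32(data: bytes, default_order: str = "big") -> str:
    """Decode UTF-32 bytes by hand (hypothetical helper for illustration)."""
    if len(data) % 4 != 0:
        raise ValueError("UTF-32 data must be a multiple of 4 bytes")

    # A byte order mark (U+FEFF) at the start tells us the byte order.
    if data[:4] == b"\xff\xfe\x00\x00":
        fmt, data = "<", data[4:]   # little-endian BOM
    elif data[:4] == b"\x00\x00\xfe\xff":
        fmt, data = ">", data[4:]   # big-endian BOM
    else:
        # No BOM: fall back to the caller's choice (assumed default: big-endian).
        fmt = ">" if default_order == "big" else "<"

    # Every code point is exactly one 32-bit unsigned integer.
    code_points = struct.unpack(f"{fmt}{len(data) // 4}I", data)

    for cp in code_points:
        # Reject values outside the Unicode range and lone surrogates.
        if cp > 0x10FFFF or 0xD800 <= cp <= 0xDFFF:
            raise ValueError(f"invalid code point: {cp:#x}")

    return "".join(map(chr, code_points))
```

Apart from the BOM check and a range check, the whole decoder is a single fixed-width unpack, which is roughly why UTF-32 tends to be the cheapest of the variants to support.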

One important point is XML: it can come in pretty much any encoding imaginable, but UTF-8 is by far the most common.

However, the XML spec says this:

"All XML processors MUST accept the UTF-8 and UTF-16 encodings of Unicode"

So if your application/library handles XML in any way, it must support UTF-16, at least in that portion. Note that a conforming parser that converts the data to UTF-8 for processing would be enough here.
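As a rough sketch of that "convert to UTF-8 for processing" approach, the following uses Python's standard `xml.etree.ElementTree` to accept a UTF-16 document and re-serialize it as UTF-8. The helper name `xml_to_utf8` and the sample document are invented for this example.

```python
import xml.etree.ElementTree as ET


def xml_to_utf8(raw: bytes) -> bytes:
    """Parse XML in whatever encoding the parser accepts and
    re-serialize it as UTF-8 (hypothetical helper for illustration)."""
    # Passing bytes lets the parser detect the encoding itself,
    # from the BOM and/or the XML declaration.
    root = ET.fromstring(raw)
    return ET.tostring(root, encoding="utf-8")


# A sample document encoded as UTF-16 with a BOM.
utf16_doc = (
    '<?xml version="1.0" encoding="UTF-16"?><greeting>héllo</greeting>'
).encode("utf-16")

utf8_doc = xml_to_utf8(utf16_doc)
```

With this shape, only the parsing boundary has to know about UTF-16; the rest of the library can keep working on UTF-8.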

When it comes to interchange, I guess you are right that UTF-8 is prevalent. UTF-16 does show up in various binary protocols such as DCOM, Java RMI, and (maybe?) CORBA.

As for UTF-32, I've never heard of a case where it is used for interchange.
