
Converting text containing COMBINING DIAERESIS to utf-8

We have some text containing German umlauts represented in decomposed form, e.g. 'a' followed by COMBINING DIAERESIS (bytes $cc $88).

Any idea how to convert such text properly to UTF-8, so that each umlaut becomes a single precomposed character?


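First, if the text is not already a unicode string, decode it. Second, pass it through unicodedata.normalize(); the 'NFC' form composes 'a' + COMBINING DIAERESIS into the single character 'ä'. Third, encode the result as UTF-8.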
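The three steps could look roughly like this. This is a minimal sketch, assuming Python 3 and assuming the input bytes are already UTF-8 (the $cc $88 pair in the question is the UTF-8 encoding of U+0308); the function name to_composed_utf8 and its source_encoding parameter are made up for illustration.

```python
import unicodedata


def to_composed_utf8(raw: bytes, source_encoding: str = "utf-8") -> bytes:
    """Decode, compose combining sequences (NFC), and re-encode as UTF-8.

    source_encoding is an assumption: change it if the input is not UTF-8.
    """
    # Step 1: decode the bytes into a str (Python 3's unicode string type).
    text = raw.decode(source_encoding)
    # Step 2: NFC merges base letter + combining mark into one code point
    # wherever a precomposed character exists.
    composed = unicodedata.normalize("NFC", text)
    # Step 3: encode the composed text as UTF-8.
    return composed.encode("utf-8")


# 'a' + COMBINING DIAERESIS (U+0308) is the three bytes 61 CC 88 in UTF-8.
decomposed = b"a\xcc\x88"
print(to_composed_utf8(decomposed))  # b'\xc3\xa4', i.e. U+00E4 'ä'
```

If the text is already a str, only the normalize() call is needed. NFC leaves a sequence decomposed when no precomposed character exists for it.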
