Converting text containing COMBINING DIAERESIS to UTF-8
We have some text containing German umlauts represented using e.g. 'a' + COMBINING DIAERESIS ($cc $88).
Any idea how to convert such text properly to UTF-8?
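First, if it's not already a unicode string, decode it. Second, call unicodedata.normalize() with the "NFC" form, which composes 'a' + COMBINING DIAERESIS into the single code point 'ä'. Third, encode the result as UTF-8.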
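The three steps can be sketched roughly as follows. This is a minimal example that assumes Python 3 and that the incoming bytes are already UTF-8 with decomposed sequences; the function name `to_composed_utf8` and its `source_encoding` parameter are made up for illustration and do not come from the question.

```python
import unicodedata


def to_composed_utf8(raw: bytes, source_encoding: str = "utf-8") -> bytes:
    """Decode, normalize to NFC, and re-encode as UTF-8.

    `source_encoding` is an assumption: change it if the input
    bytes are not UTF-8 to begin with.
    """
    # Step 1: decode the bytes into a str (Unicode text).
    text = raw.decode(source_encoding)
    # Step 2: NFC composes base letter + combining mark into one
    # precomposed code point where one exists (a + U+0308 -> U+00E4).
    composed = unicodedata.normalize("NFC", text)
    # Step 3: encode the composed text as UTF-8.
    return composed.encode("utf-8")


# 'a' followed by COMBINING DIAERESIS (bytes 0xCC 0x88 in UTF-8)
decomposed = b"a\xcc\x88"
print(to_composed_utf8(decomposed))  # b'\xc3\xa4'
```

Note that the decomposed input is already valid UTF-8; normalizing only changes which code points are used to represent the umlaut. Text that is already composed passes through unchanged.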