开发者

Intepretering Special Characters in XLSX Documents

I am using the xlsx Python library to read an XLSX document, but some column data contains special charac开发者_StackOverflow中文版ters like _x000D_. How can this be converted to its original form?


If _x000D_ is supposed to represent a unicode character with a hex code point, you could use a regular expression expression to find them and a callback function to convert them to the appropriate value.

import re

input_string = "H_x00E9_llo W_x00D8_rld!"

def parse_escaped_character_match(match):
    return unichr(int(match.group(1), 16))

print re.sub("_x([0-9A-F]{4})_", parse_escaped_character_match, input_string)
# prints "Héllo WØrld!"
0

上一篇:

下一篇:

精彩评论

暂无评论...
验证码 换一张
取 消

最新问答

问答排行榜