Intepretering Special Characters in XLSX Documents
I am using the xlsx Python library to read an XLSX document, but some column data contains special charac开发者_StackOverflow中文版ters like _x000D_
. How can this be converted to its original form?
If _x000D_
is supposed to represent a unicode character with a hex code point, you could use a regular expression expression to find them and a callback function to convert them to the appropriate value.
import re
input_string = "H_x00E9_llo W_x00D8_rld!"
def parse_escaped_character_match(match):
return unichr(int(match.group(1), 16))
print re.sub("_x([0-9A-F]{4})_", parse_escaped_character_match, input_string)
# prints "Héllo WØrld!"
精彩评论