Python file input string: how to handle escaped unicode characters?

2022-12-29 18:45

In a text file (test.txt), my string looks like this:

Gro\u00DFbritannien

When I read it, Python keeps the backslash as a literal character (the repr shows it doubled):

>>> file = open('test.txt', 'r')
>>> input = file.readline()
>>> input
'Gro\\u00DFbritannien'

How can I have this interpreted as unicode? decode() and unicode() won't do the job.

The following code writes Gro\u00DFbritannien back to the file, but I want it to be Großbritannien:

>>> import codecs
>>> input.decode('latin-1')
u'Gro\\u00DFbritannien'
>>> out = codecs.open('out.txt', 'w', 'utf-8')
>>> out.write(input)

You want to use the unicode_escape codec:

>>> x = 'Gro\\u00DFbritannien'
>>> y = unicode(x, 'unicode_escape')
>>> print y
Großbritannien

See the docs for the vast number of standard encodings that come as part of the Python standard library.
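The snippet above is Python 2. On Python 3, unicode() is gone and str has no decode() method, so the same idea could be sketched as below. This is only an illustration: decode_escapes is a hypothetical helper name, and it assumes the data is available as bytes (for example, read from a file opened in 'rb' mode).

```python
def decode_escapes(raw):
    """Turn literal backslash-u escapes in a bytes object into real characters.

    Hypothetical helper for illustration. Note that the unicode_escape codec
    treats any non-ASCII bytes as Latin-1, so it is only safe when the input
    is plain ASCII apart from the escape sequences.
    """
    return raw.decode('unicode_escape')


# The bytes literal holds a real backslash followed by "u00DF".
print(decode_escapes(b'Gro\\u00DFbritannien'))  # prints Großbritannien
```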

Use the built-in 'unicode_escape' codec:

>>> file = open('test.txt', 'r')
>>> input = file.readline()
>>> input
'Gro\\u00DFbritannien\n'
>>> input.decode('unicode_escape')
u'Gro\xdfbritannien\n'

You may also use codecs.open():

>>> import codecs
>>> file = codecs.open('test.txt', 'r', 'unicode_escape')
>>> input = file.readline()
>>> input
u'Gro\xdfbritannien\n'

The list of standard encodings is available in the Python documentation: http://docs.python.org/library/codecs.html#standard-encodings
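To get all the way to what the question asks for (Großbritannien written to out.txt), the decoding step can be combined with a UTF-8 writer. The following is a minimal sketch rather than a definitive implementation: convert_escaped_file is a hypothetical name, the file names are the ones from the question, and the input is assumed to be ASCII apart from the escape sequences.

```python
import io


def convert_escaped_file(src_path, dst_path):
    """Copy src_path to dst_path, replacing literal backslash-u escapes
    with the characters they stand for and writing the result as UTF-8.

    Hypothetical helper; assumes the source file is ASCII-only apart from
    the escape sequences, because unicode_escape reads other bytes as Latin-1.
    """
    with io.open(src_path, 'rb') as src, \
            io.open(dst_path, 'w', encoding='utf-8') as dst:
        for raw_line in src:
            # raw_line is bytes; decoding yields text with real characters.
            dst.write(raw_line.decode('unicode_escape'))
```

Calling convert_escaped_file('test.txt', 'out.txt') should then leave Großbritannien in out.txt. Reading in binary mode and decoding explicitly keeps the two steps (interpreting the escapes, choosing the output encoding) separate, which makes it easier to see where a wrong character comes from.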
