Python - Python 3.1 can't seem to handle UTF-16 encoded files?

2023-02-24 12:31 问答作者：

I'm trying to run some code to simply go through a bunch of files and write those that happen to be .txt 开发者_开发知识库files into the same file, removing all the spaces. Here's some simple code that should do the trick:

for subdir, dirs, files in os.walk(rootdir):
for file in files:
    if '.txt' in file:
        f = open(subdir+'/'+file, 'r')
        line = f.readline()
        while line:
            line2 = line.split()
            if line2:
                output_file.write(" ".join(line2)+'\n')
            line = f.readline()
        f.close()

But instead, I get the following error:

File "/usr/lib/python3.1/codecs.py", line 300, in decode (result, consumed) = self._buffer_decode(data, self.errors, final) UnicodeDecodeError: 'utf8' codec can't decode byte 0xfe in position 0: unexpected code byte

It turns out these .txt files are all in UTF-16 (according to FireFox, at any rate). I thought Python 3.x was supposed to be able to handle any sort of character encoding??

Best, Georgina

Use open(bla, 'r', encoding="utf-16").

There are various utf-16 encodings.

utf-16-be big endian no BOM
utf-16-le little endian no BOM
utf-16 little endian + BOM

Examples:

Python 3.2 (r32:88452, Feb 20 2011, 11:12:31) 
[GCC 4.2.1 (Apple Inc. build 5664)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> a = 'a'.encode('utf-16')
>>> a
b'\xff\xfea\x00'
>>> a.decode('utf-16')
'a'
>>> a = 'a'.encode('utf-16-le')
>>> a
b'a\x00'
>>> a.decode('utf-16-le')
'a'
>>> a = 'a'.encode('utf-16-be')
>>> a
b'\x00a'
>>> a.decode('utf-16-be')
'a'

You can use these encodings as suggested by @filmor's answer

继续阅读：character-encoding python utf-16 utf-8

Python - Python 3.1 can't seem to handle UTF-16 encoded files?

更多精彩内容

精彩评论

最新问答

央视是哪个频道？

请问买过的朋友，舒提啦旅行箱实际使用体验如何？？

检查不孕不育需要的费用？

海信ULED电视画质有什么不同的地方?？

钉子可以挂的住画框幕布吗？

问答排行榜

河神2九牛入海钓河妖是第几集河妖什么来历可活吞牛？

性激素六项检查的最佳时间是多久？多少钱？？

Easiest way to get words of one line from istream into a vector?

《梦在燃烧 (《三国演义》动画片主题曲)》MP3歌词-汤子星？

抽烟只抽炫赫门？