Create an utf-8 csv file in Python

2023-01-04 09:35 问答作者：

I can't create an utf-8 csv file in Python.

I'm trying to read it's docs, and in the examples section, it says:

For all other encodings the following UnicodeReader and UnicodeWriter classes can be used. They take an additional encoding parameter in their constructor and make sure that the data passes the real reader or writer encoded as UTF-8:

Ok. So I have this code:

values = (unicode("Ñ", "utf-8"), unicode("é", "utf-8"))
f = codecs.open('eggs.c开发者_开发知识库sv', 'w', encoding="utf-8")
writer = UnicodeWriter(f)
writer.writerow(values)

And I keep getting this error:

line 159, in writerow
    self.stream.write(data)
  File "/usr/lib/python2.6/codecs.py", line 686, in write
    return self.writer.write(data)
  File "/usr/lib/python2.6/codecs.py", line 351, in write
    data, consumed = self.encode(object, self.errors)
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 22: ordinal not in range(128)

Can someone please give me a light so I can understand what the hell am I doing wrong since I set all the encoding everywhere before calling UnicodeWriter class?

class UnicodeWriter:
    """
    A CSV writer which will write rows to CSV file "f",
    which is encoded in the given encoding.
    """

    def __init__(self, f, dialect=csv.excel, encoding="utf-8", **kwds):
        # Redirect output to a queue
        self.queue = cStringIO.StringIO()
        self.writer = csv.writer(self.queue, dialect=dialect, **kwds)
        self.stream = f
        self.encoder = codecs.getincrementalencoder(encoding)()

    def writerow(self, row):
        self.writer.writerow([s.encode("utf-8") for s in row])
        # Fetch UTF-8 output from the queue ...
        data = self.queue.getvalue()
        data = data.decode("utf-8")
        # ... and reencode it into the target encoding
        data = self.encoder.encode(data)
        # write to the target stream
        self.stream.write(data)
        # empty queue
        self.queue.truncate(0)

    def writerows(self, rows):
        for row in rows:
            self.writerow(row)

You don't have to use codecs.open; UnicodeWriter takes Unicode input and takes care of encoding everything into UTF-8. When UnicodeWriter writes into the file handle you passed to it, everything is already in UTF-8 encoding (therefore it works with a normal file you opened with open).

By using codecs.open, you essentially convert your Unicode objects to UTF-8 strings in UnicodeWriter, then try to re-encode these strings into UTF-8 again as if these strings contained Unicode strings, which obviously fails.

As you have figured out it works if you use plain open.

The reason for this is that you tried to encode UTF-8 twice. Once in

f = codecs.open('eggs.csv', 'w', encoding="utf-8")

and then later in UnicodeWriter.writeRow

# ... and reencode it into the target encoding
data = self.encoder.encode(data)

To check that this works use your original code and outcomment that line.

Greetz

I ran into the csv / unicode challenge a while back and tossed this up on bitbucket: http://bitbucket.org/famousactress/dude_csv .. might work for you, if your needs are simple :)

You don't need to "double-encode" everything.

Your application should work entirely in Unicode.

Do your encoding only in the codecs.open to write UTF-8 bytes to an external file. Do no other encoding within your application.

继续阅读：csv encoding python utf-8

Create an utf-8 csv file in Python

更多精彩内容

精彩评论

最新问答

央视是哪个频道？

请问买过的朋友，舒提啦旅行箱实际使用体验如何？？

检查不孕不育需要的费用？

海信ULED电视画质有什么不同的地方?？

钉子可以挂的住画框幕布吗？

问答排行榜

河神2九牛入海钓河妖是第几集河妖什么来历可活吞牛？

性激素六项检查的最佳时间是多久？多少钱？？

Easiest way to get words of one line from istream into a vector?

《梦在燃烧 (《三国演义》动画片主题曲)》MP3歌词-汤子星？

抽烟只抽炫赫门？