Python: How to get StringIO.writelines to accept unicode string?

I'm getting a

UnicodeEncodeError: 'ascii' codec can't encode character u'\xa3' in position 34: ordinal not in range(128)

on a string stored in 'a.desc' below, because it contains the '£' character. It's stored in the underlying Google App Engine datastore as a unicode string, so that part is fine. The cStringIO.StringIO.writelines function is seemingly trying to encode it as ASCII:

result.writelines(['blahblah',a.desc,'blahblahblah'])

How do I instruct it to treat the input as unicode, if that's the correct phrasing?

App Engine runs on Python 2.5.


You can wrap the StringIO object in a codecs.StreamReaderWriter object to automatically encode and decode unicode.

Like this:

import cStringIO, codecs

buffer = cStringIO.StringIO()
codecinfo = codecs.lookup("utf8")
# Wrap the byte buffer so that unicode is encoded to UTF-8 on write
# and decoded from UTF-8 on read.
wrapper = codecs.StreamReaderWriter(buffer,
        codecinfo.streamreader, codecinfo.streamwriter)

wrapper.writelines([u"list of", u"unicode strings"])

After this, buffer holds the UTF-8 encoded bytes.

If I understand your case correctly, you will only need to write, so you could also do:

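import cStringIO, codecs
buffer = cStringIO.StringIO()
# getwriter returns a StreamWriter class that encodes unicode to UTF-8 on write.
wrapper = codecs.getwriter("utf8")(buffer)
wrapper.writelines([u"list of", u"unicode strings"])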
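
For readers on Python 2.6+ or Python 3, where cStringIO may not be available, the same write-only wrapper idea can be sketched on top of io.BytesIO. This is only an illustration under that assumption, not what App Engine's Python 2.5 runtime offers; the sample strings are made up.

```python
import codecs
import io

# Byte buffer that will receive the encoded output.
buffer = io.BytesIO()

# codecs.getwriter returns a StreamWriter class; instantiating it around
# the buffer gives an object that encodes unicode to UTF-8 on every write.
wrapper = codecs.getwriter("utf-8")(buffer)

# Hypothetical sample data containing the pound sign (U+00A3).
wrapper.writelines([u"blahblah", u"cost: \xa3", u"blahblahblah"])

data = buffer.getvalue()  # UTF-8 encoded bytes
```

The wrapper never tries the ASCII codec, so non-ASCII characters pass through as their UTF-8 byte sequences.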


From the StringIO documentation:

Unlike the memory files implemented by the StringIO module, those provided by [cStringIO] are not able to accept Unicode strings that cannot be encoded as plain ASCII strings.

If possible, use StringIO instead of cStringIO.


You can also encode your strings as UTF-8 manually before adding them to the StringIO:

# Rebinding val inside a for loop would leave rows unchanged, so build a new list.
rows = [val.encode('utf-8') if isinstance(val, unicode) else val
        for val in rows]
result.writelines(rows)
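
The manual-encoding approach can also be wrapped in a small helper. The sketch below is an assumption-laden illustration: encode_rows and text_type are hypothetical names, io.BytesIO stands in for cStringIO (so it needs Python 2.6+ or Python 3), and the sample rows are made up.

```python
import io

# unicode on Python 2, str on Python 3; avoids naming either type directly.
text_type = type(u"")

def encode_rows(rows, encoding="utf-8"):
    """Return a new list with every text item encoded to bytes.

    Items that are already byte strings are passed through untouched.
    """
    return [r.encode(encoding) if isinstance(r, text_type) else r
            for r in rows]

result = io.BytesIO()
# Hypothetical mix of byte strings and a unicode string with a pound sign.
result.writelines(encode_rows([b"blahblah", u"price: \xa3 5", b"blahblahblah"]))
data = result.getvalue()
```

Building a new list rather than mutating in place keeps the caller's rows intact, which matters if they are reused elsewhere.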


Python 2.6 introduced the io module, and you should consider using io.StringIO(), "An in-memory stream for unicode text."

In Python 2.6 the io module is implemented in pure Python and is not optimized; from Python 2.7 onwards it is implemented in (fast) C code.
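
A minimal sketch of the io.StringIO route, assuming Python 2.6 or later (so not the Python 2.5 runtime from the question); the sample strings are made up. Note that on Python 2, io.StringIO accepts only unicode objects, so plain str literals need a u prefix.

```python
import io

# In-memory text stream; it stores unicode and performs no encoding on write.
result = io.StringIO()
result.writelines([u"blahblah", u"\xa3 sign", u"blahblahblah"])

text = result.getvalue()        # still a unicode string
encoded = text.encode("utf-8")  # encode once, at the output boundary
```

Keeping the data as unicode until the final encode means the choice of encoding is made in exactly one place.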
