
UnicodeEncodeError when writing to a file [duplicate]

This question already has answers here: UnicodeEncodeError: 'ascii' codec can't encode character u'\xa0' in position 20: ordinal not in range(128) (34 answers). Closed 24 days ago.

I am trying to write some strings to a file (the strings have been given to me by the HTML parser BeautifulSoup).

I can use "print" to display them, but when I use file.write() I get the following error:

UnicodeEncodeError: 'ascii' codec can't encode character u'\xa3' in position 6: ordinal not in range(128)

How can I fix this?


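This error occurs when you pass a Unicode string containing non-ASCII characters (code points above 127) to something that expects an ASCII bytestring. In Python 2, implicit conversions from unicode to a bytestring use the default encoding, which is ASCII and covers exactly 128 characters (code points 0 to 127). That is why converting any character beyond that range, such as u'\xa3' (the pound sign) in your error, fails. print works because, when writing to a terminal, Python 2 encodes with the terminal's encoding (sys.stdout.encoding), whereas file.write() on an ordinary file falls back to ASCII.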
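To see the failure in isolation, here is a minimal sketch in Python 3 syntax (where str is always Unicode, so the conversion has to be requested with .encode()). The sample string is made up for illustration; it contains the same character as the error in the question. Encoding it to ASCII fails, while UTF-8 handles it:

```python
# U+00A3 (the pound sign) has code point 163, outside ASCII's 0-127 range.
text = "Price: \xa3"


def try_encode(s, encoding):
    """Return the encoded bytes, or the error message if encoding fails."""
    try:
        return s.encode(encoding)
    except UnicodeEncodeError as exc:
        return str(exc)


ascii_result = try_encode(text, "ascii")   # error message, same kind as in the question
utf8_result = try_encode(text, "utf-8")    # two bytes (0xC2 0xA3) for the pound sign

print(ascii_result)
print(utf8_result)
```

The fix in every answer below is the same idea: pick an encoding that can represent the characters (usually UTF-8) and apply it explicitly instead of relying on the ASCII default.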

The unicode() constructor has the signature unicode(string[, encoding, errors]). All of its arguments should be 8-bit strings.

The first argument is converted to Unicode using the specified encoding; if you leave off the encoding argument, the ASCII encoding is used for the conversion, so any byte greater than 127 is treated as an error.
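In Python 3 the counterpart of unicode(string, encoding, errors) is str(bytes, encoding, errors) or bytes.decode(encoding, errors). The sketch below uses made-up sample bytes (the UTF-8 encoding of the pound sign) and passes 'ascii' explicitly to mimic Python 2's default, then shows the errors argument and the correct decoding:

```python
# UTF-8 bytes for the pound sign are 0xC2 0xA3, both greater than 127.
raw = b"\xc2\xa3 10"

# Python 2's unicode(raw) decoded with ASCII by default; here it is spelled out.
ascii_error = None
try:
    raw.decode("ascii")
except UnicodeDecodeError as exc:
    ascii_error = exc.reason

# errors="replace" substitutes U+FFFD for each undecodable byte instead of raising.
replaced = str(raw, "ascii", "replace")

# Decoding with the encoding the bytes were actually written in recovers the text.
decoded = str(raw, "utf-8")

# ascii() keeps the printed output ASCII-only, whatever the console encoding is.
print(ascii_error)
print(ascii(replaced))
print(ascii(decoded))
```

Note that "replace" loses information; it is only appropriate when a lossy result is acceptable.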

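Going the other way, you can encode a Unicode string explicitly before outputting it, for example:

s = u'La Pe\xf1a'
print s.encode('latin-1')

or, given a file object f opened for writing:

f.write(s.encode('latin-1'))

Both encode the string to Latin-1 bytes first, so no implicit ASCII conversion takes place.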
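The same explicit-encoding approach in Python 3 syntax might look like the sketch below. The temporary file path is a stand-in chosen for the example. The file is opened in binary mode ('wb') because s.encode() already produced bytes. It also shows the limit of Latin-1: it only covers U+0000 to U+00FF, so a character such as the euro sign cannot be encoded:

```python
import os
import tempfile

s = "La Pe\xf1a"  # 'La Pena' with n-tilde; U+00F1 fits in Latin-1's single byte range

# Encode explicitly, then write the resulting bytes to a file opened in binary mode.
data = s.encode("latin-1")
path = os.path.join(tempfile.mkdtemp(), "pena.txt")  # hypothetical output location
with open(path, "wb") as f:
    f.write(data)

# Latin-1 stops at U+00FF, so the euro sign (U+20AC) raises UnicodeEncodeError.
try:
    "\u20ac 5".encode("latin-1")
    euro_ok = True
except UnicodeEncodeError:
    euro_ok = False

# Reading the bytes back and decoding with the same codec gives the original string.
with open(path, "rb") as f:
    round_trip = f.read().decode("latin-1")

print(ascii(data), euro_ok, ascii(round_trip))
```

For arbitrary text, such as whatever an HTML parser returns, 'utf-8' is the safer choice because it can encode every Unicode character.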


The answer to your question is "use codecs". The appended code also shows some gettext magic, FWIW. http://wiki.wxpython.org/Internationalization

import codecs
import gettext

import wx  # wxPython provides the locale and catalog handling used below

localedir = './locale'
langid = wx.LANGUAGE_DEFAULT  # use OS default; or use LANGUAGE_JAPANESE, etc.
domain = "MyApp"
mylocale = wx.Locale(langid)
mylocale.AddCatalogLookupPathPrefix(localedir)
mylocale.AddCatalog(domain)

translater = gettext.translation(domain, localedir,
                                 [mylocale.GetCanonicalName()], fallback=True)

# install() puts the gettext _() function into the builtins namespace;
# unicode=True (a Python 2 option) makes _() return unicode strings
translater.install(unicode=True)

msg = _("A message that gettext will translate, probably putting Unicode in here")

# codecs.open() returns a file object that encodes unicode strings to UTF-8 on write
# (logfile_name is assumed to be defined elsewhere)
Logfile = codecs.open(logfile_name, 'w', encoding='utf-8')
Logfile.write(msg + u'\n')
Logfile.close()
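Stripped of the wxPython and gettext parts, the codecs.open() technique reduces to the sketch below, written in Python 3 syntax. The log path and the message are placeholders chosen for the example (the answer's logfile_name is not defined in the thread). In Python 3 the built-in open() with an encoding argument is the usual way to do the same thing:

```python
import codecs
import os
import tempfile

# Hypothetical log path standing in for the answer's undefined logfile_name.
logfile_name = os.path.join(tempfile.mkdtemp(), "myapp.log")

msg = "Total: \xa3 12"  # placeholder for a translated message containing non-ASCII text

# codecs.open wraps the file so that write() accepts str and encodes it as UTF-8.
# It forces binary mode underneath, so "\n" is written as-is on every platform.
with codecs.open(logfile_name, "w", encoding="utf-8") as logfile:
    logfile.write(msg + "\n")

# Read the raw bytes back to confirm what actually landed on disk.
with open(logfile_name, "rb") as f:
    on_disk = f.read()

print(ascii(on_disk))
```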

Despite Google being full of hits on this problem, I found it rather hard to find this simple solution (it is actually in the Python docs about Unicode, but rather buried).

So ... HTH...

GaJ


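I tried this and it works fine (Python 3; in Python 2, io.open accepts the same encoding argument):

with open(r"C:\rag\sampleoutput.txt", 'w', encoding="utf-8") as f:
    f.write(text)  # text: the Unicode string to save, assumed to be defined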
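A slightly fuller sketch of the same approach, assuming Python 3. The sample strings below are made up to stand in for what BeautifulSoup might return, and the output goes to a temporary file rather than the poster's C:\rag path. It also prints the platform default that open() would have used without encoding=, which is why passing the encoding explicitly matters:

```python
import locale
import os
import tempfile

# Hypothetical sample strings standing in for text extracted by an HTML parser.
strings = ["Price: \xa3 5", "Caf\xe9", "na\xefve"]

# Without encoding=, open() uses a platform-dependent default: the locale's
# preferred encoding (unless UTF-8 mode is enabled), which need not be UTF-8.
print("default text encoding here:", locale.getpreferredencoding(False))

path = os.path.join(tempfile.mkdtemp(), "sampleoutput.txt")

# Passing encoding explicitly makes the file's bytes the same on every platform.
with open(path, "w", encoding="utf-8") as f:
    for s in strings:
        f.write(s + "\n")

# Read it back with the same encoding to check the round trip.
with open(path, encoding="utf-8") as f:
    lines = f.read().splitlines()

print(lines == strings)
```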
