Linux/Python: encoding a unicode string for print
I have a fairly large Python 2.6 application with lots of print statements sprinkled about. I'm using unicode strings throughout, and it usually works great. However, if I redirect the output of the application (like "myapp.py >output.txt"), I occasionally get errors such as this:
UnicodeEncodeError: 'ascii' codec can't encode character u'\xa1' in position 0: ordinal not in range(128)
I guess the same issue comes up if someone has set their LOCALE to ASCII. Now, I understand perfectly well the reason for this error. There are characters in my Unicode strings that are not possible to encode in ASCII. Fair enough. But I'd like my python program to make a best effort to try to print something understandable, maybe skipping the suspicious characters or replacing them with their Unicode ids.
This problem must be common... What is the best practice for handling this problem? I'd prefer a solution that allows me to keep using plain old "print", but I can modify all occurrences if necessary.
PS: I have now solved this problem. The solution was neither of the answers given. I used the method described at http://wiki.python.org/moin/PrintFails , as suggested by ChrisJ in one of the comments: I replace sys.stdout with a wrapper that calls unicode.encode with the correct arguments. Works very well.
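The wrapper idea above can be sketched with the standard library's `codecs.getwriter`, which returns a stream-writer class that encodes text before passing it to an underlying byte stream. This is a minimal sketch in Python 3 syntax, not necessarily the exact code on the PrintFails page: the helper name `make_safe_writer`, the `ascii` target encoding and the `'replace'` error handler are assumptions chosen for illustration.

```python
import codecs
import io


def make_safe_writer(byte_stream, encoding="ascii", errors="replace"):
    """Hypothetical helper: wrap a byte stream so text written to it is encoded.

    Characters the encoding cannot represent are handled by `errors`
    ('replace' substitutes '?') instead of raising UnicodeEncodeError.
    """
    writer_class = codecs.getwriter(encoding)
    return writer_class(byte_stream, errors)


# Demonstrate on an in-memory byte stream instead of the real stdout.
raw = io.BytesIO()
safe_out = make_safe_writer(raw)
print(u"\xa1Hola!", file=safe_out)
print(raw.getvalue())  # b'?Hola!\n'
```

To wrap the real output in Python 3 you would pass the byte layer, e.g. `sys.stdout = make_safe_writer(sys.stdout.buffer)`; in Python 2.6 `sys.stdout` itself accepts byte strings and can be passed directly. Choosing `'backslashreplace'` instead of `'replace'` would print escapes such as `\xa1` rather than `?`.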
If you're dumping to an ASCII terminal, encode manually using unicode.encode, and specify that errors should be ignored.
u = u'\xa0'
u.encode('ascii') # Raises UnicodeEncodeError
u.encode('ascii', 'ignore') # Drops characters that cannot be encoded, giving an empty string here
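`'ignore'` is only one of the built-in error handlers that `encode` accepts. Since the question asks for replacing characters with something recognizable, the sketch below compares `'ignore'`, `'replace'`, `'backslashreplace'` and `'xmlcharrefreplace'` on the same string; the sample text is illustrative only.

```python
# Compare the standard encode() error handlers on a string that
# contains two non-ASCII characters (U+00A1 and U+00A0).
text = u"\xa1Hola\xa0mundo!"

results = {}
for handler in ("ignore", "replace", "backslashreplace", "xmlcharrefreplace"):
    results[handler] = text.encode("ascii", handler)
    print("%s: %r" % (handler, results[handler]))

# ignore            -> Holamundo!
# replace           -> ?Hola?mundo!
# backslashreplace  -> \xa1Hola\xa0mundo!
# xmlcharrefreplace -> &#161;Hola&#160;mundo!
```

`'backslashreplace'` is the closest match to "replacing them with their Unicode ids", while `'replace'` keeps the output short.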
If you want to write unicode text to a file, encode it explicitly first:
u = u'\xa0'
print >>open('out', 'w'), u # Raises UnicodeEncodeError (implicit ASCII conversion)
print >>open('out', 'w'), u.encode('utf-8') # This is ok
Either route all your print statements through a function that performs the unicode -> UTF-8 conversion, or, as a last resort, change Python's default encoding from ascii to utf-8 in your site.py. In general it is a bad idea to print unicode strings unfiltered to sys.stdout: when stdout is not a terminal (for example, when output is redirected to a file), Python 2 implicitly converts unicode strings using the configured default encoding, which is ascii.
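The first suggestion, routing output through a function that performs the conversion, might look like the sketch below. The function name `print_utf8` is hypothetical, and it targets Python 3, where the byte layer of `sys.stdout` is `sys.stdout.buffer`; the `stream` parameter is an assumption added so the function can be exercised on an in-memory buffer.

```python
import io
import sys


def print_utf8(*args, sep=" ", end="\n", stream=None):
    """Hypothetical print replacement that always emits UTF-8 bytes.

    sep and end behave as in print(); stream must be a binary stream and
    defaults to the byte layer underneath sys.stdout.
    """
    if stream is None:
        sys.stdout.flush()          # keep ordering with text already printed
        stream = sys.stdout.buffer  # Python 3: the underlying byte stream
    text = sep.join("%s" % (arg,) for arg in args) + end
    stream.write(text.encode("utf-8"))
    stream.flush()


captured = io.BytesIO()
print_utf8(u"\xa1Hola!", 42, stream=captured)
print(captured.getvalue())  # b'\xc2\xa1Hola! 42\n'
```

Because every string is encoded explicitly as UTF-8, the result no longer depends on whether stdout is a terminal or a redirected file.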