Python / Mako : How to get unicode strings/characters parsed correctly?
I'm trying to get Mako to render a string containing Unicode characters:
tempLook=TemplateLookup(..., default_filters=[], input_encoding='utf8',output_encoding='utf-8', encoding_errors='replace')
...
print sys.stdout.encoding
uname=cherrypy.session['userName']
print uname
kwargs['_toshow']=uname
...
return tempLook.get_template(page).render(**kwargs)
The related template file :
...${_toshow}...
And the output is :
UTF-8
Deşghfkskhü
...
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc5 in position 1: ordinal not in range(128)
I don't think there's any problem with the string itself, since I can print it just fine.
Although I've played (a lot) with the input_encoding/output_encoding and default_filters parameters, it always complains about being unable to decode/encode with the ascii codec.
So I decided to try out the example found in the documentation, and the following works the "best":
input_encoding='utf-8', output_encoding='utf-8'
# (note: it still raised an error without output_encoding, although the tutorial does not imply it is needed)
With
${u"voix m’a réveillé."}
And the result being
voix mâ�a réveillé
I simply don't get why this doesn't work. "Magic encoding comment"s don't work either. All the files are encoded with UTF-8.
I've spent hours on this to no avail. Am I missing something?
Update :
I have a simpler question now :
Now that all the variables are unicode, how can I get Mako to render unicode strings without applying anything ? Passing a blank filter / render_unicode() doesn't help.
Yes, UTF-8 != Unicode.
UTF-8 is a specific string encoding, as are ASCII and ISO 8859-1. Try this:
For any input string, do inputstring.decode('utf-8') (or whatever input encoding you get). For any output string, do outputstring.encode('utf-8') (or whatever output encoding you want). For any internal use, take unicode strings ('this is a normal string'.decode('utf-8') == u'this is a normal string').
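The decode-on-input, encode-on-output rule can be sketched roughly like this. It is only an illustration and not Mako-specific: the helper names read_input and write_output and the sample bytes are made up for the example, and the b''/u'' prefixes are used so that it should behave the same on Python 2.7 and Python 3.

```python
# Hypothetical helpers illustrating "decode early, encode late".

def read_input(raw_bytes, encoding='utf-8'):
    """Turn bytes coming from outside into a unicode string."""
    return raw_bytes.decode(encoding)


def write_output(text, encoding='utf-8'):
    """Turn a unicode string back into bytes for the outside world."""
    return text.encode(encoding)


raw = b'De\xc5\x9f'          # example data: UTF-8 bytes, 4 bytes long
text = read_input(raw)       # unicode string, 3 characters long
shout = text.upper()         # all internal work happens on unicode
out = write_output(shout)    # bytes again, only at the very end
```

The point of the design is that the encoding is named exactly twice, at the edges, and everything in between never has to guess.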
'foo' is a string; u'foo' is a unicode string, which doesn't "have" an encoding (it can't be decoded). So any time Python wants to change the encoding of a normal string, it first tries to "decode" it, then to "encode" it. And the default is "ascii", which fails more often than not :-)
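To see that default at work, one can do by hand what Python 2 does silently when a normal string meets a unicode string, namely decode it with the ascii codec. A small sketch with made-up sample bytes (not the asker's data); as far as I know, Python 3 has no such implicit step and raises a TypeError when the two types are mixed, but the explicit decode below behaves the same on both.

```python
raw = b'D\xc5\x9f'   # example UTF-8 bytes; 0xc5 is outside the ASCII range

# Explicitly perform the decode that Python 2 would attempt implicitly.
try:
    raw.decode('ascii')
    error = None
except UnicodeDecodeError as exc:
    error = exc          # same kind of error as in the question

print(error)

# Naming the real encoding makes the same bytes decode cleanly.
fixed = raw.decode('utf-8') + u' ok'
```

Here error.start is the index of the first byte the ascii codec could not handle, which is the "position" reported in the traceback.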