
Python / Mako : How to get unicode strings/characters parsed correctly?

I'm trying to get Mako to render a string containing Unicode characters:

import sys
from mako.lookup import TemplateLookup

tempLook = TemplateLookup(..., default_filters=[], input_encoding='utf8', output_encoding='utf-8', encoding_errors='replace')
print sys.stdout.encoding
print uname
return tempLook.get_template(page).render(**kwargs)

The related template file :


And the output is :

UnicodeDecodeError: 'ascii' codec can't decode byte 0xc5 in position 1: ordinal not in range(128)

I don't think there's any problem with the string itself since I can print it just fine.

Although I've played (a lot) with the input_encoding/output_encoding and default_filters parameters, it always complains about being unable to decode/encode with the ascii codec.

So I decided to try out the example found in the documentation, and the following works the "best" :

input_encoding='utf-8', output_encoding='utf-8'
#(note : it still raised an error without output_encoding, despite tutorial not implying it) 


${u"voix m’a réveillé."} 

And the result is :

voix mâ�a réveillé

I simply don't get why this doesn't work. "Magic encoding comment"s don't work either. All the files are encoded with UTF-8.

I've spent hours to no avail, am I missing something ?

Update :

I have a simpler question now :

Now that all the variables are unicode, how can I get Mako to render unicode strings without applying anything ? Passing a blank filter / render_unicode() doesn't help.

Yes, UTF-8 != Unicode.

UTF-8 is a specific string encoding, as are ASCII and ISO 8859-1. Try this:

For any input string, do an inputstring.decode('utf-8') (or whatever input encoding you get). For any output string, do an outputstring.encode('utf-8') (or whatever output encoding you want). For any internal use, work with unicode strings ('this is a normal string'.decode('utf-8') == u'this is a normal string').
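
A minimal sketch of that decode-on-input, encode-on-output rule. The helper names to_text and to_bytes are made up for illustration (they are not part of Mako or the standard library), and the byte literal simply stands in for whatever input you actually receive:

```python
# Hypothetical helpers for illustration only.

def to_text(raw, encoding='utf-8'):
    """Decode incoming bytes into a unicode (text) object."""
    return raw.decode(encoding)


def to_bytes(text, encoding='utf-8'):
    """Encode a unicode (text) object into bytes for output."""
    return text.encode(encoding)


# UTF-8 bytes for the sentence from the question: voix m’a réveillé.
raw_input_bytes = b'voix m\xe2\x80\x99a r\xc3\xa9veill\xc3\xa9.'

text = to_text(raw_input_bytes)   # unicode object: use this internally
output_bytes = to_bytes(text)     # encode only at the point of output
```

The b'' and u'' prefixes are used so that the snippet behaves the same way whether a plain 'foo' literal means bytes or text in your interpreter.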

'foo' is a (byte) string, u'foo' is a unicode string, which doesn't "have" an encoding (it can't be decoded). So any time Python wants to change the encoding of a normal string, it first tries to "decode" it, then to "encode" it. And the default codec is "ascii", which fails more often than not :-)
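
That hidden "decode" step can be reproduced by hand. The sketch below decodes UTF-8 bytes with the ascii codec explicitly, which is roughly what happens behind the scenes when a byte string gets mixed with a unicode string. The sample bytes are an assumption, chosen only so that 0xc5 sits at position 1 like in the traceback from the question:

```python
# Assumed sample: 'a' followed by the UTF-8 bytes of U+0142,
# so the first non-ASCII byte (0xc5) is at index 1.
raw = b'a\xc5\x82'

try:
    raw.decode('ascii')            # what the implicit conversion attempts
    error = None
except UnicodeDecodeError as exc:
    error = exc                    # 'ascii' codec can't decode byte 0xc5 ...

# Naming the real encoding explicitly succeeds.
text = raw.decode('utf-8')
```

If error is set here, the fix is usually to decode with the correct codec at the point where the bytes enter the program, rather than to change the filters further down.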






