CherryPy doesn't properly handle non-ASCII characters in Jinja2 templates
I am trying to run a website using Python 2.7.1, Jinja 2.5.2, and CherryPy 3.1.2. The Jinja templates I am using are UTF-8 encoded. I noticed that some of the characters in those templates are being turned into question marks and other gibberish. If I serve the template files directly, without Jinja, I don't see this problem. I discovered that I can fix it by calling .encode("utf-8") on the output of all my handlers, but that gets annoying since it clutters up my source. Does anyone know why this happens or what to do about it? I made a small script to demonstrate the problem. The "char.txt" file is a 2-byte file consisting solely of a UTF-8 encoded "»" character.
#!/usr/bin/env python
# -*- coding: utf-8 -*-
import jinja2
import cherrypy

jinja2env = jinja2.Environment(loader=jinja2.FileSystemLoader("."))

class Test(object):
    def test1(self):
        # Doesn't work
        # curl "http://127.0.0.1:8500/test1"
        # Output: ?
        return jinja2env.get_template("char.txt").render()
    test1.exposed = True

    def test2(self):
        # Works
        # curl "http://127.0.0.1:8500/test2"
        # Output: »
        return open("char.txt").read()
    test2.exposed = True

    def test3(self):
        # Works, but it is annoying to have to call this extra function all the time
        # curl "http://127.0.0.1:8500/test3"
        # Output: »
        return jinja2env.get_template("char.txt").render().encode("utf-8")
    test3.exposed = True

cherrypy.config["server.socket_port"] = 8500
cherrypy.quickstart(Test())
Jinja2 works with Unicode only. It seems that CherryPy usually uses UTF-8 as the output encoding when the client sends no Accept-Charset header, but falls back to ISO-8859-1 when that header is empty.
tools.encode.encoding: If specified, the tool will error if the response cannot be encoded with it. Otherwise, the tool will use the 'Accept-Charset' request header to attempt to provide suitable encodings, usually attempting utf-8 if the client doesn't specify a charset, but following RFC 2616 and trying ISO-8859-1 if the client sent an empty 'Accept-Charset' header.
http://www.cherrypy.org/wiki/BuiltinTools#tools.encode
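The negotiation rule quoted above can be modelled in a few lines. This is a simplified, hypothetical sketch: `pick_charset` is an illustrative name, not a CherryPy function, and it ignores q-values. It is written for Python 3, where `str` plays the role of Python 2's `unicode`. The last lines show why the wrong charset produces gibberish: "»" is two bytes in UTF-8 but a single byte (0xBB) in ISO-8859-1, and that lone byte is not valid UTF-8, so a UTF-8 terminal typically shows a replacement character or a question mark.

```python
def pick_charset(accept_charset, forced=None):
    """Hypothetical model of how tools.encode picks a charset (not CherryPy code).

    accept_charset: raw Accept-Charset header value, or None if the header is absent.
    forced: the tools.encode.encoding setting, if configured.
    """
    if forced is not None:
        return forced  # a configured encoding always wins
    if accept_charset is None:
        return "utf-8"  # no header: UTF-8 is usually attempted
    if accept_charset.strip() == "":
        return "iso-8859-1"  # empty header: RFC 2616 default
    # Simplification: take the first listed charset, ignoring q-values.
    return accept_charset.split(",")[0].split(";")[0].strip().lower()


text = "\u00bb"  # the "»" character stored in char.txt
utf8_bytes = text.encode("utf-8")         # two bytes: C2 BB
latin1_bytes = text.encode("iso-8859-1")  # one byte: BB
# Decoding the Latin-1 byte as UTF-8 fails; "replace" substitutes U+FFFD.
garbled = latin1_bytes.decode("utf-8", "replace")

print(pick_charset(None), pick_charset(""), pick_charset("", forced="utf-8"))
print(utf8_bytes, latin1_bytes, ascii(garbled))
```

Setting `tools.encode.encoding` corresponds to the `forced` branch, which is why it makes the result independent of what the client sends.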
I could fix the problem by using the encode tool like this:
cherrypy.config["tools.encode.on"] = True
cherrypy.config["tools.encode.encoding"] = "utf-8"
Example
$ curl "http://127.0.0.1:8500/test1"
»
$ curl "http://127.0.0.1:8500/test2"
»
$ curl "http://127.0.0.1:8500/test3"
»
From the CherryPy tutorial:
tools.encode: automatically converts the response from the native Python Unicode string format to some suitable encoding (Latin-1 or UTF-8, for example).
That sounds like your answer.
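What "automatically converts the response" amounts to can be illustrated with a small decorator that does, per handler, the core of what the tool does for every handler. This is a hypothetical sketch: `encode_output` is an illustrative name, not part of CherryPy, and the real tool does more (for example, choosing the charset as described in the documentation quoted earlier). It is written for Python 3; on Python 2.7 the text check would use `unicode` and the bytes check `str`.

```python
import functools


def encode_output(encoding="utf-8"):
    """Hypothetical decorator: encode text returned by a handler into bytes.

    Text (unicode) is encoded with `encoding`; bytes pass through unchanged.
    """
    def decorator(handler):
        @functools.wraps(handler)
        def wrapper(*args, **kwargs):
            body = handler(*args, **kwargs)
            if isinstance(body, str):
                return body.encode(encoding)
            if isinstance(body, bytes):
                return body
            # Assume any other return value is an iterable of chunks.
            return [c.encode(encoding) if isinstance(c, str) else c for c in body]
        return wrapper
    return decorator


@encode_output("utf-8")
def test1():
    # Stands in for jinja2env.get_template("char.txt").render(),
    # which returns text rather than bytes.
    return "\u00bb"


print(test1())
```

Turning on `tools.encode` in the config remains the less intrusive fix, since it covers every handler without touching their code; the decorator only makes the mechanism visible.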