Tomcat + Wicket: UTF-8 chars not rendering properly
I have a Wicket app with some pages containing accented chars, entered as UTF-8, e.g. "résumé".
When I debug the app via the traditional Wicket Start.java class (which invokes an embedded Jetty server), all is good. However, when I try deploying to a local Tomcat instance, it renders as "r√©sum√©".
My document looks like:
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" dir="ltr" lang="en-US"
xmlns:wicket="http://wicket.apache.org/dtds.data/wicket-xhtml1.4-strict.dtd">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
</head>
<body>
résumé
</body>
</html>
Here's what curl -I returns for the page when running on Jetty:
HTTP/1.1 200 OK
Content-Type: text/html; charset=utf-8
Content-Language: en-US
Pragma: no-cache
Cache-Control: no-cache, max-age=0, must-revalidate
Content-Length: 13545
Server: Jetty(6.1.25)
And here's what Tomcat returns:
HTTP/1.1 200 OK
Server: Apache-Coyote/1.1
Pragma: no-cache
Cache-Control: no-cache, max-age=0, must-revalidate
Content-Type: text/html;charset=UTF-8
Content-Language: en-US
Transfer-Encoding: chunked
Date: Sat, 23 Jul 2011 14:36:45 GMT
The problem is that Wicket doesn't detect the encoding of the markup files correctly. They are encoded as UTF-8, so a non-ASCII char such as é is represented by two bytes. But Wicket doesn't know that and reads those bytes as two separate characters. Those two characters are then encoded as UTF-8 again in the response. Since the "square root" character is not ASCII itself, you should actually see more than two bytes per é in the response (the square root sign alone takes three bytes in UTF-8).
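The double encoding described above can be reproduced in a few lines of plain Java. This is only a sketch: the class and method names are made up for illustration, and ISO-8859-1 stands in for the wrong charset (it is guaranteed to exist on every JVM). With a Mac OS Roman default the first misread character would be the square root sign instead of "Ã", but the mechanism is the same.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not part of Wicket.
class MojibakeDemo {
    // Decode UTF-8 bytes with the wrong charset, then encode the result as UTF-8 again.
    static byte[] doubleEncode(String text) {
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);
        // Assumption: ISO-8859-1 plays the role of the mistaken platform default.
        String misread = new String(utf8, StandardCharsets.ISO_8859_1);
        return misread.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // é written as an escape so the encoding of this source file doesn't matter.
        String original = "\u00e9";
        System.out.println(original.getBytes(StandardCharsets.UTF_8).length); // 2
        System.out.println(doubleEncode(original).length);                    // 4
    }
}
```

One é goes in as two bytes and comes out as four, which is what the browser then shows as two junk characters.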
Anyway, you need to fix this markup encoding interpretation. Check out the Wicket source code for XMLReader#init().
It reads like Wicket tries three things to find out about the encoding of a markup file:
- Evaluates the <?xml ... ?> declaration at the beginning of the markup file. (Missing for you?)
- Uses the default encoding specified by Application#getMarkupSettings().setDefaultMarkupEncoding(String)
- Uses the OS default.
It looks like you are missing 1 and 2 at the moment, so Wicket falls back to 3, which doesn't work in your case. So try either of the other two.
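To make the fallback order concrete, here is a rough sketch of such a three-step lookup. It is not Wicket's actual code: the class name, the resolve method and the regular expression are all invented for illustration, and only the order of the three steps is taken from the list above.

```java
import java.nio.charset.Charset;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch of the lookup order, not the real Wicket implementation.
class MarkupEncodingSketch {
    // Matches encoding="..." inside a leading <?xml ... ?> declaration.
    private static final Pattern XML_DECL = Pattern.compile(
            "^\\s*<\\?xml[^>]*?encoding\\s*=\\s*[\"']([^\"']+)[\"'][^>]*\\?>");

    static String resolve(String markupStart, String configuredDefault) {
        // 1. An explicit <?xml ... encoding="..."?> declaration wins.
        Matcher m = XML_DECL.matcher(markupStart);
        if (m.find()) {
            return m.group(1);
        }
        // 2. Otherwise use the application-wide default, if one was configured.
        if (configuredDefault != null) {
            return configuredDefault;
        }
        // 3. Otherwise fall back to the platform default.
        return Charset.defaultCharset().name();
    }

    public static void main(String[] args) {
        System.out.println(resolve("<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n<html>", null));
        System.out.println(resolve("<!DOCTYPE html>", "UTF-8"));
        System.out.println(resolve("<!DOCTYPE html>", null)); // depends on the machine
    }
}
```

The last call is the case from the question: no declaration and no configured default, so the result depends on whatever the OS and JVM happen to use.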
I'm not sure why this is needed, but here's a workaround that solved this for me:
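import org.apache.wicket.protocol.http.WebApplication;

public class Application extends WebApplication
{
    @Override
    protected void init()
    {
        super.init();
        // Encode responses and decode request parameters as UTF-8.
        getRequestCycleSettings().setResponseRequestEncoding("UTF-8");
        // Read markup files as UTF-8 when they carry no <?xml ... ?> declaration.
        getMarkupSettings().setDefaultMarkupEncoding("UTF-8");
    }

    // getHomePage() and the rest of the application are omitted here.
}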
To give credit where it is due, I found this solution here.