Encoding errors in .jspx
I'm currently trying to deploy some RSS feeds on a WebLogic Application Server. The feeds' views are .jspx files, like the one below:
<?xml version="1.0" encoding="utf-8"?>
<feed xmlns="http://www.w3.org/2005/Atom"
xmlns:georss="http://www.georss.org/georss"
xmlns:jsp="http://java.sun.com/JSP/Page"
xmlns:c="http://java.sun.com/jsp/jstl/core"
xmlns:fmt="http://java.sun.com/jsp/jstl/fmt"
xmlns:fn="http://java.sun.com/jsp/jstl/functions"
xmlns:util="http://example.com/util">
<jsp:directive.page pageEncoding="utf-8" contentType="application/xhtml+xml" />
<jsp:useBean id="now" class="java.util.Date" scope="page" />
[...]
<c:forEach var="category" items="${categories}">
<entry>
<title>${util:htmlEscape(category.label)}</title>
<id>${category.id}</id>
<c:if test="${empty parentId}">
<link href="${util:htmlEscape(fullRequest)}?parentId=${category.id}" />
</c:if>
<summary>${util:htmlEscape(category.localizedLabel)}</summary>
</entry>
</c:forEach>
</feed>
The problem is that on my local development server (Apache Tomcat 6.0) everything renders fine, but on the WebLogic server I get all the UTF-8 characters back mangled.
In Firefox, I see something like <summary>Formaci�n</summary>. The byte sequence for the strange character is ef bf bd, and I seem to get it for every non-ASCII character I'm supposed to receive in the tests I'm conducting (á, ó, í). I've checked the content type and encoding in Firebug and they seem OK (Content-Type: application/xhtml+xml; charset=UTF-8).
In Chrome, the content gets truncated at the first occurrence of the strange character, with the error message: This page contains the following errors: error on line 1 at column 523: Encoding error.
I'm not sure what's happening, but I think it's related to something that the web server is doing, considering that on my local Tomcat everything's ok. Any ideas are welcome.
Thanks,
Alex
The issue was coming from the order of the attributes in the jspx page directive and the fact that I wasn't including the charset in the contentType attribute!
After switching:
<jsp:directive.page pageEncoding="utf-8" contentType="application/xhtml+xml" />
to:
<jsp:directive.page contentType="application/xhtml+xml; charset=UTF-8"
pageEncoding="UTF-8" />
The characters came out fine. I fiddled around a bit more, and, curiously, found out that this:
<jsp:directive.page pageEncoding="UTF-8"
contentType="application/xhtml+xml; charset=UTF-8" />
doesn't work. I don't really understand why, but I'm guessing that it's a bug in WebLogic. The version I deployed on was 10.0.
The � is the Unicode replacement character U+FFFD (in UTF-8 hex indeed 0xEF 0xBF 0xBD).
Firefox uses this character to stand in for a byte sequence that is not valid in the character encoding the browser has been instructed to render the page in.
The browser has been instructed to render the page as UTF-8, and the character was originally ó (U+00F3, 0xC3 0xB3 in UTF-8). Written with a single-byte charset it becomes 0xF3 instead of 0xC3 0xB3, which is malformed when read as UTF-8. The symptoms therefore indicate that the server is actually encoding the response as ISO-8859-1 instead of UTF-8, while still instructing the browser to decode it as UTF-8.
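A minimal sketch of that mangling in plain Java, assuming the server really writes ó with ISO-8859-1 and the client then decodes the bytes as UTF-8. The class and method names are made up for illustration, and Java's lenient decoder stands in for what the browser does:

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not part of the application or of WebLogic.
class ReplacementCharDemo {

    // Format bytes as space-separated lowercase hex, e.g. "ef bf bd".
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02x", b & 0xff));
        }
        return sb.toString();
    }

    // Encode text as ISO-8859-1 (what the server is assumed to do),
    // then decode those bytes as UTF-8 (what the browser was told to do).
    static String misread(String text) {
        byte[] latin1 = text.getBytes(StandardCharsets.ISO_8859_1);
        // new String(...) replaces malformed input with U+FFFD.
        return new String(latin1, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String original = "Formaci\u00f3n";

        // The single byte that ISO-8859-1 produces for the accented o.
        System.out.println(hex("\u00f3".getBytes(StandardCharsets.ISO_8859_1))); // f3
        // The two bytes that UTF-8 would have produced.
        System.out.println(hex("\u00f3".getBytes(StandardCharsets.UTF_8)));      // c3 b3

        String shown = misread(original);
        // The lone 0xF3 is not valid UTF-8, so it decodes to U+FFFD.
        System.out.println(shown.indexOf('\uFFFD') >= 0);                        // true
        // U+FFFD serialised back to UTF-8 gives the bytes from the question.
        System.out.println(hex("\uFFFD".getBytes(StandardCharsets.UTF_8)));      // ef bf bd
    }
}
```

This only reproduces the symptom. It says nothing about where inside WebLogic the wrong charset gets picked.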
I don't do WebLogic, so I googled a bit and came across this old bug report, in which someone suggests adding the following to the weblogic.xml file to force it to parse JSP files using UTF-8.
<weblogic-web-app>
<jsp-descriptor>
<jsp-param>
<param-name>encoding</param-name>
<param-value>UTF-8</param-value>
</jsp-param>
<jsp-param>
<param-name>compilerSupportsEncoding</param-name>
<param-value>false</param-value>
</jsp-param>
</jsp-descriptor>
</weblogic-web-app>
See if that helps to solve your problem.
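To check whether a change like this took effect, one option is to save the raw bytes of the feed response and decode them strictly instead of trusting what the browser shows. Below is a rough sketch in plain Java. The class name, the method name and the three result strings are my own invention, and how you obtain the response bytes is left to you:

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical helper for inspecting a saved response body.
class Utf8Check {

    // Returns "malformed" if the bytes are not valid UTF-8,
    // "contains U+FFFD" if they are valid but already carry the
    // replacement character, and "ok" otherwise.
    static String diagnose(byte[] body) {
        try {
            String text = StandardCharsets.UTF_8.newDecoder()
                    // Fail on bad input instead of silently replacing it.
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(body))
                    .toString();
            return text.indexOf('\uFFFD') >= 0 ? "contains U+FFFD" : "ok";
        } catch (CharacterCodingException e) {
            return "malformed";
        }
    }

    public static void main(String[] args) {
        // Stand-ins for a real response body.
        byte[] asUtf8 = "Formaci\u00f3n".getBytes(StandardCharsets.UTF_8);
        byte[] asLatin1 = "Formaci\u00f3n".getBytes(StandardCharsets.ISO_8859_1);

        System.out.println(diagnose(asUtf8));   // ok
        System.out.println(diagnose(asLatin1)); // malformed
    }
}
```

A "malformed" result would suggest the server is still writing single-byte characters under a UTF-8 header. "contains U+FFFD" would suggest the replacement already happened on the server side, before the bytes were sent.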