Does a servlet know the encoding of a submitted form when it was specified using http-equiv?
When I specify the encoding of a POSTed form using http-equiv like this:
<HTML>
<head>
<meta http-equiv='Content-Type' content='text/html; charset=gb2312'/>
</head>
<BODY >
<form name="form" method="post" >
<input type="text" name="v_rcvname" value="相宜本草">
</form>
</BODY>
</HTML>
Then, in the servlet, I call request.getCharacterEncoding() and get null!
So, is there a way to tell the server which character encoding the data is in?
This will indeed return null for requests from most web browsers. You can usually assume, though, that the browser encoded the form data using the encoding specified for the original response (the page containing the form), which in this case is gb2312. A common approach is to create a Filter which checks the request encoding and, if it is missing, uses ServletRequest#setCharacterEncoding() to force the desired value (which you should of course use consistently throughout your web application). Note that setCharacterEncoding() only takes effect if it is called before any request parameter is read.
public void doFilter(ServletRequest request, ServletResponse response, FilterChain chain) throws ServletException, IOException {
    // Browsers usually send no charset, so fall back to the encoding the form page was served in.
    if (request.getCharacterEncoding() == null) {
        request.setCharacterEncoding("gb2312");
    }
    chain.doFilter(request, response);
}
Map this Filter on a url-pattern covering all servlet requests, e.g. /*.
If you don't do this, the servlet container will use its default encoding to parse the parameters, which is usually ISO-8859-1 and is wrong for this form. Your input of 相宜本草 would end up as ÏàÒ˱¾²Ý.
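The garbling described above can be reproduced outside a servlet container. The following is only a minimal sketch: it assumes the JDK in use ships the GB2312 charset (standard JDK distributions do, but it is not one of the guaranteed charsets), the class and method names are made up for illustration, and the Unicode escapes spell 相宜本草.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class CharsetMismatchDemo {
    // Assumption: the GB2312 charset is available in this JDK.
    static final Charset GB2312 = Charset.forName("GB2312");

    // The bytes a browser would put in the request body for a GB2312 page
    // (before percent-encoding).
    static byte[] encodeAsGb2312(String text) {
        return text.getBytes(GB2312);
    }

    // What the application sees if those bytes are decoded as ISO-8859-1.
    static String decodeAsLatin1(byte[] raw) {
        return new String(raw, StandardCharsets.ISO_8859_1);
    }

    // ISO-8859-1 maps every byte to one char, so the original bytes can be
    // recovered from the garbled string and decoded again with GB2312.
    static String repair(String garbled) {
        return new String(garbled.getBytes(StandardCharsets.ISO_8859_1), GB2312);
    }

    public static void main(String[] args) {
        String original = "\u76f8\u5b9c\u672c\u8349";
        String garbled = decodeAsLatin1(encodeAsGb2312(original));
        System.out.println("garbled:  " + garbled);
        System.out.println("repaired: " + repair(garbled));
    }
}
```

The repair step works only because ISO-8859-1 decoding is lossless; it is a fallback for data that has already been misread, not a substitute for setting the request encoding before parameters are parsed.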
In my experience, it's not possible to reliably get POST data back in GB2312. I think UTF-8 is the W3C recommendation, and all new browsers only send data back in either Latin-1 or UTF-8.
We were able to get GB2312-encoded data back from old IE on Windows 95, but it's generally not possible with the newer Unicode-based browsers.
See this test on Firefox:
POST / HTTP/1.1
Host: localhost:1234
User-Agent: Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10.5; en-US; rv:1.9.2.3) Gecko/20100401 Firefox/3.6.3
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
Accept-Language: en-us,en;q=0.5
Accept-Encoding: gzip,deflate
Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7
Keep-Alive: 115
Connection: keep-alive
Content-Type: application/x-www-form-urlencoded
Content-Length: 46
My page is in GB2312 and I specified GB2312 everywhere, but Firefox simply ignores it.
Some broken browsers even encode Chinese in Latin-1. We recently added a hidden field with a known value to the form. By checking how that value arrives, we can figure out which encoding the browser used.
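The hidden-field check could look roughly like the sketch below. It is not the code we actually run: the class name, the marker value and the candidate list are all hypothetical, and it assumes the container decoded the parameter with ISO-8859-1, so the original bytes can still be recovered from the string.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.List;

class HiddenFieldCharsetProbe {
    // Hypothetical marker the form would carry in a hidden field.
    // It must contain non-ASCII characters, otherwise every charset matches.
    static final String KNOWN_VALUE = "\u76f8\u5b9c";

    // 'received' is the hidden parameter as decoded by the container using
    // ISO-8859-1 (assumption). Returns the first candidate charset under
    // which the raw bytes decode back to KNOWN_VALUE, or null if none does.
    static Charset detect(String received, List<Charset> candidates) {
        byte[] raw = received.getBytes(StandardCharsets.ISO_8859_1);
        for (Charset candidate : candidates) {
            if (KNOWN_VALUE.equals(new String(raw, candidate))) {
                return candidate;
            }
        }
        return null;
    }
}
```

The detected charset can then be used to re-decode the remaining parameters in the same way.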
request.getCharacterEncoding() returns the encoding from the Content-Type request header. As you can see from my trace, that header carries no charset parameter, so the result is always null.
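To illustrate why the result is null for the trace above, here is a simplified approximation of extracting the charset parameter from a Content-Type value. It is a sketch for illustration only, not the parsing code of any particular servlet container, and the class and method names are hypothetical.

```java
class ContentTypeCharset {
    // Returns the value of the charset parameter, or null if there is none.
    static String charsetOf(String contentType) {
        if (contentType == null) {
            return null;
        }
        String[] parts = contentType.split(";");
        // parts[0] is the media type itself; parameters follow it.
        for (int i = 1; i < parts.length; i++) {
            String param = parts[i].trim();
            if (param.regionMatches(true, 0, "charset=", 0, 8)) {
                String value = param.substring(8).trim();
                // Strip optional surrounding double quotes.
                if (value.length() >= 2 && value.startsWith("\"") && value.endsWith("\"")) {
                    value = value.substring(1, value.length() - 1);
                }
                return value.isEmpty() ? null : value;
            }
        }
        return null;
    }
}
```

With the header from the trace, application/x-www-form-urlencoded, there is no parameter to find, so the method returns null; a value such as application/x-www-form-urlencoded; charset=GB2312 would yield GB2312.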