Parameters charset conversion in struts2
I have a Struts2 web application which accepts both POST and GET requests in many different charsets, converts them to UTF-8, displays the correct UTF-8 characters on the screen and then writes them into a UTF-8 database.
I have tried at least 5 different methods for doing a simple lossless charset conversion from windows-1250 to UTF-8 to start with, and none of them worked. UTF-8 being the "larger set", it should work without a problem (at least this is my understanding).
Can you propose how to do a charset conversion from windows-1250 to UTF-8? And is it possible that Struts2 is doing something weird with the params charset, which would explain why I can't seem to get it right?
This is my latest attempt:
String inputData = getSimpleParamValue("some_input_param_from_get");
Charset inputCharset = Charset.forName("windows-1250");
Charset utfCharset = Charset.forName("UTF-8");
CharsetDecoder decoder = inputCharset.newDecoder();
CharsetEncoder encoder = utfCharset.newEncoder();
String decodedData = "";
try {
ByteBuffer inputBytes = ByteBuffer.wrap(inputData.getBytes()); // I've tried putting UTF-8 here as well, with no luck
CharBuffer chars = decoder.decode(inputBytes);
ByteBuffer utfBytes = encoder.encode(chars);
decodedData = new String(utfBytes.array());
} catch (CharacterCodingException e) {
logger.error(e);
}
Any ideas on what to try to get this working?
Thank you and best regards,
Bozo
I'm not sure I understand your scenario. In Java, a String is always Unicode; charset conversion only comes into play when you convert between a String and a binary representation (bytes). In your example, by the time getSimpleParamValue("some_input_param_from_get") returns, inputData should already hold the "correct" String: the conversion from the stream of bytes (which travelled from the client browser to the web server) to a String should already have taken place (that is the responsibility of the web server plus the web layer of your application). For this, I enforce UTF-8 for the web transmission by placing a filter in web.xml (before Struts), for example:
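import java.io.IOException;
import javax.servlet.*;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

public class CharsetFilter implements Filter {
    public void destroy() {}

    public void doFilter(ServletRequest request, ServletResponse response, FilterChain chain) throws IOException, ServletException {
        HttpServletRequest req = (HttpServletRequest) request;
        HttpServletResponse res = (HttpServletResponse) response;
        // Must run before any parameter is read; it controls how the request body is decoded.
        req.setCharacterEncoding("UTF-8");
        chain.doFilter(req, res);
        // Note: this has no effect if the response was already committed further down the chain.
        String contentType = res.getContentType();
        if (contentType != null && contentType.startsWith("text/html")) {
            res.setCharacterEncoding("UTF-8");
        }
    }

    public void init(FilterConfig filterConfig) throws ServletException {}
}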
If you cannot do this, and if your getSimpleParamValue() "errs" in the charset conversion (e.g. it assumed the byte stream was UTF-8 when it was actually windows-1250), you now have an "incorrect" String, and you must try to recover it by undoing and redoing the byte-to-String conversion. For that you must know both the wrong and the correct charset and, worse, deal with the possibility of missing characters (if the bytes were interpreted as UTF-8, the decoder might have found illegal byte sequences). If you have to deal with this inside a Struts2 action, I'd say you are in trouble; you should handle it explicitly before or after the action (in the upper web layer, or in the database driver, the file encoding, or wherever the mismatch occurs).
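To make the "undo and redo" idea concrete, here is a minimal sketch. It assumes the container decoded the bytes as ISO-8859-1 (a common default, but check your own setup) while the client actually sent windows-1250, and that your JVM ships the windows-1250 charset. The class name CharsetRecovery and the method redecode are made up for illustration; they are not part of Struts2 or the servlet API.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class CharsetRecovery {

    /**
     * Undo a wrong byte-to-String decoding and redo it with the actual charset.
     * This is only lossless when wrongCharset maps every byte to a distinct
     * character, as ISO-8859-1 does.
     */
    public static String redecode(String garbled, Charset wrongCharset, Charset actualCharset) {
        byte[] originalBytes = garbled.getBytes(wrongCharset); // undo the wrong decoding
        return new String(originalBytes, actualCharset);       // redo it with the right one
    }

    public static void main(String[] args) {
        Charset win1250 = Charset.forName("windows-1250");
        String original = "\u010D\u0161\u017E"; // the letters c, s, z with caron

        // Simulate the mistake: windows-1250 bytes decoded as ISO-8859-1.
        byte[] wire = original.getBytes(win1250);
        String garbled = new String(wire, StandardCharsets.ISO_8859_1);

        String recovered = redecode(garbled, StandardCharsets.ISO_8859_1, win1250);
        System.out.println(recovered.equals(original)); // prints "true"

        // Once the String is correct, UTF-8 is just one more encoding of it.
        byte[] utf8 = recovered.getBytes(StandardCharsets.UTF_8);
        System.out.println(utf8.length); // prints "6" (two bytes per letter)
    }
}
```

Note that if the wrong charset was UTF-8 rather than ISO-8859-1, any byte sequence that was not valid UTF-8 has already been replaced with U+FFFD during decoding, and this trick cannot bring the original bytes back. That is the "missing characters" problem mentioned above.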