encoding problem in servlet

I have a servlet that receives some parameters from the client and then does some work. The parameters from the client are in Chinese, so I often get invalid characters in the servlet. For example, if I enter

http://localhost:8080/Servlet?q=中文&type=test

Then in the servlet, the parameter 'type' is correct (test), but the parameter 'q' is not decoded correctly: it becomes invalid characters that cannot be parsed.

However, if I press Enter in the address bar again, the URL changes to:

http://localhost:8080/Servlet?q=%D6%D0%CE%C4&type=test

Now my servlet will get the right parameter of 'q'.

What is the problem?

UPDATE

BTW, it works well when I send the form with POST. When I send the parameters via ajax, for example:

url = "http://..q='中文'";
xmlhttp.open("POST", url, true);

Then the server side also gets invalid characters.

It seems the server side only gets the right result when the Chinese characters are percent-encoded as %xx.

That is to say, http://.../q=中文 does not work, but http://.../q=%D6%D0%CE%C4 works.

But why does "http://www.google.com.hk/search?hl=zh-CN&newwindow=1&safe=strict&q=%E4%B8%AD%E6%96%87&btnG=Google+%E6%90%9C%E7%B4%A2&aq=f&aqi=&aql=&oq=&gs_rfai=" work?



Ensure that the encoding of the page with the form itself is also UTF-8 and ensure that the browser is instructed to read the page as UTF-8. Assuming that it's JSP, just put this at the very top of the page to achieve that:

<%@ page pageEncoding="UTF-8" %>

Then, to process GET query string as UTF-8, ensure that the servletcontainer in question is configured to do so. It's unclear which one you're using, so here's a Tomcat example: set the URIEncoding attribute of the <Connector> element in /conf/server.xml to UTF-8.

<Connector URIEncoding="UTF-8">

If you'd like to use POST, then you need to ensure that the HttpServletRequest is instructed to parse the POST request body using UTF-8.

request.setCharacterEncoding("UTF-8");

Call this before you access the first parameter. A Filter is the best place for this.
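To see why that call matters, here is a minimal sketch of what goes wrong without it. It assumes the container falls back to ISO-8859-1 when the request names no charset, which is the default the Servlet specification describes for request bodies. The class and method names are made up for illustration, and no servlet API is involved; it only simulates the decoding step with plain JDK calls.

```java
import java.nio.charset.StandardCharsets;

class MojibakeDemo {
    // The string 中文, written with Unicode escapes so the source file's own encoding does not matter.
    static final String TEXT = "\u4e2d\u6587";

    // Simulates a container decoding UTF-8 request bytes as ISO-8859-1 (assumed fallback).
    static String misdecode(String original) {
        byte[] utf8 = original.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, StandardCharsets.ISO_8859_1);
    }

    // Reverses the mistake: recover the raw bytes, then decode them as UTF-8.
    static String repair(String garbled) {
        byte[] raw = garbled.getBytes(StandardCharsets.ISO_8859_1);
        return new String(raw, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String garbled = misdecode(TEXT);
        // Each of the six UTF-8 bytes became its own character.
        System.out.println(garbled.length());
        // The original text comes back once the bytes are decoded with the right charset.
        System.out.println(repair(garbled).equals(TEXT));
    }
}
```

The repair step is only a diagnostic trick. Setting the request encoding up front, as described above, avoids the garbling in the first place.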

See also:

  • Unicode - How to get the characters right?


Using non-ASCII characters as GET parameters (i.e. in URLs) is generally problematic. RFC 3986 recommends using UTF-8 and then percent encoding, but that's AFAIK not an official standard. And what you are using in the case where it works isn't UTF-8!
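A quick way to check which encoding a percent-encoded value uses is to encode the same text under each candidate charset and compare. The sketch below does that for the two URLs in the question. It assumes a JDK 10 or later that ships the GBK charset (full JDK builds normally do); the class and method names are only illustrative.

```java
import java.net.URLDecoder;
import java.net.URLEncoder;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class PercentEncodingDemo {
    // The string 中文, written with Unicode escapes so the source file's own encoding does not matter.
    static final String TEXT = "\u4e2d\u6587";

    // Assumption: the GBK charset is available in this JDK.
    static final Charset GBK = Charset.forName("GBK");

    static String encode(Charset charset) {
        return URLEncoder.encode(TEXT, charset);
    }

    static String decode(String encoded, Charset charset) {
        return URLDecoder.decode(encoded, charset);
    }

    public static void main(String[] args) {
        // The form the browser produced in the question: %D6%D0%CE%C4
        System.out.println(encode(GBK));
        // The form in the Google URL: %E4%B8%AD%E6%96%87
        System.out.println(encode(StandardCharsets.UTF_8));
        // Decoding GBK bytes as UTF-8 does not give the text back.
        System.out.println(decode("%D6%D0%CE%C4", StandardCharsets.UTF_8).equals(TEXT));
    }
}
```

So the address-bar URL in the question carries GBK bytes, while the Google URL carries UTF-8 bytes. The server has to decode with the same charset the sender used.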

It would probably be safest to switch to POST requests.


I believe the problem is on the sending side. As I understand from your description, when you type the URL into the browser you get a "correctly" encoded request. This job is done by the browser: it knows how to convert Unicode characters into a sequence of codes like %xx.

So, check how you send the request. The parameter values should be percent-encoded before sending.
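As a rough sketch of encoding on the sending side, assuming the client is written in Java (JDK 10 or later) and the server decodes the query string as UTF-8: the helper below percent-encodes every name and value before building the URL. `QueryBuilder`, `buildUrl`, and `demoUrl` are hypothetical names, and the base URL is the one from the question.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

class QueryBuilder {
    // Percent-encodes each parameter name and value as UTF-8 and joins them with '&'.
    static String buildUrl(String base, Map<String, String> params) {
        StringJoiner query = new StringJoiner("&");
        for (Map.Entry<String, String> e : params.entrySet()) {
            query.add(URLEncoder.encode(e.getKey(), StandardCharsets.UTF_8)
                    + "=" + URLEncoder.encode(e.getValue(), StandardCharsets.UTF_8));
        }
        return base + "?" + query;
    }

    // Builds the URL from the question with q=中文 (given as Unicode escapes) and type=test.
    static String demoUrl() {
        Map<String, String> params = new LinkedHashMap<>();
        params.put("q", "\u4e2d\u6587");
        params.put("type", "test");
        return buildUrl("http://localhost:8080/Servlet", params);
    }

    public static void main(String[] args) {
        System.out.println(demoUrl());
    }
}
```

In browser JavaScript, which is what the ajax snippet in the question uses, `encodeURIComponent` plays the same role and always produces UTF-8 percent-encoding.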

Another possibility is to use the POST method instead of GET.


Do read this article on URL encoding format "www.blooberry.com/indexdot/html/topics/urlencoding.htm".

If you want, you could convert characters to hex or Base64 and put them in the parameters of the URL.
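A minimal sketch of the Base64 variant, assuming both sides agree on UTF-8 for the bytes and on the URL-safe Base64 alphabet. The class and method names are illustrative only.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

class Base64ParamDemo {
    // Encodes the value's UTF-8 bytes with the URL-safe alphabet, without '=' padding.
    static String toParam(String value) {
        return Base64.getUrlEncoder().withoutPadding()
                .encodeToString(value.getBytes(StandardCharsets.UTF_8));
    }

    // Decodes a parameter produced by toParam back into the original string.
    static String fromParam(String param) {
        return new String(Base64.getUrlDecoder().decode(param), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String param = toParam("\u4e2d\u6587"); // 中文 as Unicode escapes
        System.out.println(param);
        System.out.println(fromParam(param).equals("\u4e2d\u6587"));
    }
}
```

The resulting parameter contains only ASCII letters, digits, '-' and '_', so it passes through the URL unchanged, at the cost of the server having to decode it explicitly.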

I think it's better to put them in the body (POST) than in the URL (GET).
