Java EE Web Project and Character Encoding

We built a Java EE web project and use JDBC to store our data. The data contains German umlauts such as äöü, and these are stored correctly in the MySQL database. We don't know why, but in the browser those characters are broken and show up as something like

ö�

instead. I've already tried setting the encoding of the JDBC connection as described in this question:

JDBC character encoding

And the encoding of the HTML page is set correctly:

<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1" />

Any ideas how to fix that?


Update

connection.prepareStatement("SET CHARACTER SET utf8").execute();

doesn't make the umlauts work. Changing the meta tag to

<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />

doesn't change anything either.


"We don't know why, but in the browser those characters are broken"

Well, that's the first thing to find out. You should trace your data at every stage:

  • As you fetch it out of the database (with logging)
  • When you inject it into the page (with logging)
  • On the wire (via Wireshark)

When you log, don't just log the strings: log the Unicode characters that make up the strings, as integers. Just cast each character in the string to an integer and log it. It's primitive, but it'll tell you what you need to know.
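
As a minimal sketch of that kind of logging (the class and method names are made up for illustration, and it walks code points rather than casting each `char`, which gives the same numbers for umlauts):

```java
import java.util.StringJoiner;

class CodePointDump {
    // Returns every code point of s as "decimal=0xhex", separated by spaces.
    static String describe(String s) {
        StringJoiner out = new StringJoiner(" ");
        s.codePoints().forEach(cp -> out.add(cp + "=0x" + Integer.toHexString(cp)));
        return out.toString();
    }

    public static void main(String[] args) {
        // \u00F6 is 'ö'; the escape avoids depending on the source file encoding
        System.out.println(describe("sch\u00F6n"));
    }
}
```

A correctly decoded ö should show up as 246 (0xf6). If you see 195 followed by 182 instead, the text was most likely UTF-8 bytes decoded as ISO Latin 1 somewhere earlier in the chain.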

When you look on the wire, of course, you'll be seeing bytes rather than characters as such. You should work out what bytes you expect for your chosen encoding, and check those against what's actually coming across the network.
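
To work out the bytes you should expect, something like the following would do. It is only a sketch, and `ExpectedBytes` and `hex` are hypothetical names:

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class ExpectedBytes {
    // Encodes text with the given charset and returns the bytes as lowercase hex.
    static String hex(String text, Charset charset) {
        StringBuilder sb = new StringBuilder();
        for (byte b : text.getBytes(charset)) {
            int v = b & 0xFF; // bytes are signed in Java, so mask to 0..255
            if (sb.length() > 0) sb.append(' ');
            if (v < 0x10) sb.append('0');
            sb.append(Integer.toHexString(v));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String umlauts = "\u00E4\u00F6\u00FC"; // äöü
        System.out.println("UTF-8:      " + hex(umlauts, StandardCharsets.UTF_8));
        System.out.println("ISO-8859-1: " + hex(umlauts, StandardCharsets.ISO_8859_1));
    }
}
```

In UTF-8 each of these umlauts takes two bytes (c3 a4, c3 b6, c3 bc), whereas in ISO Latin 1 each takes one (e4, f6, fc), so the capture tells you which encoding the server really used.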

You've specified the encoding in the HTML - but have you told whatever's generating your page that you want it in ISO Latin 1? That's likely to be responsible for both setting the content-type header and performing the actual conversion from text to bytes.
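
To see what such a disagreement does, here is a small stand-alone sketch. It does not touch your web stack: it just encodes with one charset and decodes with another, and `CharsetMismatch` and `roundTrip` are illustrative names:

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class CharsetMismatch {
    // Encodes text with one charset and decodes the bytes with another, which is
    // roughly what happens when the server writes in one encoding and the
    // browser is told to read a different one.
    static String roundTrip(String text, Charset written, Charset declared) {
        return new String(text.getBytes(written), declared);
    }

    public static void main(String[] args) {
        String umlaut = "\u00F6"; // 'ö'
        String asLatin1 = roundTrip(umlaut, StandardCharsets.UTF_8, StandardCharsets.ISO_8859_1);
        String asUtf8 = roundTrip(umlaut, StandardCharsets.ISO_8859_1, StandardCharsets.UTF_8);
        // Print code points, not the strings, so the console encoding cannot confuse things
        asLatin1.codePoints().forEach(cp -> System.out.println("UTF-8 read as Latin-1: " + cp));
        asUtf8.codePoints().forEach(cp -> System.out.println("Latin-1 read as UTF-8: " + cp));
    }
}
```

UTF-8 bytes read as ISO Latin 1 turn ö into two characters (195 and 182, displayed as Ã¶), while ISO Latin 1 bytes read as UTF-8 give the replacement character 65533 (U+FFFD). In a servlet, the encoding used for the response body is normally chosen with `response.setCharacterEncoding(...)` or the charset in `response.setContentType(...)`, called before `getWriter()`. Whether that applies here depends on how your pages are generated.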

Additionally, is there any reason why you're using ISO Latin 1 instead of UTF-8? Why would you deliberately restrict yourself like that? (ISO Latin 1 can only handle the first 256 characters of Unicode, instead of the full range of Unicode characters. UTF-8 can handle everything, and is just as efficient for ASCII.)
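
A quick way to check that difference for yourself, as a sketch with made-up names:

```java
import java.nio.charset.StandardCharsets;

class Latin1Coverage {
    // True if every character of s can be represented in ISO-8859-1.
    static boolean fitsLatin1(String s) {
        return StandardCharsets.ISO_8859_1.newEncoder().canEncode(s);
    }

    public static void main(String[] args) {
        System.out.println(fitsLatin1("\u00E4\u00F6\u00FC")); // äöü are within the first 256 code points
        System.out.println(fitsLatin1("\u20AC"));             // the euro sign, U+20AC, is not
        // Plain ASCII costs one byte per character in UTF-8 as well
        System.out.println("abc".getBytes(StandardCharsets.UTF_8).length);
    }
}
```

Anything that does not fit is silently replaced with `?` when you call `getBytes` with ISO Latin 1, so data loss of this kind is easy to miss.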
