JavaScript charset problem
I want to read a file from my server with JavaScript and display its content in an HTML page. The file is in the ANSI charset, and it contains Romanian characters. I want to display those characters as they are :D, not as various black symbols.
So I think my problem is the charset. I have a GET request that fetches the content of the file, like this:
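function IO(U, V) { // LA MOD String Version. A tiny ajax library. by, DanDavis
    // Old IE has no XMLHttpRequest, so fall back to the ActiveX object there
    var X = !window.XMLHttpRequest ? new ActiveXObject('Microsoft.XMLHTTP') : new XMLHttpRequest();
    // Synchronous request: PUT when a body V is given, otherwise GET
    X.open(V ? 'PUT' : 'GET', U, false);
    // My attempt to get the content back as UTF-8
    X.setRequestHeader('Content-Type', 'Charset=UTF-8');
    X.send(V ? V : '');
    return X.responseText;
}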
As far as I know, the Romanian characters are included in the UTF-8 charset, so I set the charset in the request header to UTF-8. The file is in UTF-8 format, and I have the meta tag that tells the browser the page has UTF-8 content:
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
If I request the file directly from the server, the browser shows me the Romanian characters, but if I display the file's content through this script, I see only symbols instead of characters. So what am I doing wrong?
Thank you!
PS: I want this to work in Firefox at least, not necessarily in all browsers.
While my initial assumption was the same as T.J. Crowder's, a quick chat established that the OP uses some hosting service and cannot easily change the Content-Type headers.
The files were sent as text/plain or text/html without any charset parameter, so the browser interprets them as UTF-8 (which is the default).
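To see why the letters turn into black symbols, here is a small sketch using the standard `TextDecoder` (available in modern browsers and Node.js) that decodes the same Windows-1252 bytes once as UTF-8 and once with the right charset. The sample word and its bytes are illustrative assumptions, not taken from the OP's file.

```javascript
// Bytes of "România" as saved by a Windows-1252 ("ANSI") editor:
// every letter is one byte, and "â" is the single byte 0xE2.
const ansiBytes = new Uint8Array([0x52, 0x6f, 0x6d, 0xe2, 0x6e, 0x69, 0x61]);

// What happens when no charset is sent: the bytes are decoded as UTF-8.
// 0xE2 starts a 3-byte UTF-8 sequence, but "n" is not a continuation byte,
// so the decoder emits U+FFFD (the black-diamond question mark) instead.
const asUtf8 = new TextDecoder("utf-8").decode(ansiBytes);

// Decoding with the charset the file was really saved in gives the right text.
const asWindows1252 = new TextDecoder("windows-1252").decode(ansiBytes);

console.log(asUtf8);        // Rom�nia
console.log(asWindows1252); // România
```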
So saving the files in UTF-8 (instead of ANSI/Windows-1252) did the trick.
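If there are many files to re-save, the same conversion can be scripted. This is a minimal Node.js sketch that assumes the originals really are Windows-1252; both helper names are hypothetical.

```javascript
const fs = require("fs");

// Hypothetical helper: re-encode Windows-1252 ("ANSI") bytes as UTF-8.
// Assumes the input really is Windows-1252; other input is silently
// decoded to the wrong characters, no error is raised.
function windows1252ToUtf8(bytes) {
  const text = new TextDecoder("windows-1252").decode(bytes);
  return Buffer.from(text, "utf8"); // the same text, now as UTF-8 bytes
}

// Hypothetical helper: convert one file, in place by default.
function convertFile(inputPath, outputPath = inputPath) {
  const original = fs.readFileSync(inputPath); // raw bytes, no decoding
  fs.writeFileSync(outputPath, windows1252ToUtf8(original));
}
```

Running `convertFile` twice on the same file would treat the UTF-8 output as Windows-1252 and mangle it, so keeping a backup (or writing to a new path) is the safer design.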
You need to ensure that the HTTP response returning the file data identifies the correct charset. You have to do that server-side; I don't think you can force it from the client. (When you set the content type in the request header, you're setting the content type of the request, not the response.) So, for instance, the response header from the server would be along the lines of:
Content-Type: text/plain; charset=windows-1252
...if by "ANSI" you mean the Windows-1252 charset. That should tell the browser what it needs to do to decode the response text correctly before handing it to the JavaScript layer.
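The sketch below makes that decoding step explicit on the client: take the charset from the response's Content-Type header and decode the raw bytes with `TextDecoder`. The fallback charset used when the header has none is an addition not found in the answer, and the function names are hypothetical. Separately, `XMLHttpRequest` has an `overrideMimeType()` method which, called before `send()`, tells the browser which charset to use for `responseText`; it may serve as a client-side workaround when the server headers can't be changed.

```javascript
// Hypothetical helper: extract the charset parameter from a Content-Type
// value such as "text/plain; charset=windows-1252". Returns null if absent.
function charsetFromContentType(contentType) {
  const match = /;\s*charset\s*=\s*"?([^";\s]+)"?/i.exec(contentType || "");
  return match ? match[1].toLowerCase() : null;
}

// Decode raw response bytes: the server's charset wins when present,
// otherwise use the caller's fallback instead of assuming UTF-8.
function decodeBody(bytes, contentType, fallbackCharset = "utf-8") {
  const charset = charsetFromContentType(contentType) || fallbackCharset;
  return new TextDecoder(charset).decode(bytes);
}

// Browser usage sketch; the URL and fallback are supplied by the caller.
async function loadText(url, fallbackCharset) {
  const response = await fetch(url);
  const bytes = new Uint8Array(await response.arrayBuffer());
  return decodeBody(bytes, response.headers.get("Content-Type"), fallbackCharset);
}
```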
One problem, though: as far as I can tell, Windows-1252 doesn't have the full Romanian alphabet. So if you're seeing characters like Ș, ș, Ţ, ţ, etc., that suggests the source text is not in Windows-1252. Now, perhaps it's acceptable to drop the diacritics on those letters in Romanian (I wouldn't know), so if your source text just uses S and T instead of Ș and Ţ, etc., it could still be in Windows-1252. Or it may be ISO-8859 or ISO-8859-2 (both of which lack some of the diacritics) or possibly ISO-8859-16 (which has full Romanian support).
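One way to check which letters Windows-1252 can hold is to decode all 256 byte values to get its full character set, then test a string against that set. A sketch using the standard `TextDecoder`; the helper name is hypothetical.

```javascript
// Every character Windows-1252 can represent: decode bytes 0x00..0xFF.
const WINDOWS_1252_CHARS = new Set(
  new TextDecoder("windows-1252").decode(Uint8Array.from({ length: 256 }, (_, i) => i))
);

// Hypothetical helper: true if every character of `text` has a
// Windows-1252 byte, i.e. the text survives being saved as "ANSI".
function fitsInWindows1252(text) {
  return [...text].every((ch) => WINDOWS_1252_CHARS.has(ch));
}

console.log(fitsInWindows1252("România"));   // true  (â is byte 0xE2)
console.log(fitsInWindows1252("Ștefan"));    // false (no Ș in Windows-1252)
console.log(fitsInWindows1252("mânăstire")); // false (ă is missing too)
```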
So the first thing to do is determine what character set the source text is actually in.