Strange xml/html accent issue
I have an XML file that contains a message with HTML tags in it. The XML file is read by a Java class that mails it to people. When the mail is received, the accents do not show. For example, é doesn't show.
I have tried &eacute; in the XML, but Eclipse reports an error saying that the entity has not been declared.
I also tried simply inserting é, but that shows nothing in the final output.
The third thing I tried was using <![CDATA[é]]>, but that broke the parser: it didn't output anything after it.
However, I noticed something weird. When I put something like this in the XML and added UTF-16 encoding:
<message>text bla bla blaa é<
it did output the é at the end, like this: bla bla blaa blaa é.
EDIT
<message>text bla bla blaa éé<
outputs ?é or just one é
The file looks something like this:
<?xml version="1.0"? encoding="UTF-16">
<message>
<b>hello é </b>
</message>
</xml>
What gives?
Did you try changing the encoding to UTF-8?
The encoding that you declare in the XML declaration MUST be consistent with the "real" encoding that was used to edit and save the XML file on your hard drive.
If you edited your XML file with Notepad on Windows in a Western European locale, it is most likely encoded in cp1252 (the default encoding used by Windows in that situation; cp1252 is a close variant of the standard ISO-8859-1 that adds, among other characters, the euro sign).
In fact, I would suggest using an editing tool that lets you control exactly which encoding is used during edit/save operations (like http://jedit.org), so you can guarantee that the file's actual encoding and the encoding declared in its content (that is, in the XML declaration) are the same.
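To illustrate what a mismatch between the saved encoding and the assumed encoding does to é, here is a minimal Java sketch. The class and method names are made up for the example, and ISO-8859-1 stands in for cp1252 (both encode é as the single byte 0xE9).

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class EncodingMismatchDemo {
    // Encode the text with one charset and decode the bytes with another,
    // which is what happens when the declared encoding is not the real one.
    static String roundTrip(String text, Charset savedAs, Charset readAs) {
        byte[] bytes = text.getBytes(savedAs);
        return new String(bytes, readAs);
    }

    public static void main(String[] args) {
        String original = "hello \u00e9"; // "hello é"
        Charset latin1 = StandardCharsets.ISO_8859_1; // stand-in for cp1252 (assumption)
        Charset utf8 = StandardCharsets.UTF_8;

        // Same encoding on both sides: the text survives.
        System.out.println(roundTrip(original, latin1, latin1).equals(original));
        // Saved as UTF-8 but read as ISO-8859-1: é turns into two characters.
        System.out.println(roundTrip(original, utf8, latin1));
        // Saved as ISO-8859-1 but read as UTF-8: é becomes a replacement character.
        System.out.println(roundTrip(original, latin1, utf8));
    }
}
```

If the output of your mail looks like either of the last two cases, the file was probably saved in a different encoding from the one it is being read with.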
EDIT
It also depends greatly on the way your Java program reads the XML file and uses it.
If an XML parser is used, it should be OK. Otherwise you'll probably have to store the file in whatever encoding the Java class uses to read it: when no charset is given explicitly, Java falls back to the platform default (for example cp1252 on a Western European Windows, and UTF-8 since Java 18). If you're very unlucky and another encoding is hard-coded for the file reading process in the Java class, well, you'll have to comply with that...
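As a rough sketch of the "XML parser" case, the following hands the raw bytes to the JDK's DOM parser, which then applies the encoding named in the XML declaration by itself. The class name, method name, and sample document are assumptions for illustration, not the asker's actual code.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

public class XmlEncodingRead {
    // Parse from a byte stream (not a Reader) so the parser itself
    // decodes the bytes using the encoding declared in the document.
    static String readMessage(InputStream in) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(in);
            return doc.getDocumentElement().getTextContent();
        } catch (Exception e) {
            throw new IllegalStateException("could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>"
                + "<message>hello \u00e9</message>";
        // The bytes really are UTF-8, matching the declaration.
        byte[] bytes = xml.getBytes(StandardCharsets.UTF_8);
        System.out.println(readMessage(new ByteArrayInputStream(bytes)));
    }
}
```

If the class instead wraps the file in a Reader, the Reader's charset decides how the bytes are decoded, so it is safer to pass that charset explicitly than to rely on the default.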
EDIT 2
It also depends on the mail client and the way it manages encoding...
The &eacute; entity is an HTML entity that your XML parser is trying to interpret. Replace &eacute; with &amp;eacute; and the XML parser will only interpret the &amp;, which produces the HTML entity you want.
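A small Java sketch of that behaviour, using the JDK's built-in DOM parser: the undeclared HTML entity is rejected, while the version with the escaped ampersand comes out as the literal text &eacute; for the HTML mail. The class and method names are invented for the example, and the numeric reference in the last line is an extra alternative not mentioned above.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class EntityEscapeDemo {
    // Returns the text inside the root element, or null if the XML
    // cannot be parsed (for example because of an undeclared entity).
    static String messageText(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            return doc.getDocumentElement().getTextContent();
        } catch (Exception e) {
            return null;
        }
    }

    public static void main(String[] args) {
        // HTML entity used directly: XML does not know "eacute", so parsing fails.
        System.out.println(messageText("<message>hello &eacute;</message>"));
        // Ampersand escaped: the parser yields the text "hello &eacute;".
        System.out.println(messageText("<message>hello &amp;eacute;</message>"));
        // Numeric character reference: valid XML, yields the character itself.
        System.out.println(messageText("<message>hello &#233;</message>"));
    }
}
```

The parser may also print its own fatal-error line to standard error for the first case; that comes from its default error handler.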
Regarding the UTF-16 encoding, the key piece of information missing here is the encoding of the file. It sounds like the file is being saved in UTF-16 format without a byte-order mark, which would explain why it only works with that encoding specified. You can verify this by checking the file size: it will be twice the number of characters in the file (or possibly a bit more if you're using certain Unicode characters). Other likely encodings you can try are UTF-8 and ISO-8859-1.
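The file-size check and the byte-order mark check can be sketched in Java roughly as follows. The class and method names are hypothetical, and the sample text is only a stand-in for the real file contents.

```java
import java.nio.charset.StandardCharsets;

public class BomCheck {
    // Look at the first bytes and report which byte-order mark, if any, is present.
    static String detectBom(byte[] b) {
        if (b.length >= 3 && (b[0] & 0xFF) == 0xEF
                && (b[1] & 0xFF) == 0xBB && (b[2] & 0xFF) == 0xBF) {
            return "UTF-8";
        }
        if (b.length >= 2 && (b[0] & 0xFF) == 0xFE && (b[1] & 0xFF) == 0xFF) {
            return "UTF-16BE";
        }
        if (b.length >= 2 && (b[0] & 0xFF) == 0xFF && (b[1] & 0xFF) == 0xFE) {
            return "UTF-16LE";
        }
        return "no BOM";
    }

    public static void main(String[] args) {
        String text = "hello \u00e9"; // 7 characters

        // UTF-16 without a BOM: exactly two bytes per character here.
        byte[] utf16le = text.getBytes(StandardCharsets.UTF_16LE);
        System.out.println(utf16le.length + " bytes, " + detectBom(utf16le));

        // Java's "UTF-16" encoder writes a big-endian BOM first.
        byte[] utf16 = text.getBytes(StandardCharsets.UTF_16);
        System.out.println(utf16.length + " bytes, " + detectBom(utf16));

        // UTF-8: one byte per ASCII character, two for é.
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);
        System.out.println(utf8.length + " bytes, " + detectBom(utf8));
    }
}
```

For a real file you would pass in the bytes read from disk (for example via java.nio.file.Files.readAllBytes) and compare the length with the number of characters you see in the editor.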