Weird characters after getting a value from XML in PHP
I'm trying to get a value containing a € sign out of an XML file, but when I do, it gives back garbled characters.
$xmlDate = $searchNode->getElementsByTagName( "kostenvoorverkoop" );
$valueKostenvoorverkoop = htmlentities($xmlDate->item(0)->nodeValue,ENT_QUOTES,"UTF-8");
//gives back Â€10,- instead of €10,-
I can't find the problem.
//XML
<?xml version="1.0" encoding="ISO-8859-1" ?>
<price>€10</price>
If I leave out htmlentities(), it gives a completely weird string like ÁáÙ%10 (not exactly this, but you know what I mean).
If anyone can help me with this, it would help me greatly. Thanks in advance.
Edit:
Found a small workaround: replace the € with &euro;. I know it's not clean, but it works.
<?xml version="1.0" encoding="ISO-8859-1" ?>
<price>€10</price>
The character € does not exist in ISO-8859-1, so this XML declaration can't possibly be right.
The output Â€ suggests the file has actually been encoded in Windows code page 1252 (Western European), which is similar to ISO-8859-1 but has different characters in the range 0x80–0x9F, including the euro sign.
PHP has parsed the data as ISO-8859-1, where the cp1252 encoding of €, byte 0x80, maps to the control character U+0080. It then gives you the Unicode string containing U+0080 as a UTF-8-encoded byte string, the two bytes 0xC2 0x80. Outputting that to a browser in a page served as cp1252, as ISO-8859-1 (which browsers treat as cp1252 for tedious, confusing legacy reasons), or without a charset on a Western European machine, gives Â€. htmlentities() doesn't encode this in any way because there's no HTML entity for the control code U+0080.
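To make the byte-level chain above concrete, here is a small PHP sketch. It assumes the file stores € as the single cp1252 byte 0x80, and it imitates the parser's ISO-8859-1 decoding step with a hand-written, hypothetical helper `latin1ToUtf8()` rather than calling the parser itself.

```php
<?php
// Sketch: reproduce the mojibake byte by byte.
// Assumption: the XML file stores € as the single cp1252 byte 0x80.

// Hypothetical helper: decoding as ISO-8859-1 maps byte 0xNN to code point U+00NN,
// so converting the result to UTF-8 is pure bit manipulation.
function latin1ToUtf8(string $bytes): string
{
    $out = '';
    for ($i = 0, $n = strlen($bytes); $i < $n; $i++) {
        $b = ord($bytes[$i]);
        if ($b < 0x80) {
            $out .= chr($b);                   // ASCII stays a single byte
        } else {
            $out .= chr(0xC0 | ($b >> 6))      // two-byte UTF-8 sequence
                  . chr(0x80 | ($b & 0x3F));
        }
    }
    return $out;
}

$raw = "\x80" . "10";             // what the file actually contains
$parsed = latin1ToUtf8($raw);     // what PHP receives after ISO-8859-1 decoding
echo bin2hex($parsed), "\n";      // c2803130: U+0080 as C2 80, then "10"

// htmlentities() has no named entity for the control character U+0080,
// so the bytes pass through unchanged.
$escaped = htmlentities($parsed, ENT_QUOTES, 'UTF-8');
var_dump($escaped === $parsed);   // bool(true)

// A browser reading C2 80 as cp1252 shows 0xC2 as Â and 0x80 as €, hence Â€10.
```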
Here's how you should proceed:
- If you must have your XML input file in cp1252, state that in the XML declaration with encoding="windows-1252" rather than the inaccurate ISO-8859-1. XML parsers aren't required to be able to read cp1252, though, so for better interoperability, just use the default UTF-8 encoding and re-save the file to match.
- Serve your output HTML page as UTF-8, using a Content-Type header or meta tag. Then use htmlspecialchars() instead of htmlentities() so you don't waste time encoding non-ASCII characters that don't need it.
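A minimal PHP sketch of the UTF-8 route, assuming the XML file has been re-saved as UTF-8; the inline string stands in for that file and reuses the question's `kostenvoorverkoop` element name. It reads the value with DOMDocument and escapes it with htmlspecialchars(), keeping htmlentities() only for comparison.

```php
<?php
// Sketch: read the value from a UTF-8 XML document and escape it for a UTF-8 page.
// Assumption: the real file was re-saved as UTF-8; this inline string stands in for it.

// Tell the browser the page is UTF-8 (has no visible effect under the CLI).
header('Content-Type: text/html; charset=UTF-8');

$xml = '<?xml version="1.0" encoding="UTF-8"?>'
     . "<kostenvoorverkoop>\u{20AC}10,-</kostenvoorverkoop>";

$doc = new DOMDocument();
$doc->loadXML($xml);

// DOM hands back text as UTF-8, so this is "€10,-" (bytes e2 82 ac 31 30 2c 2d).
$value = $doc->getElementsByTagName('kostenvoorverkoop')->item(0)->nodeValue;

// htmlspecialchars() escapes only & " ' < > and leaves € untouched,
// which is all that is needed once the page is served as UTF-8.
$safe = htmlspecialchars($value, ENT_QUOTES, 'UTF-8');

// For comparison, htmlentities() would turn € into the named entity &euro;.
$entities = htmlentities($value, ENT_QUOTES, 'UTF-8');

echo $safe, "\n";
```

If you keep the file in cp1252 instead, only the encoding attribute in the declaration changes, but it is worth checking that your libxml build can decode windows-1252, since parsers are not required to support it.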
Did you try changing the encoding in the XML from ISO-8859-1 to UTF-8? Or just specify the charset ISO-8859-1 in PHP when you do the decoding.