Clean hexadecimal entities in an XML doc via PHP
I need to send an XML document to a SOAP web service (which I have no control over). I was receiving an error because the text contains HTML entities, so I clean each string with html_entity_decode() and then htmlspecialchars() before adding it to the SimpleXML object, like this:
if (!mb_check_encoding($string, 'UTF-8')) {
    // utf8_encode() assumes the input is ISO-8859-1
    $string = utf8_encode($string);
}
$string = htmlspecialchars( html_entity_decode($string, ENT_COMPAT, 'UTF-8'), ENT_COMPAT, 'UTF-8');
$xml->addChild('PROD_DESC', $string);
But although it cleans named entities in the form &copy;, it doesn't do anything with hexadecimal entities like &#xE1;, and the service I am talking to doesn't accept those either.
In this post I found a possible solution, but when I pass that string to the tidy cleanString function I get the same string back; it doesn't touch those entities either.
The numeric entities are added by SimpleXML because your XML document has no declared encoding:
// with declared encoding :
$xml = simplexml_load_string('<?xml version="1.0" encoding="utf-8"?><x></x>');
$xml->addChild('PROD_DESC', "à");
// result: <PROD_DESC>à</PROD_DESC>
// without declared encoding :
$xml = simplexml_load_string('<?xml version="1.0"?><x></x>');
$xml->addChild('PROD_DESC', "à");
// result: <PROD_DESC>&#xE0;</PROD_DESC>
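The comparison above can be put into a small self-contained script. This is a minimal sketch, assuming PHP 7+ with the SimpleXML extension loaded; the helper name buildDoc() is hypothetical and not part of the original code.

```php
<?php
// Hypothetical helper (not from the original post): builds a one-element
// document from the given XML prolog and returns its serialized form.
function buildDoc(string $prolog, string $text): string
{
    $xml = simplexml_load_string($prolog . '<x></x>');
    // addChild() treats '&' as the start of an entity reference,
    // so the text is escaped first, as in the question's code.
    $xml->addChild('PROD_DESC', htmlspecialchars($text, ENT_COMPAT, 'UTF-8'));
    return $xml->asXML();
}

// "à" written as a Unicode escape so the script's own file encoding doesn't matter.
$text = "\u{E0}";

$declared   = buildDoc('<?xml version="1.0" encoding="utf-8"?>', $text);
$undeclared = buildDoc('<?xml version="1.0"?>', $text);

echo $declared, "\n", $undeclared, "\n";
```

Per the explanation above, $declared should contain the raw UTF-8 character, while $undeclared should contain the numeric character reference that the service rejects. So the fix is on the serialization side: declare the encoding when creating the document rather than post-processing the strings.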
Would it be acceptable to pass the string as base64-encoded data? That would eliminate the need to strip anything out.
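If that route is open, it might look roughly like the following. This is only a sketch: the sample text is made up, the element name is reused from the question, and it assumes the receiving service is able (and willing) to base64-decode the field, which would have to be confirmed with whoever runs it.

```php
<?php
$xml = simplexml_load_string('<?xml version="1.0" encoding="utf-8"?><x></x>');

// Made-up sample text containing non-ASCII characters and an ampersand.
$string = "Caf\u{E9} & cr\u{E8}me";

// Base64 output uses only A-Z, a-z, 0-9, '+', '/' and '=',
// so it needs no XML escaping and produces no entities.
$encoded = base64_encode($string);
$xml->addChild('PROD_DESC', $encoded);

// The receiver would have to reverse the step:
$decoded = base64_decode((string) $xml->PROD_DESC, true);
```

The trade-off is that the field is no longer human-readable in the XML, and it only helps if the service's schema allows encoded content in that element.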