
Clean hexadecimal entities in an XML doc via PHP

I need to send an XML document to a SOAP web service (which I have no control over). I was receiving an error because the text contains HTML entities, so I clean each string with html_entity_decode() and then htmlspecialchars() before adding it to the SimpleXML object, like this:

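// Convert to UTF-8 only when the string is not already valid UTF-8.
if (mb_detect_encoding($string, "UTF-8", true) !== "UTF-8") {
    $string = utf8_encode($string); // assumes the input is ISO-8859-1
}
// Decode entities, then re-escape only the XML special characters.
$string = htmlspecialchars(html_entity_decode($string, ENT_COMPAT, 'UTF-8'), ENT_COMPAT, 'UTF-8');
$xml->addChild('PROD_DESC', $string);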

But although it cleans named entities of the form &copy;, it doesn't do anything with hexadecimal entities like &#xE1;, and the service I am talking to doesn't accept those either.

In this post I found a possible solution, but when I pass that string to the tidy cleanString function I get the same string back; it doesn't touch those entities either.

The numeric entities are added by SimpleXML because your XML document has no declared encoding:

// with declared encoding :
$xml = simplexml_load_string('<?xml version="1.0" encoding="utf-8"?><x></x>');
$xml->addChild('PROD_DESC', "à");
// result: <PROD_DESC>à</PROD_DESC>

// without declared encoding :
$xml = simplexml_load_string('<?xml version="1.0"?><x></x>');
$xml->addChild('PROD_DESC', "à");
// result: <PROD_DESC>&#xE0;</PROD_DESC>
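
Putting the two pieces together, here is a minimal sketch of cleaning a string and adding it to a document that declares its encoding. The helper name addCleanChild and the sample text are hypothetical, not part of the question's code, and the sketch assumes the input is already valid UTF-8. It also uses ENT_QUOTES | ENT_HTML5 for decoding and ENT_NOQUOTES | ENT_XML1 for re-escaping, which is a choice of mine rather than the flags used in the question.

```php
<?php
// Hypothetical helper: decodes HTML entities in $text and appends the
// result as a child element of $parent.
function addCleanChild(SimpleXMLElement $parent, string $name, string $text): SimpleXMLElement
{
    // Turn named and numeric entities (&copy;, &#xE1;, ...) into UTF-8 characters.
    $decoded = html_entity_decode($text, ENT_QUOTES | ENT_HTML5, 'UTF-8');
    // Re-escape only &, < and >, because addChild() treats a bare "&"
    // as the start of an entity reference.
    $escaped = htmlspecialchars($decoded, ENT_NOQUOTES | ENT_XML1, 'UTF-8');
    return $parent->addChild($name, $escaped);
}

// The encoding declaration is what keeps SimpleXML from emitting
// numeric character references for non-ASCII characters on output.
$xml = new SimpleXMLElement('<?xml version="1.0" encoding="utf-8"?><x/>');
addCleanChild($xml, 'PROD_DESC', 'Caf&eacute; &#xE1; &copy; R&amp;D');
$out = $xml->asXML();
echo $out;
```

With the declaration in place, the serialized PROD_DESC element should carry the accented characters as literal UTF-8, and only the ampersand should remain escaped.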

Is it acceptable for you to pass the string as base64 encoded data? This would eliminate the need to strip anything out.
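
As a rough sketch of that idea: it only helps if the receiving service is known to base64-decode the field, which is an assumption here, since the question says the service is outside the asker's control. The sample text is made up for illustration.

```php
<?php
// Sample text (hypothetical), written with escapes so the file encoding
// does not matter: "Descripci\u{F3}n" etc. are UTF-8 characters.
$text = "Descripci\u{F3}n con \u{E1} y \u{A9}";

$xml = new SimpleXMLElement('<?xml version="1.0" encoding="utf-8"?><x/>');
// Base64 output uses only A-Z, a-z, 0-9, "+", "/" and "=", so nothing
// in it needs escaping or entity handling in XML.
$xml->addChild('PROD_DESC', base64_encode($text));
$payload = $xml->asXML();

// What the receiving side would have to do (assumed, not confirmed):
$received = base64_decode((string) $xml->PROD_DESC, true);
```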






