CData section not finished problem
When I call DOMDocument::loadXML() on the XML below, I get these errors:
Warning: DOMDocument::loadXML() [domdocument.loadxml]: CData section not finished http://www.site.org/displayimage.php?album=se in Entity,
Warning: DOMDocument::loadXML() [domdocument.loadxml]: Premature end of data in tag image line 7 in Entity
Warning: DOMDocument::loadXML() [domdocument.loadxml]: Premature end of data in tag quizz line 3 in Entity
Warning: DOMDocument::loadXML() [domdocument.loadxml]: Premature end of data in tag quizzes line 2 in Entity
Fatal error: Call to a member function getElementsByTagName() on a non-object
As far as I can tell my CDATA sections are closed, but I still get this error. The XML looks like this:
<?xml version="1.0" encoding="utf-8"?>
<quizzes>
<quizz>
<title><![CDATA[Title]]></title>
<descr><![CDATA[Some text here!]]></descr>
<tags><![CDATA[one tag, second tag]]></tags>
<image><![CDATA[http://www.site.org/displayimage.php?album=search&cat=0&pos=1]]></image>
<results>
<result>
<title><![CDATA[Something]]></title>
<descr><![CDATA[Some text here]]></descr>
<image><![CDATA[http://www.site.org/displayimage.php?album=search&cat=0&pos=17]]></image>
<id>1</id>
</result>
</results>
</quizz>
</quizzes>
Could you help me find out what the problem is?
I have found that the usual cause is hidden characters that are not valid in XML, so I prefer to strip invalid characters like below:
<?php
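// $feedXml is the fetched XML content (assumed to be UTF-8)
// Code points above 0xFF need the \x{...} syntax and the /u modifier in PCRE.
$invalid_characters = '/[^\x{9}\x{A}\x{D}\x{20}-\x{D7FF}\x{E000}-\x{FFFD}\x{10000}-\x{10FFFF}]/u';
$feedXml = preg_replace($invalid_characters, '', $feedXml);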
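As a minimal sketch, the stripping step can be wrapped in a function and checked against a small input. The function name stripInvalidXmlChars is hypothetical, and the character class is my reading of the XML 1.0 "Char" production, so treat it as an assumption rather than a definitive filter.

```php
<?php
// Hypothetical helper: removes code points that XML 1.0 does not allow.
function stripInvalidXmlChars(string $xml): string
{
    $pattern = '/[^\x{9}\x{A}\x{D}\x{20}-\x{D7FF}\x{E000}-\x{FFFD}\x{10000}-\x{10FFFF}]/u';
    $clean = preg_replace($pattern, '', $xml);
    // With the /u modifier, preg_replace() returns null if the input is not valid UTF-8.
    if ($clean === null) {
        throw new InvalidArgumentException('Input is not valid UTF-8');
    }
    return $clean;
}

// A backspace (0x08) and a vertical tab (0x0B) are both invalid in XML 1.0.
$dirty = "<a>ok\x08\x0B</a>";
$clean = stripInvalidXmlChars($dirty);
echo $clean, "\n";
```

Note that this only helps when the breakage really comes from control characters; it does nothing for a document that was truncated in transit.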
Sorry if this is off topic, since it only applies to a specific case in PHP with cURL, but, as tomaszs states, I too discovered that ampersands can cause a problem when passing XML via cURL in PHP. I was receiving a known-valid XML string with properly encoded ampersands and then forwarding it to another address with cURL, something like this:
$curlHandle = curl_init();
curl_setopt($curlHandle, CURLOPT_URL, $fullUri);
curl_setopt($curlHandle, CURLOPT_HEADER, false);
curl_setopt($curlHandle, CURLOPT_RETURNTRANSFER, true);
curl_setopt($curlHandle, CURLOPT_CONNECTTIMEOUT, 4); // seconds
curl_setopt($curlHandle, CURLOPT_POST, true);
curl_setopt($curlHandle, CURLOPT_POSTFIELDS, "xmlstr=" . $xmlstr); // Problem
The issue occurs in the last line above, when the XML is added to CURLOPT_POSTFIELDS. The first encoded ampersand is treated as a parameter delimiter, as in a query string, and the "xmlstr" field is truncated.
The solution I used was to replace the last line above with...
curl_setopt($curlHandle, CURLOPT_POSTFIELDS, "xmlstr=" . urlencode($xmlstr));
Hope this helps someone.
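The truncation can be reproduced without any network call. This is only a sketch: it uses parse_str() to stand in for the receiving script's form parsing, and the sample value is taken from the question's XML.

```php
<?php
$xmlstr = '<image><![CDATA[http://www.site.org/displayimage.php?album=search&cat=0&pos=1]]></image>';

// Unencoded: the body is split on "&", so the xmlstr field is cut short
// and the remainder shows up as separate "cat" and "pos" fields.
parse_str('xmlstr=' . $xmlstr, $broken);

// Encoded: http_build_query() percent-encodes the value, so it arrives intact.
parse_str(http_build_query(['xmlstr' => $xmlstr]), $fixed);

echo $broken['xmlstr'], "\n";
echo $fixed['xmlstr'], "\n";
```

Passing an array to CURLOPT_POSTFIELDS also avoids the problem, but cURL then sends the body as multipart/form-data rather than application/x-www-form-urlencoded, so whether that is acceptable depends on the receiving end.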
The answers here have the right idea: there is some sort of bad, possibly non-printing, character in the document that breaks the parser. None of the answers above solved my problem; instead I used tr to write a "clean" version of the file, and then I was able to parse that, i.e.:
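<?php
// simplexml_load_file() returns false on a parse error instead of throwing,
// so check the return value rather than relying on try/catch.
libxml_use_internal_errors(true);
$simpleXMLobject = simplexml_load_file($feed);
if ($simpleXMLobject === false) {
    // Try to clean the file and reload it ($feed must be a local file path here)
    $tempFile = sys_get_temp_dir() . "/" . uniqid("rdc");
    // Keep only tab, LF, CR and printable ASCII (octal 40-176)
    shell_exec(
        "tr -cd '\\11\\12\\15\\40-\\176' < " .
        escapeshellarg($feed) . " > " .
        escapeshellarg($tempFile)
    );
    $simpleXMLobject = simplexml_load_file($tempFile);
    if ($simpleXMLobject === false) {
        foreach (libxml_get_errors() as $error) {
            echo trim($error->message), "\n";
        }
        exit(1);
    }
}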
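If shelling out to tr is not an option, the same filtering can be approximated in PHP itself. This is a sketch of a swapped-in technique, not the method used above; the helper name keepPrintableAscii is hypothetical.

```php
<?php
// Hypothetical helper: keeps tab, LF, CR and printable ASCII (0x20-0x7E),
// the same byte set as tr -cd '\11\12\15\40-\176'.
function keepPrintableAscii(string $data): string
{
    // No /u modifier: the pattern works on raw bytes, like tr does.
    return preg_replace('/[^\x09\x0A\x0D\x20-\x7E]/', '', $data);
}

// The two bytes of "é" (0xC3 0xA9) and the NUL byte are dropped.
$raw = "<a>caf\xC3\xA9\x00 ok</a>";
$clean = keepPrintableAscii($raw);
echo $clean, "\n";
```

Be aware that this, like the tr command, also deletes every legitimate non-ASCII character, so it is only suitable when the content is expected to be plain ASCII.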
I don't see any error (either the XML actually used is different from the one provided, or the XML processor used (BTW, which one is it?) is buggy).
I would recommend avoiding CDATA sections. Use the following XML document, which is text-equivalent to the one provided and much more readable:
<quizzes>
<quizz>
<title>Title</title>
<descr>Some text here!</descr>
<tags>one tag, second tag</tags>
<image>http://www.site.org/displayimage.php?album=search&amp;cat=0&amp;pos=1</image>
<results>
<result>
<title>Something</title>
<descr>Some text here</descr>
<image>http://www.site.org/displayimage.php?album=search&amp;cat=0&amp;pos=17</image>
<id>1</id>
</result>
</results>
</quizz>
</quizzes>
I've found that the problem was in how I passed this XML with cURL in PHP. I sent it as plain text, and the & character in the XML was interpreted as the delimiter before the next parameter. Once I escaped this character, it started to work properly.
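To see why a cut at the first & produces the parser errors from the question, here is a small sketch that truncates a shortened version of the document by hand and loads it. The shortened XML and the variable names are illustrative assumptions; the exact error text depends on the libxml2 version.

```php
<?php
$xml = '<?xml version="1.0" encoding="utf-8"?>'
     . '<quizzes><quizz><image><![CDATA[http://www.site.org/displayimage.php?album=search&cat=0&pos=1]]></image></quizz></quizzes>';

// Simulate what the receiving script sees when the value is cut at the first "&".
$truncated = substr($xml, 0, strpos($xml, '&'));

libxml_use_internal_errors(true); // collect errors instead of emitting warnings
$doc = new DOMDocument();
$ok = $doc->loadXML($truncated);

$messages = [];
foreach (libxml_get_errors() as $error) {
    $messages[] = trim($error->message);
}
libxml_clear_errors();

if (!$ok) {
    // Checking the return value avoids calling methods on a document that failed to load.
    echo implode("\n", $messages), "\n";
}
```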