
CData section not finished problem

When I use DOMDocument::loadXML() on the XML below, I get these errors:

Warning: DOMDocument::loadXML() [domdocument.loadxml]: CData section not finished http://www.site.org/displayimage.php?album=se in Entity,
Warning: DOMDocument::loadXML() [domdocument.loadxml]: Premature end of data in tag image line 7 in Entity
Warning: DOMDocument::loadXML() [domdocument.loadxml]: Premature end of data in tag quizz line 3 in Entity
Warning: DOMDocument::loadXML() [domdocument.loadxml]: Premature end of data in tag quizzes line 2 in Entity
Fatal error: Call to a member function getElementsByTagName() on a non-object 

As far as I can tell, all my CDATA sections are closed, yet I still get this error. The XML looks like this:

<?xml version="1.0" encoding="utf-8"?>
<quizzes>
<quizz>
<title><![CDATA[Title]]></title>
<descr><![CDATA[Some text here!]]></descr>
<tags><![CDATA[one tag, second tag]]></tags>
<image><![CDATA[http://www.site.org/displayimage.php?album=search&cat=0&pos=1]]></image>
<results>
<result>
<title><![CDATA[Something]]></title>
<descr><![CDATA[Some text here]]></descr>
<image><![CDATA[http://www.site.org/displayimage.php?album=search&cat=0&pos=17]]></image>
<id>1</id>
</result>
</results>
</quizz>
</quizzes>

Can you help me find the problem?


I've found that the cause is usually hidden characters that are not allowed in XML, so I prefer to strip those invalid characters out, as below:

<?php
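// $feedXml is the fetched XML content (assumed to be UTF-8)
// Delete every character outside the XML 1.0 "Char" production (tab, LF, CR and the allowed ranges).
// PCRE needs \x{...} escapes and the /u flag to match code points above 0xFF.
$invalid_characters = '/[^\x{9}\x{A}\x{D}\x{20}-\x{D7FF}\x{E000}-\x{FFFD}\x{10000}-\x{10FFFF}]/u';
$feedXml = preg_replace($invalid_characters, '', $feedXml);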
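A small self-contained check of that idea, as a sketch: the sample string below is made up for illustration, and it contains one form-feed character (U+000C), which XML 1.0 forbids everywhere, even inside CDATA. libxml2 may report such a character as "CData section not finished", so it is worth ruling out. The pattern uses the XML 1.0 character ranges with \x{...} escapes and the /u flag.

```php
<?php
// Hypothetical input: well-formed except for a stray form feed (U+000C) inside CDATA.
$feedXml = "<title><![CDATA[Title\x0C]]></title>";

libxml_use_internal_errors(true); // collect parse errors instead of printing warnings

// The parser rejects the document because of the invalid character.
$before = (new DOMDocument())->loadXML($feedXml);

// Remove everything outside the XML 1.0 Char production, then parse again.
$pattern = '/[^\x{9}\x{A}\x{D}\x{20}-\x{D7FF}\x{E000}-\x{FFFD}\x{10000}-\x{10FFFF}]/u';
$clean = preg_replace($pattern, '', $feedXml);
$after = (new DOMDocument())->loadXML($clean);

var_dump($before, $after); // bool(false) bool(true)
```

Note that preg_replace() with the /u flag returns null if the input is not valid UTF-8, so the result is worth checking on real feeds.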


Sorry if this is slightly off topic, since it only applies to PHP with cURL, but, as tomaszs states, I also discovered that ampersands can cause problems when passing XML via cURL in PHP. I was receiving an XML string that I knew was valid, with its ampersands properly encoded, and forwarding it to another address with cURL, something like this:

$curlHandle = curl_init();
curl_setopt($curlHandle, CURLOPT_URL,            $fullUri);
curl_setopt($curlHandle, CURLOPT_HEADER,         false);
curl_setopt($curlHandle, CURLOPT_RETURNTRANSFER, true);
curl_setopt($curlHandle, CURLOPT_CONNECTTIMEOUT, 4); // seconds
curl_setopt($curlHandle, CURLOPT_POST,           true);
curl_setopt($curlHandle, CURLOPT_POSTFIELDS,     "xmlstr=" . $xmlstr); // Problem

The problem is the last line above, where the XML is added to CURLOPT_POSTFIELDS. A string body is sent as application/x-www-form-urlencoded data, so the first ampersand (even one that starts an encoded entity such as &amp;) is treated as a parameter delimiter, as in a query string, and the "xmlstr" field is truncated.

The solution I used was to replace the last line above with...

curl_setopt($curlHandle, CURLOPT_POSTFIELDS,     "xmlstr=" . urlencode($xmlstr));
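An alternative sketch, using PHP's built-in http_build_query(), which URL-encodes both the key and the value. The $xmlstr value here is a made-up sample; parse_str() stands in for the way a receiving PHP script decodes the body into $_POST.

```php
<?php
// Hypothetical payload: valid XML with already-escaped ampersands.
$xmlstr = '<image>http://www.site.org/displayimage.php?album=search&amp;cat=0&amp;pos=1</image>';

// Encodes '&', ';', '<', '>', '=' etc. so none of them can be read as form syntax.
$body = http_build_query(['xmlstr' => $xmlstr]);

// Decode the body the way a form-urlencoded POST is decoded on the receiving side.
parse_str($body, $fields);
var_dump($fields['xmlstr'] === $xmlstr); // bool(true)

// With cURL this would be used as:
// curl_setopt($curlHandle, CURLOPT_POSTFIELDS, $body);
```

Passing an array directly to CURLOPT_POSTFIELDS also works, but then cURL sends the request as multipart/form-data instead of application/x-www-form-urlencoded, which the receiving end may or may not expect.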

Hope this helps someone.


The answers here have the right idea: there is some sort of bad, possibly non-printing, character in the document that breaks the parser. None of the answers above solved my problem, though. Instead, I used tr to write a "clean" version of the file, and then I was able to parse that:

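<?php
// $feed must be a local file path: the shell redirect below cannot read a URL.
// simplexml_load_file() does not throw on malformed XML; it returns false,
// so check the return value instead of wrapping it in try/catch.
libxml_use_internal_errors(true); // collect parse errors instead of emitting warnings
$simpleXMLobject = simplexml_load_file($feed);
if ($simpleXMLobject === false) {
    // Try to clean the file and reload it.
    // tr -cd deletes every byte except tab (\11), LF (\12), CR (\15) and printable
    // ASCII (\40-\176); note that this also deletes non-ASCII UTF-8 characters.
    $tempFile = sys_get_temp_dir() . "/" . uniqid("rdc");
    shell_exec(
        "tr -cd '\\11\\12\\15\\40-\\176' < " .
        escapeshellarg($feed) . " > " .
        escapeshellarg($tempFile)
    );
    $simpleXMLobject = simplexml_load_file($tempFile);
    unlink($tempFile);
    if ($simpleXMLobject === false) {
        $messages = array_map(function ($error) {
            return trim($error->message);
        }, libxml_get_errors());
        die(implode("\n", $messages));
    }
}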
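If shelling out is not an option, the same filter can be sketched in pure PHP with a byte-level regular expression (no /u flag). This is an equivalent of the tr character set, not the answerer's code; the function name keepPrintableAscii() and the sample string are hypothetical.

```php
<?php
// Pure-PHP equivalent of `tr -cd '\11\12\15\40-\176'`: keep only tab, LF, CR
// and printable ASCII (0x20-0x7E); every other byte is deleted.
function keepPrintableAscii(string $xml): string
{
    return preg_replace('/[^\x09\x0A\x0D\x20-\x7E]/', '', $xml);
}

// Hypothetical input: "Café" in UTF-8 followed by a vertical tab (0x0B).
$dirty = "<title>Caf\xC3\xA9\x0B</title>";
echo keepPrintableAscii($dirty), PHP_EOL; // <title>Caf</title>

// The cleaned string can then be parsed without a temporary file, e.g.:
// $simpleXMLobject = simplexml_load_string(keepPrintableAscii($xml));
```

As the output shows, this approach is blunt: it removes the invalid vertical tab, but it also drops the legitimate "é", so it only suits feeds that are meant to be plain ASCII.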


I don't see any error in this XML. Either the XML you actually parse is different from the one you posted, or the XML processor you use (which one is it, by the way?) is buggy.

I would recommend avoiding CDATA sections. Use the following XML document, which is text-equivalent to the one you provided and much more readable:

<quizzes>
   <quizz>
      <title>Title</title>
      <descr>Some text here!</descr>
      <tags>one tag, second tag</tags>
      <image>http://www.site.org/displayimage.php?album=search&amp;cat=0&amp;pos=1</image>
      <results>
         <result>
            <title>Something</title>
            <descr>Some text here</descr>
            <image>http://www.site.org/displayimage.php?album=search&amp;cat=0&amp;pos=17</image>
            <id>1</id>
         </result>
      </results>
   </quizz>
</quizzes>
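To check the "text-equivalent" claim, here is a minimal sketch with DOMDocument (the parser used in the question). It builds the <image> element both ways from the question's URL, escaping with htmlspecialchars() and ENT_XML1; the helper imageText() is hypothetical.

```php
<?php
// URL taken from the question's XML.
$url = 'http://www.site.org/displayimage.php?album=search&cat=0&pos=1';

$withCdata = '<image><![CDATA[' . $url . ']]></image>';
// ENT_XML1 escapes '&', '<', '>' for XML; ENT_QUOTES also escapes both quote kinds.
$escaped = '<image>' . htmlspecialchars($url, ENT_XML1 | ENT_QUOTES, 'UTF-8') . '</image>';

function imageText(string $xml): string
{
    $doc = new DOMDocument();
    if (!$doc->loadXML($xml)) {
        throw new RuntimeException('XML did not parse');
    }
    // textContent is the decoded character data, however it was written in the source.
    return $doc->documentElement->textContent;
}

echo $escaped, PHP_EOL; // <image>http://www.site.org/displayimage.php?album=search&amp;cat=0&amp;pos=1</image>
var_dump(imageText($withCdata) === imageText($escaped)); // bool(true)
```

Both forms yield the same string to the application; the escaped form simply avoids CDATA delimiters altogether.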


I've found that the problem was in passing this XML with cURL in PHP. I sent it as plain text, so each & character in the XML was interpreted as a delimiter before the next parameter. Once I escaped that character, it started working properly.
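The truncation described here can be reproduced without a network call. In this sketch, parse_str() stands in for the receiving script's form decoding, and $xmlstr is a made-up fragment based on the question's XML.

```php
<?php
// Hypothetical fragment containing an escaped ampersand.
$xmlstr = '<image>http://www.site.org/displayimage.php?album=search&amp;cat=0</image>';

// Unencoded: the body is split on '&' like a query string, so the field is cut short.
parse_str('xmlstr=' . $xmlstr, $raw);
echo $raw['xmlstr'], PHP_EOL; // <image>http://www.site.org/displayimage.php?album=search

// URL-encoded: the whole document survives the round trip.
parse_str('xmlstr=' . urlencode($xmlstr), $encoded);
var_dump($encoded['xmlstr'] === $xmlstr); // bool(true)
```

The truncated value is exactly the kind of input that makes the parser stop early, which is consistent with the "Premature end of data" warnings in the question.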
