Problem - XML declaration allowed only at the start of the document
xml:19558: parser error : XML declaration allowed only at the start of the document
Any solutions? I am using PHP's XMLReader to parse a large XML file, but I keep getting this error. I know the file is not well formed, but I don't think it's feasible to go through the file by hand and remove these extra declarations. Any ideas? Please help.
Make sure there isn't any white space before the first tag. Try this:
<?php
// Declarations
$file = "data.txt"; // The file to read from.

// Read the whole file into memory
$fp = fopen($file, "r") or die("Couldn't open file"); // Open the file
$data = ""; // Holds the file's content
while (!feof($fp)) { // Loop through the file until the end
    $data .= fgets($fp, 1024); // Append the next line (at most 1023 bytes per call)
}
fclose($fp); // Close the file

// Split after every closing </xml> tag so each document becomes its own string
$split = preg_split('/(?<=<\/xml>)(?!$)/', $data);
foreach ($split as $sxml) { // Loop through each XML string
    $sxml = trim($sxml); // No whitespace is allowed before the XML declaration
    if ($sxml === '') {
        continue; // Skip empty chunks
    }
    $reader = new XMLReader(); // Initialize the reader
    $reader->xml($sxml) or die("Could not parse XML"); // Open the current XML string
    while ($reader->read()) { // Walk through the nodes
        if ($reader->nodeType == XMLReader::ELEMENT && $reader->name == 'record') {
            echo $reader->readInnerXml(); // Print the contents of the <record> tag
        }
    }
    $reader->close(); // Close the reader
}
?>
Set the $file variable to the file you want. Note that this reads the entire file into memory and assumes each document is wrapped in <xml>...</xml>, so I don't know how well it will work for a 4 GB file. Tell me if it doesn't.
EDIT: Here is another solution. It should work better with the larger file because it parses while it is reading the file.
<?php
set_time_limit(0);
// Declarations
$file = "data.txt"; // The file to read from.

// Read the file one character at a time
$fp = fopen($file, "r") or die("Couldn't Open"); // Open the file
$FoundXmlTagStep = 0;     // Characters of "<xml" matched so far (4 = inside a document)
$FoundEndXMLTagStep = 0;  // Characters of "</xml>" matched so far
$curXML = "";             // The document currently being collected
$firstXMLTagRead = false; // True right after "<xml" has been matched
while (!feof($fp)) { // Loop through the file until the end
    $data = fgets($fp, 2); // A length of 2 reads a single byte
    if ($FoundXmlTagStep == 0 && $data == "<")
        $FoundXmlTagStep = 1;
    elseif ($FoundXmlTagStep == 1 && $data == "x")
        $FoundXmlTagStep = 2;
    elseif ($FoundXmlTagStep == 2 && $data == "m")
        $FoundXmlTagStep = 3;
    elseif ($FoundXmlTagStep == 3 && $data == "l") {
        $FoundXmlTagStep = 4;
        $firstXMLTagRead = true;
    } elseif ($FoundXmlTagStep != 4)
        $FoundXmlTagStep = ($data == "<") ? 1 : 0; // Restart the match
    if ($FoundXmlTagStep == 4) {
        if ($firstXMLTagRead) {
            $firstXMLTagRead = false;
            $curXML = "<xm"; // The "l" is appended just below
        }
        $curXML .= $data;
        // Start trying to match the end of the document
        if ($FoundEndXMLTagStep == 0 && $data == "<")
            $FoundEndXMLTagStep = 1;
        elseif ($FoundEndXMLTagStep == 1 && $data == "/")
            $FoundEndXMLTagStep = 2;
        elseif ($FoundEndXMLTagStep == 2 && $data == "x")
            $FoundEndXMLTagStep = 3;
        elseif ($FoundEndXMLTagStep == 3 && $data == "m")
            $FoundEndXMLTagStep = 4;
        elseif ($FoundEndXMLTagStep == 4 && $data == "l")
            $FoundEndXMLTagStep = 5;
        elseif ($FoundEndXMLTagStep == 5 && $data == ">") {
            $FoundEndXMLTagStep = 0;
            $FoundXmlTagStep = 0;
            // Finished reading one document
            ParseXML($curXML);
        } else
            $FoundEndXMLTagStep = ($data == "<") ? 1 : 0; // Restart the match
    }
}
fclose($fp); // Close the file

// Parse one complete <xml>...</xml> string and print every <record>
function ParseXML($xml)
{
    $reader = new XMLReader(); // Initialize the reader
    $reader->xml($xml) or die("Could not parse XML"); // Open the current XML string
    while ($reader->read()) { // Walk through the nodes
        if ($reader->nodeType == XMLReader::ELEMENT && $reader->name == 'record') {
            echo $reader->readInnerXml(); // Print the contents of the <record> tag
        }
    }
    $reader->close(); // Close the reader
}
?>
Another possible cause of this problem is a Unicode byte order mark (BOM) at the head of the file. A file saved as UTF-8 may start with the 3 bytes EF BB BF (the BOM is optional, so not every UTF-8 file has it). These bytes may be interpreted incorrectly if you convert the byte array to a string first. The solution is to write the byte array to the file directly, without calling getString on the byte array.
ASCII: no file head
UTF-16 (little-endian): FF FE
UTF-16 (big-endian): FE FF
UTF-8: EF BB BF
UTF-32 (little-endian): FF FE 00 00
Just open the file in UltraEdit (or any hex editor) and you can see these bytes.
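If the BOM turns out to be the culprit and you are already holding the XML as a string in PHP, a minimal sketch of stripping it before parsing could look like this. The helper name stripLeadingBom is made up for illustration, and it only handles the UTF-8 BOM. Whether the BOM actually triggers the error depends on how the bytes reach the parser, so treat this as something to try rather than a guaranteed fix.

```php
<?php
// Hypothetical helper (not part of XMLReader): removes a leading UTF-8 BOM
// and any whitespace that sits in front of the XML declaration.
function stripLeadingBom(string $xml): string
{
    if (strncmp($xml, "\xEF\xBB\xBF", 3) === 0) {
        $xml = substr($xml, 3); // Drop the three BOM bytes
    }
    return ltrim($xml); // The declaration must be the very first thing
}

// Sample input: BOM, then a stray newline, then the document
$raw = "\xEF\xBB\xBF\n<?xml version=\"1.0\" encoding=\"UTF-8\"?>"
     . "<xml><record>a</record></xml>";
$clean = stripLeadingBom($raw);

$reader = new XMLReader();
$reader->xml($clean);
$records = [];
while ($reader->read()) {
    if ($reader->nodeType == XMLReader::ELEMENT && $reader->name == 'record') {
        $records[] = $reader->readInnerXml(); // Collect the contents of each <record>
    }
}
$reader->close();
print_r($records);
```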
If you have multiple XML declarations, you likely have a concatenation of many XML files, and also more than one root element. It's not clear how you would meaningfully parse them.
Try really hard to get the source of the XML to give you real XML first. If that doesn't work, see if you can do some preprocessing to fix the XML before you parse it.
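As a rough sketch of that kind of preprocessing: copy the file line by line, drop every XML declaration, and wrap everything in one synthetic root element, so that the result is a single well-formed document. The function name mergeConcatenatedXml and the root name "merged" are invented for this example. It assumes that all the concatenated pieces use the same encoding (UTF-8 here), that a declaration never spans more than one line, and that the file has line breaks (a single multi-gigabyte line would still be read into memory at once).

```php
<?php
// Hypothetical helper: copies $in to $out, removing every XML declaration
// and wrapping the content in one root element.
// Assumption: each declaration sits entirely on one line.
function mergeConcatenatedXml($in, $out, string $root = 'merged'): void
{
    fwrite($out, "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n<$root>\n");
    while (($line = fgets($in)) !== false) {
        // Strip any XML declaration found on this line
        fwrite($out, preg_replace('/<\?xml\s[^>]*\?>/', '', $line));
    }
    fwrite($out, "\n</$root>\n");
}

// Demo with in-memory streams; for a real file use fopen() on two paths
$in = fopen('php://memory', 'w+');
fwrite($in, "<?xml version=\"1.0\"?>\n<xml><record>a</record></xml>\n"
          . "<?xml version=\"1.0\"?>\n<xml><record>b</record></xml>\n");
rewind($in);
$out = fopen('php://memory', 'w+');
mergeConcatenatedXml($in, $out);
rewind($out);
$merged = stream_get_contents($out);
fclose($in);
fclose($out);

// The merged text now parses as one document
$reader = new XMLReader();
$reader->xml($merged);
$records = [];
while ($reader->read()) {
    if ($reader->nodeType == XMLReader::ELEMENT && $reader->name == 'record') {
        $records[] = $reader->readInnerXml();
    }
}
$reader->close();
print_r($records);
```

For a large file you would write the output to a temporary file and point the reader at it with $reader->open($path), so nothing has to be held in memory as a whole.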
This can be caused by PhpStorm. If you are using PhpStorm, it can make your code start on the second line no matter what you do. In that case, go to your host, edit the file directly with the DirectAdmin or cPanel editor, and put your
<?xml version="1.0" encoding="UTF-8" ?>
declaration on the first line. Hope it helps.