
How to read an XML file with an undefined namespace with XMLReader?

I'm relatively new to parsing XML files and am attempting to read a large XML file with XMLReader.

<?xml version="1.0" encoding="UTF-8"?>
<ShowVehicleRemarketing environment="Production" lang="en-CA" release="8.1-Lite" xsi:schemaLocation="http://www.starstandards.org/STAR /STAR/Rev4.2.4/BODs/Standalone/ShowVehicleRemarketing.xsd">
  <ApplicationArea>
    <Sender>
      <Component>Component</Component>
      <Task>Task</Task>
      <ReferenceId>w5/cron</ReferenceId>
      <CreatorNameCode>CreatorNameCode</CreatorNameCode>
      <SenderNameCode>SenderNameCode</SenderNameCode>
      <SenderURI>http://www.example.com</SenderURI>
      <Language>en-CA</Language>
      <ServiceId>ServiceId</ServiceId>
    </Sender>
    <CreationDateTime>CreationDateTime</CreationDateTime>
    <Destination>
      <DestinationNameCode>example</DestinationNameCode>
    </Destination>
  </ApplicationArea>
...

I am receiving the following error:

ErrorException [ Warning ]: XMLReader::read() [xmlreader.read]: compress.zlib://D:/WebDev/example/local/public/../upload/example.xml.gz:2: namespace error : Namespace prefix xsi for schemaLocation on ShowVehicleRemarketing is not defined

I've searched around and can't find much useful information on using XMLReader to read XML files that use namespaces. How would I go about defining a namespace, if that is in fact what I need to do? Any help or links to pertinent resources would be appreciated.


The document needs to declare the xsi namespace, e.g.:

<ShowVehicleRemarketing
  environment="Production"
  lang="en-CA"
  release="8.1-Lite"
  xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
  xsi:schemaLocation="http://www.starstandards.org/STAR /STAR/Rev4.2.4/BODs/Standalone/ShowVehicleRemarketing.xsd"
>
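As a quick check, here is a minimal sketch assuming PHP's bundled XMLReader extension. The document is a cut-down stand-in for the question's file (the empty ApplicationArea element is a placeholder). Once the declaration is present, the prefixed attribute can be looked up by local name and namespace URI with getAttributeNs():

```php
<?php
// Cut-down stand-in for the question's document, with xmlns:xsi declared.
// <ApplicationArea/> is a placeholder for the real content.
$xml = '<?xml version="1.0" encoding="UTF-8"?>'
    . '<ShowVehicleRemarketing environment="Production" lang="en-CA" release="8.1-Lite"'
    . ' xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"'
    . ' xsi:schemaLocation="http://www.starstandards.org/STAR /STAR/Rev4.2.4/BODs/Standalone/ShowVehicleRemarketing.xsd">'
    . '<ApplicationArea/>'
    . '</ShowVehicleRemarketing>';

$reader = new XMLReader;
$reader->xml($xml);

$schemaLocation = null;
while ($reader->read()) {
    if ($reader->nodeType === XMLReader::ELEMENT
        && $reader->localName === 'ShowVehicleRemarketing') {
        // Look the attribute up by local name + namespace URI, not by prefix.
        $schemaLocation = $reader->getAttributeNs(
            'schemaLocation',
            'http://www.w3.org/2001/XMLSchema-instance'
        );
    }
}
$reader->close();

echo $schemaLocation, PHP_EOL;
```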

Update: You could write a user-defined stream filter and let XMLReader read the file through that filter, something like:

stream_filter_register('darn', 'DarnFilter');
$src = 'php://filter/read=darn/resource=compress.zlib://something.xml.gz';
$reader = new XMLReader;
$reader->open($src);

The contents read by the compress.zlib wrapper are then "routed" through DarnFilter, which has to find the (first) location where it can insert the xmlns:xsi declaration. But this is quite messy and will take some effort to do right, because the data arrives in arbitrary chunks (e.g. theoretically bucket A could contain "xs", bucket B "i:schem" and bucket C "aLocation=").
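To illustrate why buffering is needed, here is a small sketch that is independent of the stream-filter API: it simulates the buckets as an array of strings (the split points mirror the "xs" / "i:schem" / "aLocation=" example and are otherwise made up) and only rewrites once the whole root start tag has arrived. The helper name and the regular expression are illustrative assumptions; the regex does not handle comments or a DOCTYPE before the root element.

```php
<?php
const XSI_NS = 'http://www.w3.org/2001/XMLSchema-instance';

// Hypothetical helper: concatenates "buckets" in order, holding data back
// until the complete root start tag is buffered, then adds xmlns:xsi if missing.
function insertXsiAcrossChunks(array $chunks): string
{
    $buffer = '';
    $out = '';
    $done = false;
    foreach ($chunks as $chunk) {
        if ($done) {
            $out .= $chunk; // root tag already handled: pass through unchanged
            continue;
        }
        $buffer .= $chunk;
        // First start tag whose name doesn't begin with ? or ! (skips the prolog).
        if (preg_match('~<([^?!>\s/]+)([^>]*)>~', $buffer, $m, PREG_OFFSET_CAPTURE)) {
            if (strpos($m[2][0], 'xmlns:xsi') === false) {
                $tag = '<' . $m[1][0] . ' xmlns:xsi="' . XSI_NS . '"' . $m[2][0] . '>';
                $buffer = substr($buffer, 0, $m[0][1]) . $tag
                    . substr($buffer, $m[0][1] + strlen($m[0][0]));
            }
            $out .= $buffer;
            $buffer = '';
            $done = true;
        }
    }
    return $out . $buffer; // flush leftovers if no root tag was ever found
}

$chunks = ['<?xml version="1.0"?><Root xs', 'i:schem', 'aLocation="a b"><c/></Root>'];
echo insertXsiAcrossChunks($chunks), PHP_EOL;
```

Checking each chunk on its own would never see the complete "xsi:schemaLocation" text; holding the data back until the closing ">" of the root tag avoids that.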


Update 2: here's an ad-hoc example of a PHP stream filter that inserts the xsi namespace declaration. It's mostly untested (it worked with the one test I ran ;-) ) and undocumented, so take it as a proof of concept, not production code.

<?php
stream_filter_register('darn', 'DarnFilter');
$src = 'php://filter/read=darn/resource=compress.zlib://d:/test.xml.gz';

$r = new XMLReader;
$r->open($src);
while($r->read()) {
  echo '.';
}
$r->close();

class DarnFilter extends php_user_filter {
  protected $buffer = '';
  protected $status = PSFS_FEED_ME;

  // ": int" matches the PHP 8.1+ parent signature and avoids a deprecation notice
  public function filter($in, $out, &$consumed, $closing): int
  {
    while ( $bucket = stream_bucket_make_writeable($in) ) {
      $consumed += $bucket->datalen;
      if ( PSFS_PASS_ON == $this->status ) {
        // we're already done, just copy the content
        stream_bucket_append($out, $bucket);
      }
      else {
        // keep collecting data until the root start tag is complete
        $this->buffer .= $bucket->data;
        if ( $this->insertXsiDeclaration() ) {
          // first element found: send the (possibly modified) buffer
          $bucket->data = $this->buffer;
          $bucket->datalen = strlen($bucket->data);
          stream_bucket_append($out, $bucket);
          $this->buffer = '';
          // no need for further processing
          $this->status = PSFS_PASS_ON;
        }
      }
    }
    if ( $closing && PSFS_FEED_ME == $this->status && '' !== $this->buffer ) {
      // input ended before a root element was found: don't swallow the data
      stream_bucket_append($out, stream_bucket_new($this->stream, $this->buffer));
      $this->buffer = '';
      $this->status = PSFS_PASS_ON;
    }
    return $this->status;
  }

  /* Looks for the first (root) element in $this->buffer and, if it
   * doesn't already declare the xsi namespace, inserts the declaration.
   * Returns true once the complete root start tag has been seen.
   */
  protected function insertXsiDeclaration() {
    $rc = false;
    if ( preg_match('!<([^?>\s]+)\s?([^>]*)>!', $this->buffer, $m, PREG_OFFSET_CAPTURE) ) {
      $rc = true;
      if ( false === strpos($m[2][0], 'xmlns:xsi') ) {
        $in = '<' . $m[1][0]
          . ' xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" '
          . $m[2][0] . '>';
        $this->buffer = substr($this->buffer, 0, $m[0][1])
          . $in
          . substr($this->buffer, $m[0][1] + strlen($m[0][0]));
      }
    }
    return $rc;
  }
}

Update 3: And here's an ad-hoc solution written in C#:

using System;
using System.IO;
using System.IO.Compression;
using System.Xml;

XmlNamespaceManager nsmgr = new XmlNamespaceManager(new NameTable());
// prime the XmlReader with the xsi namespace so the undeclared prefix resolves
nsmgr.AddNamespace("xsi", "http://www.w3.org/2001/XMLSchema-instance");

using ( XmlReader reader = XmlReader.Create(
  new GZipStream(new FileStream(@"\test.xml.gz", FileMode.Open, FileAccess.Read), CompressionMode.Decompress),
  new XmlReaderSettings { CloseInput = true }, // also closes the GZipStream and FileStream
  new XmlParserContext(null, nsmgr, null, XmlSpace.None)
)) {
  while (reader.Read())
  {
    Console.Write('.');
  }
}


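You can load the XML with file_get_contents() and patch it with str_replace() before passing it to XMLReader. Note that this reads the whole document into memory, which matters for a large file.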

Either insert the required namespace declaration for the xsi prefix:

$reader = new XMLReader;
$reader->xml(str_replace(
    '<ShowVehicleRemarketing',
    '<ShowVehicleRemarketing xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"',
    file_get_contents('http://example.com/data.xml')));
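str_replace() rewrites every occurrence of the search string and doesn't check whether the declaration is already there. A slightly more defensive sketch (the helper name is made up, and it assumes you know the root element's name) uses preg_replace() with a limit of 1 and leaves the start tag alone if it already declares xmlns:xsi:

```php
<?php
// Hypothetical helper: add xmlns:xsi to the first <$root ...> start tag,
// unless that start tag already declares it. $root is the root element name.
function declareXsi(string $xml, string $root): string
{
    // \b stops "<Root" from matching "<RootFoo"; the negative lookahead skips
    // a start tag that already contains an xmlns:xsi declaration.
    $pattern = '~<' . preg_quote($root, '~') . '\b(?![^>]*\bxmlns:xsi\s*=)~';
    return preg_replace(
        $pattern,
        '<' . $root . ' xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"',
        $xml,
        1 // replace the first match only
    );
}

$xml = '<ShowVehicleRemarketing lang="en-CA" xsi:schemaLocation="a b">'
    . '<ApplicationArea/></ShowVehicleRemarketing>';
echo declareXsi($xml, 'ShowVehicleRemarketing'), PHP_EOL;
```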

Another option would be to remove the schemaLocation attribute:

$reader->xml(str_replace(
    'xsi:schemaLocation="http://www.starstandards.org/STAR /STAR/Rev4.2.4/BODs/Standalone/ShowVehicleRemarketing.xsd"',
    '',
    file_get_contents('http://example.com/data.xml')));
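That replacement only matches this exact attribute value. If the schema location can vary, a regular expression can drop the attribute whatever its value. This is a sketch under the assumption that the value is quoted with " or ' and contains no quote of the same kind (which holds for URIs); the helper name is hypothetical:

```php
<?php
// Remove xsi:schemaLocation="..." (or '...') whatever its value, together
// with the whitespace in front of it. Assumes a quoted value without a nested
// quote of the same kind.
function stripSchemaLocation(string $xml): string
{
    return preg_replace('~\s+xsi:schemaLocation\s*=\s*("[^"]*"|\'[^\']*\')~', '', $xml);
}

$xml = '<ShowVehicleRemarketing lang="en-CA" xsi:schemaLocation="http://www.starstandards.org/STAR /STAR/Rev4.2.4/BODs/Standalone/ShowVehicleRemarketing.xsd">'
    . '<ApplicationArea/></ShowVehicleRemarketing>';
echo stripSchemaLocation($xml), PHP_EOL;
```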

However, if there are more undeclared prefixes in the document, you will have to handle all of them.
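To find out which prefixes need handling, here is a rough, regex-based sketch. The helper name is hypothetical, and it is a heuristic rather than a parser: prefixed-looking text inside comments, CDATA sections or attribute values can fool it. It lists prefixes used on element or attribute names that have no matching xmlns: declaration anywhere in the document:

```php
<?php
// Heuristic: prefixes used on element/attribute names minus those declared
// with xmlns:prefix="..." anywhere in the document. Not a real XML parser.
function findUndeclaredPrefixes(string $xml): array
{
    $name = '[A-Za-z_][\w.-]*';
    preg_match_all("~</?($name):~", $xml, $elements);          // <p:elem, </p:elem
    preg_match_all("~\s($name):$name\s*=~", $xml, $attributes); // p:attr=
    preg_match_all("~\sxmlns:($name)\s*=~", $xml, $declared);   // xmlns:p=

    $used = array_unique(array_merge($elements[1], $attributes[1]));
    // "xml" is predeclared by the spec; "xmlns" is the declaration syntax itself.
    $undeclared = array_diff($used, $declared[1], ['xml', 'xmlns']);
    sort($undeclared);
    return $undeclared;
}

// "star:Sender" is a made-up prefixed element, for illustration only.
$doc = '<ShowVehicleRemarketing lang="en-CA" xsi:schemaLocation="a b"><star:Sender/></ShowVehicleRemarketing>';
print_r(findUndeclaredPrefixes($doc));
```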


Either fix whatever is writing out the malformed XML, or write a separate tool to perform the fix afterwards. (That tool doesn't necessarily have to read everything into memory at once: stream the data in and out, perhaps reading and writing a line at a time.)

That way your reading code doesn't need to worry about trying to do something useful with the data and fixing it up at the same time.
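A minimal sketch of such a separate fix-up pass, assuming the zlib extension (which the question already relies on via compress.zlib://) and that the root start tag sits on a single line, as in the sample. The function name is hypothetical. It streams one line at a time with gzgets()/gzwrite() and adds the declaration only to the first line that opens the root element:

```php
<?php
// Hypothetical one-off fixer: copies a .gz XML file line by line, adding
// xmlns:xsi to the root start tag. Only one line is held in memory at a time.
// Assumes the root start tag is not split across lines.
function fixXsiDeclaration(string $srcGz, string $dstGz, string $root): void
{
    $in = gzopen($srcGz, 'rb');
    $out = gzopen($dstGz, 'wb');
    if ($in === false || $out === false) {
        throw new RuntimeException('Cannot open input or output file');
    }
    $patched = false;
    while (($line = gzgets($in)) !== false) {
        if (!$patched && strpos($line, '<' . $root) !== false) {
            if (strpos($line, 'xmlns:xsi') === false) {
                $line = str_replace(
                    '<' . $root,
                    '<' . $root . ' xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"',
                    $line
                );
            }
            $patched = true; // only the first root start tag is touched
        }
        gzwrite($out, $line);
    }
    gzclose($in);
    gzclose($out);
}
```

A call along the lines of fixXsiDeclaration('example.xml.gz', 'example-fixed.xml.gz', 'ShowVehicleRemarketing') (hypothetical file names) would produce a file that XMLReader can then open through compress.zlib:// without any filtering.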


The xsi prefix is conventionally bound to the XML Schema Instance namespace:

xmlns:xsi='http://www.w3.org/2001/XMLSchema-instance'

If the prefix isn't declared, your XML file is not XML+NS (Namespaces in XML) compliant, so a namespace-aware parser such as XMLReader reports an error. You should fix that in the source document.

A note on xsi: it matters more than many other namespaces, because its schemaLocation attribute directs a validating parser to the correct schema locations for your XML.
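Per the XML Schema specification, the value of xsi:schemaLocation is a whitespace-separated list of pairs: a namespace URI followed by the location of the schema document for that namespace. Read that way, the space in the question's value separates a namespace URI from a schema path. A small sketch (hypothetical helper name) that splits such a value into pairs:

```php
<?php
// Split an xsi:schemaLocation value into [namespace URI => schema location].
// The value is a whitespace-separated list that must have an even number of items.
function parseSchemaLocation(string $value): array
{
    $tokens = preg_split('~\s+~', trim($value), -1, PREG_SPLIT_NO_EMPTY);
    if (count($tokens) % 2 !== 0) {
        throw new InvalidArgumentException('schemaLocation needs namespace/location pairs');
    }
    $pairs = [];
    foreach (array_chunk($tokens, 2) as [$namespace, $location]) {
        $pairs[$namespace] = $location;
    }
    return $pairs;
}

print_r(parseSchemaLocation(
    'http://www.starstandards.org/STAR /STAR/Rev4.2.4/BODs/Standalone/ShowVehicleRemarketing.xsd'
));
```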
