Extracting data from an XML file with SimpleXML in PHP

Introduction:

I want to loop through XML files that have a flexible category structure.

Problem:

I don't know how to loop through a theoretically infinite number of subcategories without writing x nested "foreach" statements (see the coding example at the bottom). How do I traverse the category structure dynamically?

<?xml version="1.0" encoding="utf-8"?>
<catalog>
    <category name="Category - level 1">
        <category name="Category - level 2" />
        <category name="Category - level 2">
            <category name="Category - level 3" />
        </category>
        <category name="Category - level 2">
            <category name="Category - level 3">
                <category name="Category - level 4" />
            </category>
        </category>
    </category>
</catalog>

What I have now:

I have no problem looping through XML files with a set structure:

<catalog>
    <category name="Category - level 1">
        <category name="Category - level 2">
            <category name="Category - level 3" />
        </category>
        <category name="Category - level 2">
            <category name="Category - level 3" />
        </category>
    </category>
</catalog>

Coding example:

//$xml holds the XML file
foreach ( $xml AS $category_level1 )
{
    echo $category_level1['name'];

    foreach ( $category_level1->category AS $category_level2 )
    {
        echo $category_level2['name'];

        foreach ( $category_level2->category AS $category_level3 )
        {
           echo $category_level3['name'];
        }
    }
}


Getting the name attributes from your categories is likely fastest when done via XPath, e.g.

$categoryNames = $doc->xpath('//category/@name');
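
As a minimal sketch of the XPath call above: it assumes $doc is a SimpleXMLElement, here built from a shortened, made-up catalog. xpath() returns an array of SimpleXMLElement objects (one per matched attribute), so each one is cast to a string to get the plain value.

```php
<?php
// Assumption: $doc is a SimpleXMLElement; the sample catalog is illustrative only.
$doc = simplexml_load_string(
    '<catalog>
        <category name="A">
            <category name="B" />
        </category>
    </catalog>'
);

// Each match is a SimpleXMLElement wrapping an attribute node;
// casting to string yields the attribute value.
$categoryNames = array_map(
    function ($attribute) {
        return (string) $attribute;
    },
    $doc->xpath('//category/@name')
);

print_r($categoryNames);
```

The names come back in document order, regardless of nesting depth.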

However, if you want to recursively iterate over an arbitrary nested XML structure, you can also use the SimpleXMLIterator, e.g. with $xml being the string you gave:

$sxi = new RecursiveIteratorIterator(
           new SimpleXMLIterator($xml), 
           RecursiveIteratorIterator::SELF_FIRST);

foreach($sxi as $node) {
    echo str_repeat("\t", $sxi->getDepth()), // indenting
         $node['name'],                      // getting attribute name
         PHP_EOL;                            // line break
}

will give

Category - level 1
    Category - level 2
    Category - level 2
        Category - level 3
    Category - level 2
        Category - level 3
            Category - level 4

As mentioned at the beginning, if you only want all the name attributes, use XPath, because iterating over every node is slow. Use the iterator approach only when you need to do something more complex with the nodes, for instance adding something to them.
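
As a rough sketch of "adding something" to the nodes while iterating: the example below stores each node's depth in an attribute. The attribute name "depth" and the shortened catalog are assumptions made for illustration, not part of the original question.

```php
<?php
// Assumption: a shortened sample catalog, for illustration only.
$xml = '<catalog><category name="A"><category name="B" /></category></catalog>';

$root = new SimpleXMLIterator($xml);
$sxi  = new RecursiveIteratorIterator($root, RecursiveIteratorIterator::SELF_FIRST);

foreach ($sxi as $node) {
    // Hypothetical "depth" attribute recording how deep the node is nested.
    $node->addAttribute('depth', (string) $sxi->getDepth());
}

// The changes are visible through $root, since the nodes belong to the same document.
echo $root->category['depth'], PHP_EOL;
echo $root->category->category['depth'], PHP_EOL;
```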


<?php
$xml= new SimpleXMLElement('.....');
foreach ($xml->xpath('//category') as $cat)
{
    echo $cat['name'];
}


A possible solution is to write a recursive function that would:

  • For each category at the current depth:
    • write the name of the current category
    • if it has any child categories, call itself on those.

An advantage of this solution is that you can keep track of your current depth in the XML document, which can be useful if you need to represent your data as a tree, for instance.


For example, if you have your XML loaded like this:

$string = <<<XML
<catalog>
    <category name="Category - level 1">
        <category name="Category - level 2">
            <category name="Category - level 3" />
        </category>
        <category name="Category - level 2">
            <category name="Category - level 3" />
        </category>
    </category>
</catalog>
XML;

$xml = simplexml_load_string($string);


You could call the recursive function like this:

recurse_category($xml);


And that function could be written this way:

function recurse_category($categories, $depth = 0) {
    foreach ($categories as $category) {
        echo str_repeat('&nbsp; ', 2*$depth);
        echo (string)$category['name'];
        echo '<br />';

        if ($category->category) {
            recurse_category($category->category, $depth + 1);
        }
    }
}


Finally, running this code would give you this kind of output:

Category - level 1
    Category - level 2
        Category - level 3
    Category - level 2
        Category - level 3
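
The same recursion can also return the data as a tree rather than printing it. The following is a minimal sketch, assuming a hypothetical helper named category_tree() and made-up array keys 'name' and 'children'; neither comes from the original answer.

```php
<?php
// Hypothetical helper: returns the nested categories as a nested array.
function category_tree(SimpleXMLElement $parent): array
{
    $tree = [];
    foreach ($parent->category as $category) {
        $tree[] = [
            'name'     => (string) $category['name'],
            // Recurse into the child categories; yields [] for a leaf.
            'children' => category_tree($category),
        ];
    }
    return $tree;
}

// Assumption: a shortened sample catalog, for illustration only.
$xml  = simplexml_load_string(
    '<catalog><category name="A"><category name="B" /></category></catalog>'
);
$tree = category_tree($xml);

print_r($tree);
```

Because the function returns a value instead of echoing, the result can be passed to json_encode() or rendered by separate code.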


Using SimpleXML and XPath is fine,
...but just as a side note: if all you want is the name attribute of each and every <category> element in the document, DOMDocument::getElementsByTagName() would suffice.
You can switch between DOM and SimpleXML via dom_import_simplexml() and simplexml_import_dom(). Both use the same internal representation of the data, so there's no costly conversion involved.

$xml = '<?xml version="1.0" encoding="utf-8"?>
<catalog>
    <category name="Category - level 1">
        <category name="Category - level 2" />
        <category name="Category - level 2">
            <category name="Category - level 3" />
        </category>
        <category name="Category - level 2">
            <category name="Category - level 3">
                <category name="Category - level 4" />
            </category>
        </category>
    </category>
</catalog>';

$doc = new DOMDocument;
$doc->loadXML($xml);

foreach( $doc->getElementsByTagName('category') as $c) {
  echo $c->getAttribute('name'), "\n";
}

prints

Category - level 1
Category - level 2
Category - level 2
Category - level 3
Category - level 2
Category - level 3
Category - level 4
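
Switching between the two APIs, as mentioned above, could look roughly like this. It is a sketch using a shortened sample catalog; the "checked" attribute is made up purely to show that both objects refer to the same underlying node.

```php
<?php
// Assumption: a shortened sample catalog, for illustration only.
$sxe = simplexml_load_string(
    '<catalog><category name="A"><category name="B" /></category></catalog>'
);

// SimpleXML -> DOM: returns the DOMElement for the same node.
$domElement = dom_import_simplexml($sxe);

$names = [];
foreach ($domElement->getElementsByTagName('category') as $c) {
    $names[] = $c->getAttribute('name');
}
print_r($names);

// A change made through DOM is visible through SimpleXML as well
// ("checked" is a hypothetical attribute).
$domElement->setAttribute('checked', 'yes');
echo $sxe['checked'], PHP_EOL;

// DOM -> SimpleXML: importing the document gives back its root element.
$back = simplexml_import_dom($domElement->ownerDocument);
echo $back->category['name'], PHP_EOL;
```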