Loop over DOMDocument
I am following the suggestion from the question "Robust, Mature HTML Parser for PHP" about using DOMDocument to parse HTML that may be malformed.
Is there an easy way to loop over the parsed document? I would like to loop over HTML like this:
$html='<ul>
<li>value1</li>
<li>value1</li>
<li>value3
<p>subvalue</p>
</li>
</ul>
<p>hello world</p>';
$doc = new DOMDocument();
$doc->loadHTML($html);
???
foreach (??? as $node)
{
    print $node->nodeName.':'.$node->nodeValue;
}
And get results somewhat like this:
ul:
li:value1
li:value1
li:value3
p:subvalue
p:hello world
Using $doc->childNodes by itself doesn't really do what I want, since it doesn't seem to descend into the lower branches of the tree. I used the code suggested by halfdan and got results like this:
html:
html:value1
value1
value3
subvalue
hello world
Try this:
$doc = new DOMDocument();
$doc->loadHTML($html);
showDOMNode($doc);
function showDOMNode(DOMNode $domNode) {
    foreach ($domNode->childNodes as $node)
    {
        print $node->nodeName.':'.$node->nodeValue;
        if($node->hasChildNodes()) {
            showDOMNode($node);
        }
    }
}
I was having issues with elements that contained CDATA, where even elements that didn't have child elements were reporting that they had children.
I am not sure why that was.
The workaround I found was to change
    if($node->hasChildNodes()) {
        showDOMNode($node);
    }
to
    if($node->childNodes->length != 1) {
        showDOMNode($node);
    }
With that change the code works perfectly for me.
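A likely explanation for the behaviour described above: in PHP's DOM extension, text and CDATA sections are nodes too, so hasChildNodes() returns true for an element such as <li>value1</li> even though it contains no child elements. Checking childNodes->length != 1 also skips any element whose only child is another element. A minimal sketch of an alternative, assuming you only want element nodes and each element's own direct text; the helper name collectElements is hypothetical and not part of any answer here:

```php
<?php
// Hypothetical helper: collects "tag:text" lines for element nodes only.
function collectElements(DOMNode $node, array &$lines): void
{
    foreach ($node->childNodes as $child) {
        if ($child->nodeType !== XML_ELEMENT_NODE) {
            continue; // skip text, CDATA, comment and doctype nodes
        }
        // Concatenate only the element's own direct text/CDATA children,
        // so text from nested elements is not repeated on the parent.
        $text = '';
        foreach ($child->childNodes as $grandChild) {
            if ($grandChild->nodeType === XML_TEXT_NODE
                || $grandChild->nodeType === XML_CDATA_SECTION_NODE) {
                $text .= $grandChild->nodeValue;
            }
        }
        $lines[] = $child->nodeName . ':' . trim($text);
        collectElements($child, $lines); // recurse into child elements
    }
}

$html = '<ul>
<li>value1</li>
<li>value1</li>
<li>value3
<p>subvalue</p>
</li>
</ul>
<p>hello world</p>';

$doc = new DOMDocument();
$doc->loadHTML($html);

// loadHTML() wraps fragments in <html><body>; start below <body> if present.
$root = $doc->getElementsByTagName('body')->item(0) ?? $doc;

$lines = [];
collectElements($root, $lines);
echo implode("\n", $lines), "\n";
```

For the question's input this should list the ul, the three li elements and both p elements in document order, each followed by its own text only.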
You can use PHP Simple HTML DOM Parser with the following code:
<?php
require_once 'simplehtmldom/simple_html_dom.php';
function iterateHtmlElements($html)
{
    $dom = str_get_html($html);
    $dom->set_callback('handleElement');
    $dom->__toString();
    echo "\n";
}
function handleElement(simple_html_dom_node $elem)
{
    if($elem->tag == 'text') {
        echo $elem->innertext();
    }
    else {
        echo "\n" . $elem->tag . ": ";
    }
}
$html='<ul>
<li>value1</li>
<li>value1</li>
<li>value3
<p>subvalue</p>
</li>
</ul>
<p>hello world</p>';
iterateHtmlElements($html);
It works exactly as expected. I checked it with the input you provided and got the following results:
> php test2.php
ul:
li: value1
li: value1
li: value3
p: subvalue
p: hello world
One way is to walk the tree as follows:
function next_node($node)
{
    if($node->firstChild != null)
    {
        return $node->firstChild;
    }
    if($node->nextSibling != null)
    {
        return $node->nextSibling;
    }
    for($node = $node->parentNode; $node != null; $node = $node->parentNode)
    {
        if($node->nextSibling != null)
        {
            return $node->nextSibling;
        }
    }
    return null;
}
for($node = $doc; $node != null; $node = next_node($node))
{
    // handle node (read-only mode; if you need read-write access,
    // save all the nodes in an array first and then
    // iterate over that array)
    //
    ...
}
This works for most documents; however, it looks like at times the parentNode is somehow not set correctly, and the next_node() function ends up returning the wrong information.
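An alternative that avoids hand-written sibling and parent bookkeeping is to let the DOM extension do the traversal. The sketch below assumes the standard DOMXPath class is available and that visiting elements only (not text nodes) is enough; the variable names are illustrative:

```php
<?php
$html = '<ul>
<li>value1</li>
<li>value1</li>
<li>value3
<p>subvalue</p>
</li>
</ul>
<p>hello world</p>';

$doc = new DOMDocument();
$doc->loadHTML($html);

$xpath = new DOMXPath($doc);

$names = [];
// "//body//*" selects every element below <body>, in document order.
foreach ($xpath->query('//body//*') as $element) {
    $names[] = $element->nodeName;
}
echo implode(' ', $names), "\n";
```

Because the query result is a flat list, it can be copied into an array first (for example with iterator_to_array()) if the loop body needs to modify the tree while iterating.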