DOMNode replacement with PHP's DOM classes
I'm learning to work with the DOM* classes available in PHP, and have noticed (what I think is) an irregularity in my testing.
Given this document, ZuqML_test_100.html:
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xmlns:zuq="http://localhost/~/zuqml">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
<title>Untitled Document</title>
</head>
<body>
<h1>
<zuq:data name="siteHeader" />
</h1>
<h2>
<zuq:data name="pageHeaderName" />
<span> | </span>
<zuq:data name="pageHeaderTitle" />
</h2>
<zuq:region name="post">
<zuq:param name="onEmpty">
<div class="post noposts">
<p>There are no posts to show at this time.</p>
</div>
</zuq:param>
<div class="post">
<h3><zuq:data name="postHeader" /></h3>
<p>
<zuq:data name="postText">
<zuq:format type="trim">
<zuq:param name="length">300</zuq:param>
<zuq:param name="append">
<a>
<zuq:attr name="href">
./?action=viewpost&id=<zuq:data name="postId" />
</zuq:attr>
<zuq:data name="postAuthor" />
</a>
</zuq:param>
</zuq:format>
</zuq:data>
</p>
</div>
</zuq:region>
</body>
</html>
I'm trying to replace all <zuq:data /> nodes with a simple text node with the value foo. I'm doing so with the following snippet:
$root = new DOMDocument();
@$root->load('ZuqML_test_100.html');
foreach($root->getElementsByTagNameNS($root->lookupNamespaceURI('zuq'), 'data') as $node){
$node->parentNode->replaceChild($node->ownerDocument->createTextNode('foo'), $node);
}
echo $root->saveXML();
It sort of works, but my output still contains some <zuq:data /> nodes, as shown here:
<?xml version="1.0"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xmlns:zuq="http://ichorworkstudios.no-ip.org/~/zuqml">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
<title>Untitled Document</title>
</head>
<body>
<h1>
foo
</h1>
<h2>
<zuq:data name="pageHeaderName"></zuq:data>
<span>—</span>
foo
</h2>
<zuq:region name="post">
<zuq:param name="onEmpty">
<div class="post noposts">
<p>There are no posts to show at this time.</p>
</div>
</zuq:param>
<div class="post">
<h3><zuq:data name="postHeader"></zuq:data></h3>
<p>
foo
</p>
</div>
</zuq:region>
</body>
</html>
Why is it that some <zuq:data /> nodes are left behind?
I think it has to do with how you're iterating. getElementsByTagNameNS() returns a live list, so you're changing the result list while it's being iterated over, and the iteration winds up breaking (side effects). Try changing your loop to this:
$nodes = $root->getElementsByTagNameNS($root->lookupNamespaceURI('zuq'), 'data');
$i = $nodes->length - 1;
while ($i >= 0) {
$node = $nodes->item($i);
$node->parentNode->replaceChild(
$node->ownerDocument->createTextNode('foo'),
$node
);
$i--;
}
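Basically, it just iterates backwards over the list of nodes, so that when nodes are removed, they are removed from the end rather than the beginning. The nodes not yet processed sit at lower positions, and a removal further along the list does not shift them.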
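If you'd rather keep a forward loop, another common workaround (not one from the answers here) is to copy the live list into a plain PHP array first, so later replacements cannot shift the nodes still waiting to be processed. This is a minimal sketch, assuming a trimmed copy of the question's markup loaded from a string with loadXML() instead of the file; it keeps the same seven zuq:data elements in the same order and nesting, and uses the namespace URI from the question.

```php
<?php
// Sketch only: trimmed copy of the question's markup with the same seven
// zuq:data elements in the same document order (postId and postAuthor are
// nested inside postText). Namespace URI taken from the question.
$xml = <<<XML
<html xmlns="http://www.w3.org/1999/xhtml" xmlns:zuq="http://localhost/~/zuqml">
<body>
<h1><zuq:data name="siteHeader"/></h1>
<h2><zuq:data name="pageHeaderName"/><span> | </span><zuq:data name="pageHeaderTitle"/></h2>
<h3><zuq:data name="postHeader"/></h3>
<p><zuq:data name="postText"><a><zuq:data name="postId"/><zuq:data name="postAuthor"/></a></zuq:data></p>
</body>
</html>
XML;

$doc = new DOMDocument();
$doc->loadXML($xml);
$ns = 'http://localhost/~/zuqml';

// iterator_to_array() copies the live DOMNodeList into a plain array, so
// replacing nodes below cannot shift entries that have not been visited yet.
$snapshot = iterator_to_array($doc->getElementsByTagNameNS($ns, 'data'));

foreach ($snapshot as $node) {
    // postId and postAuthor are visited after postText has been replaced;
    // they now sit in a detached subtree but still have a parent there,
    // so replaceChild() works on them without error.
    $node->parentNode->replaceChild($doc->createTextNode('foo'), $node);
}

// Re-query the document: no zuq:data elements should be left.
$remaining = $doc->getElementsByTagNameNS($ns, 'data')->length;
echo "zuq:data elements left: $remaining\n";
```

The backwards loop above also copes with the nested postId and postAuthor elements: they come after postText in document order, so they are replaced before it.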
The explanation offered by ircmaxell, that "you are changing the result list as it's being iterated against", is correct, though I thought I'd add some more detail so you can understand why that happens.
Here is what your code does when run:
In the beginning there will be seven nodes in the NodeList.
The first one is
<zuq:data name="siteHeader"></zuq:data>
After that one is removed, the node count drops to six. The next node to be removed is
<zuq:data name="pageHeaderTitle"></zuq:data>
But if you look at your markup, you will see that the next zuq:data element would actually be
<zuq:data name="pageHeaderName" />
Now the problem is that when you remove a node from the document while it is also in a NodeList that is currently being iterated, the node is removed from the NodeList as well. The loop's current position, however, stays where it was, and the next step simply advances it by one, e.g.
0 siteHeader
1 pageHeaderName
2 pageHeaderTitle
n …
When the current position is 0 and you remove that node from the document, you get a list like this:
0 pageHeaderName
1 pageHeaderTitle
n …
You are still at position 0, though, so when you go to the next element in the NodeList (position 1), you skip the node that is now at position 0. You go straight to pageHeaderTitle, leaving pageHeaderName unprocessed.
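A small sketch can make the shift visible. Assuming just the first three zuq:data elements from the question (loaded from a string), it replaces the node at position 0 and then reads the same, un-requeried DOMNodeList again; listNames() is a hypothetical helper written for this example.

```php
<?php
// Sketch: only the first three zuq:data elements from the question.
$xml = <<<XML
<html xmlns="http://www.w3.org/1999/xhtml" xmlns:zuq="http://localhost/~/zuqml">
<body>
<h1><zuq:data name="siteHeader"/></h1>
<h2><zuq:data name="pageHeaderName"/><span> | </span><zuq:data name="pageHeaderTitle"/></h2>
</body>
</html>
XML;

// Hypothetical helper: the name attributes of the list's current contents, in order.
function listNames(DOMNodeList $list): array
{
    $names = [];
    for ($i = 0; $i < $list->length; $i++) {
        $names[] = $list->item($i)->getAttribute('name');
    }
    return $names;
}

$doc = new DOMDocument();
$doc->loadXML($xml);
$nodes = $doc->getElementsByTagNameNS('http://localhost/~/zuqml', 'data');

$before = listNames($nodes);

// Replace the node at position 0, as the first pass of the foreach does.
$first = $nodes->item(0);
$first->parentNode->replaceChild($doc->createTextNode('foo'), $first);

// Same DOMNodeList object, not re-queried: it has shrunk and shifted down.
$after = listNames($nodes);
$nextVisited = $nodes->item(1)->getAttribute('name');

echo implode(', ', $before), "\n";            // siteHeader, pageHeaderName, pageHeaderTitle
echo implode(', ', $after), "\n";             // pageHeaderName, pageHeaderTitle
echo "position 1 now holds: $nextVisited\n";  // pageHeaderTitle
```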
After pageHeaderTitle is removed, the node count drops to five, making
<zuq:data name="postHeader"></zuq:data>
the new element at the current position (1), while pageHeaderName stays behind at position 0. Consequently, the next node to be removed (position 2) is
<zuq:data name="postText">
<zuq:format type="trim">
<zuq:param name="length">300</zuq:param>
<zuq:param name="append">
<a>
<zuq:attr name="href">
./?action=viewpost&id=
<zuq:data name="postId"></zuq:data>
</zuq:attr>
<zuq:data name="postAuthor"></zuq:data>
</a>
</zuq:param>
</zuq:format>
</zuq:data>
As you can see, there are two more zuq:data elements in there. Consequently, the node count will drop to 2 (5 - 1 current node - 2 descendants).
After that, the next position (3) is past the end of the two-node list, so the iteration over the NodeList ends, leaving you with
<zuq:data name="postHeader"></zuq:data>
and
<zuq:data name="pageHeaderName"></zuq:data>
still in the document.
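To check the walkthrough above, this sketch replays your original forward foreach on a trimmed copy of the markup (the same seven zuq:data elements in the same order and nesting, loaded from a string) and records which nodes the loop visited and which ones are still in the document afterwards. The expected values in the comments follow the trace above and agree with the output in the question.

```php
<?php
// Sketch: trimmed copy of the question's markup, same seven zuq:data
// elements in the same document order and nesting.
$xml = <<<XML
<html xmlns="http://www.w3.org/1999/xhtml" xmlns:zuq="http://localhost/~/zuqml">
<body>
<h1><zuq:data name="siteHeader"/></h1>
<h2><zuq:data name="pageHeaderName"/><span> | </span><zuq:data name="pageHeaderTitle"/></h2>
<h3><zuq:data name="postHeader"/></h3>
<p><zuq:data name="postText"><a><zuq:data name="postId"/><zuq:data name="postAuthor"/></a></zuq:data></p>
</body>
</html>
XML;

$doc = new DOMDocument();
$doc->loadXML($xml);
$ns = 'http://localhost/~/zuqml';

// The question's loop, unchanged apart from recording each visited node.
$visited = [];
foreach ($doc->getElementsByTagNameNS($ns, 'data') as $node) {
    $visited[] = $node->getAttribute('name');
    $node->parentNode->replaceChild($doc->createTextNode('foo'), $node);
}

// Fresh query: whatever the loop skipped is still in the document.
$left = [];
foreach ($doc->getElementsByTagNameNS($ns, 'data') as $node) {
    $left[] = $node->getAttribute('name');
}

echo 'visited: ', implode(', ', $visited), "\n"; // siteHeader, pageHeaderTitle, postText
echo 'left:    ', implode(', ', $left), "\n";    // pageHeaderName, postHeader
```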