XPath string for all children of namespaced elements
Just getting started with XPath, and using it's implementation with PHP's SimpleXML
objects. Right now I'm using //zuq:*
to create an array of SimpleXML
objects with the zuq
prefix in a given document. However, I'd like the SimpleXML
objects to reference all descendants regardless of namespace. I tried using //child::zuq:*
, but the SimpleXML
trees it creates don't seem to be complete.
Essentially, the objects captured should be all the top level objects of the zuq
namespace throughout the document, containing all descendant elements regardless of namespace, including zuq
.
tl;dr: How can I create a SimpleXML
object tree from a given document where each SimpleXML
root object is the highest level document element of a given namespace (such as zuq
) containing all descendants of said element regardless of the descendant namespace? XPath is not a requisite but appears to be the best choice based on my reading.
test.html
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xmlns:zuq="http://localhost/zuq">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
<title>Untitled Document</title>
</head>
<body>
<h1>Heading</h1>
<p>Paragraph</p>
<zuq:region name="myRegion">
<div class="myClass">
<h1><zuq:data name="myDataHeading" /></h1>
<p>
<zuq:data name="myDataParagraph">
<zuq:format type="trim">
<zuq:param name="length" value="200" />
<zuq:param name="append">
<span class="paragraphTrimOverflow">...</span>
</zuq:param>
</zuq:format>
</zuq:data>
</p>
</div>
</zuq:region>
</body>
</html>
$sxml = simplexml_load_file('test.html');
$sxml_zuq = $sxml->xpath('//zuq:*/descendant-or-self::node()');
print_r($sxml_zuq);
Produces:
Array
(
[0] => SimpleXMLElement Object
(
[@attributes] => Array
(
[name] => myRegion
)
[div] => SimpleXMLElement Object
(
[@attributes] => Array
(
[class] => myClass
)
[h1] => SimpleXMLElement Object //I don't know why these don't contain their zuq descendants
(
)
[p] => SimpleXMLElement Object
(
)
)
)
[1] => SimpleXMLElement Object
(
[@attributes] => Array
(
[name] => myRegion
)
[div] => SimpleXMLElement Object
(
[@attributes] => Array
(
[class] => myClass
)
[h1] => SimpleXMLElement Object
(
)
[p] => SimpleXMLElement Object
(
)
)
)
[2] => SimpleXMLElement Object
(
[@attributes] => Array
(
[class] => myClass
)
[h1] => SimpleXMLElement Object
(
)
[p] => SimpleXMLElement Object
(
)
)
[3] => SimpleXMLElement Object
(
[@attributes] => Array
(
[class] => myClass
)
[h1] => SimpleXMLElement Object
(
)
[p] => SimpleXMLElement Object
(
)
)
[4] => SimpleXMLElement Object
(
)
[5] => SimpleXMLElement Object
(
[@attributes] => Array
(
开发者_运维问答 [name] => myDataHeading
)
)
[6] => SimpleXMLElement Object
(
[@attributes] => Array
(
[class] => myClass
)
[h1] => SimpleXMLElement Object
(
)
[p] => SimpleXMLElement Object
(
)
)
[7] => SimpleXMLElement Object
(
)
[8] => SimpleXMLElement Object
(
)
[9] => SimpleXMLElement Object
(
[@attributes] => Array
(
[name] => myDataParagraph
)
)
[10] => SimpleXMLElement Object
(
[@attributes] => Array
(
[name] => myDataParagraph
)
)
[11] => SimpleXMLElement Object
(
[@attributes] => Array
(
[type] => trim
)
)
[12] => SimpleXMLElement Object
(
[@attributes] => Array
(
[type] => trim
)
)
[13] => SimpleXMLElement Object
(
[@attributes] => Array
(
[name] => length
[value] => 200
)
)
[14] => SimpleXMLElement Object
(
[@attributes] => Array
(
[type] => trim
)
)
[15] => SimpleXMLElement Object
(
[@attributes] => Array
(
[name] => append
)
[span] => ...
)
[16] => SimpleXMLElement Object
(
[@attributes] => Array
(
[name] => append
)
[span] => ...
)
[17] => SimpleXMLElement Object
(
[@attributes] => Array
(
[class] => paragraphTrimOverflow
)
[0] => ...
)
[18] => SimpleXMLElement Object
(
[@attributes] => Array
(
[class] => paragraphTrimOverflow
)
[0] => ...
)
[19] => SimpleXMLElement Object
(
[@attributes] => Array
(
[name] => append
)
[span] => ...
)
[20] => SimpleXMLElement Object
(
[@attributes] => Array
(
[type] => trim
)
)
[21] => SimpleXMLElement Object
(
[@attributes] => Array
(
[name] => myDataParagraph
)
)
[22] => SimpleXMLElement Object
(
)
[23] => SimpleXMLElement Object
(
[@attributes] => Array
(
[class] => myClass
)
[h1] => SimpleXMLElement Object
(
)
[p] => SimpleXMLElement Object
(
)
)
[24] => SimpleXMLElement Object
(
[@attributes] => Array
(
[name] => myRegion
)
[div] => SimpleXMLElement Object
(
[@attributes] => Array
(
[class] => myClass
)
[h1] => SimpleXMLElement Object
(
)
[p] => SimpleXMLElement Object
(
)
)
)
)
Don't trust the output of the print_r statement ... it seems to be showing an empty object, but in my testing the children are actually still there. For example, starting with your code above:
$sxml = simplexml_load_file('test.html');
$sxml_zuq = $sxml->xpath('//zuq:*/descendant-or-self::node()');
If I subsequently try a command like this:
print_r($sxml_zuq[0]->div->h1);
I get this output:
SimpleXMLElement Object
(
)
It seems to be empty, right? But if I modify the command to look like this:
echo $sxml_zuq[0]->div->h1->asXML();
I get the resultant tree with the namespaced child:
<h1><zuq:data name="myDataHeading"/></h1>
I'm not 100% sure why this is; it probably has something to do with the print_r statement trying to flatten the simplexml object and not dealing with the namespaces properly. But when you keep to the simplexml objects themselves that are returned from your xpath call, all of the children are preserved.
Now, in regards to your xpath itself, you probably DON'T want the "descendant-or-self" axis, because that will match not only the top-level zuq element, but also match all its children and create a larger array than you're actually seeking to return (unless I'm misunderstanding what you're asking). If you try something like this:
$sxml_zuq = $sxml->xpath('//zuq:*[not(ancestor::zuq:*)]');
then you'll get back an array of ONLY the top level of zuq namespaced elements. (while your example XML only had one such top-level element, your actual data may have several siblings at that level). You can then capture the content of each of these top level elements like this:
foreach ($sxml_zuq as $zuq_node) {
echo ($zuq_node->asXML());
}
Things get a little trickier if you want to repeat this process but do the search for top-level (or any) elements in the default namespace; you'd have to use the registerNamespace function to give the default namespace a temporary prefix, and do the xpath search on that.
I think you're looking for //zuq:*/descendant-or-self::*
. This will result in all subtrees with the root having zuq
namespace prefix.
The observed behavior seems to be an artifact of SimpleXML (the XPath specification does not deal with trees in the XPath query output, only separate nodes). You can probably solve it using something like
//zuq:*[not(ancestor::zuq:*)]/descendant-or-self::*
ancestor[...] checks whether there is an ancestor for which a condition is true - i.e. whether there is an ancestor with zuq prefix. So you should get only zuq: roots that have no zuq: ancestor.
精彩评论