Create NodeList of all Document nodes manually

2023-03-26 15:20

I currently generate a NodeList of all the Document nodes (in document order) manually. The XPath expression to get this NodeList is

//. | //@* | //namespace::*

My first attempt at walking the DOM manually and collecting the nodes (NodeSet is a primitive NodeList implementation that delegates to a List):

private static void walkRecursive(Node cur, NodeSet nodes) {
    nodes.add(cur);

    if (cur.hasAttributes()) {
        NamedNodeMap attrs = cur.getAttributes();
        for (int i=0; i < attrs.getLength(); i++) {
            Node child = attrs.item(i);
            walkRecursive(child, nodes);
        }
    }

    int type = cur.getNodeType();
    if (type == Node.ELEMENT_NODE || type == Node.DOCUMENT_NODE) {
        NodeList children = cur.getChildNodes();
        if (children == null)
            return;

        for (int i=0; i < children.getLength(); i++) {
            Node child = children.item(i);
            walkRecursive(child, nodes);
        }
    }
}

I start the recursion by calling walkRecursive(doc, nodes), where doc is the org.w3c.dom.Document and nodes is an (initially empty) NodeSet.

I tested this using this primitive XML document:

<?xml version="1.0"?>
<myns:root xmlns:myns="http://www.my.ns/#">
  <myns:element/>
</myns:root>
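
For anyone who wants to reproduce this, here is a minimal self-contained sketch of the same traversal. It assumes a plain ArrayList in place of my NodeSet, and the names WalkDemo, SAMPLE, walk and collect are made up for this example; it also checks getAttributes() for null instead of calling hasAttributes().

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class WalkDemo {
    // The test document from the question, as a string (hypothetical constant).
    static final String SAMPLE =
        "<?xml version=\"1.0\"?>\n"
        + "<myns:root xmlns:myns=\"http://www.my.ns/#\">\n"
        + "  <myns:element/>\n"
        + "</myns:root>";

    // Same order as walkRecursive: the node itself, then its attributes,
    // then its children. A List<Node> stands in for NodeSet.
    static void walk(Node cur, List<Node> nodes) {
        nodes.add(cur);

        NamedNodeMap attrs = cur.getAttributes(); // null for anything but elements
        if (attrs != null) {
            for (int i = 0; i < attrs.getLength(); i++) {
                walk(attrs.item(i), nodes);
            }
        }

        int type = cur.getNodeType();
        if (type == Node.ELEMENT_NODE || type == Node.DOCUMENT_NODE) {
            NodeList children = cur.getChildNodes();
            for (int i = 0; i < children.getLength(); i++) {
                walk(children.item(i), nodes);
            }
        }
    }

    // Parses the XML with a namespace-aware parser and collects all nodes.
    static List<Node> collect(String xml) throws Exception {
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        dbf.setNamespaceAware(true);
        Document doc = dbf.newDocumentBuilder()
                          .parse(new InputSource(new StringReader(xml)));
        List<Node> nodes = new ArrayList<>();
        walk(doc, nodes);
        return nodes;
    }

    public static void main(String[] args) throws Exception {
        for (Node n : collect(SAMPLE)) {
            System.out.println(n.getNodeType() + " " + n.getNodeName());
        }
    }
}
```

The walk only ever sees the Attr nodes that the parser actually created on each element, so it cannot produce anything that was not written in the source document.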

If I canonicalize both my manually created NodeSet and the NodeList produced by the XPath expression above and compare the two results byte for byte, they are equal, so the approach seems to work just fine.

But, if I iterate over the two NodeLists and print debug info (typeString simply generates a string representation)

for (int i=0; i < nodes.getLength(); i++) {
    Node child = nodes.item(i);
    System.out.println("Type: " + typeString(child.getNodeType()) +
                       " Name:" + child.getNodeName() + 
                       " Local name: " + child.getLocalName() +
                       " NS: " + child.getNamespaceURI());
}

then I receive this output for the XPath-generated NodeList:

Type: DocumentNode Name:#document Local name: null NS: null
Type: Element Name:myns:root Local name: root NS: http://www.my.ns/#
Type: Attribute Name:xmlns:myns Local name: myns NS: http://www.w3.org/2000/xmlns/
Type: Attribute Name:xmlns:xml Local name: xml NS: http://www.w3.org/2000/xmlns/
Type: Text Name:#text Local name: null NS: null
Type: Element Name:myns:element Local name: element NS: http://www.my.ns/#
Type: Text Name:#text Local name: null NS: null

and this for the manually generated NodeList:

Type: DocumentNode Name:#document Local name: null NS: null
Type: Element Name:myns:root Local name: root NS: http://www.my.ns/#
Type: Attribute Name:xmlns:myns Local name: myns NS: http://www.w3.org/2000/xmlns/
Type: Text Name:#text Local name: null NS: null
Type: Element Name:myns:element Local name: element NS: http://www.my.ns/#
Type: Text Name:#text Local name: null NS: null

So, as you can see, in the first example the NodeList additionally contains the Node for the XML namespace:

Type: Attribute Name:xmlns:xml Local name: xml NS: http://www.w3.org/2000/xmlns/

Now my questions:

a) If I interpret xml-names11 correctly, then I don't need the xmlns:xml declaration:

The prefix xml is by definition bound to the namespace name http://www.w3.org/XML/1998/namespace. It MAY, but need not, be declared, and MUST NOT be undeclared or bound to any other namespace name. Other prefixes MUST NOT be bound to this namespace name, and it MUST NOT be declared as the default namespace.

Am I correct? (at least c) hints in that direction)

b) But then, why does the XPath evaluation add it anyway - shouldn't it just include what was there in the first place instead of automagically adding things?

c) This can cause trouble with XML canonicalization, although it shouldn't - declarations of the xml namespace should be omitted during canonicalization. Does anyone know of (Java) implementations that get this wrong?

Edit:

Here's the code I used to evaluate the XPath expression that contained the 'xml' namespace node:

DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
dbf.setNamespaceAware(true);
dbf.setValidating(false);
InputStream in = ...;
try {
    Document doc = dbf.newDocumentBuilder().parse(in);
    XPathFactory fac = XPathFactory.newInstance();
    XPath xp = fac.newXPath();
    XPathExpression exp = xp.compile("//. | //@* | //namespace::*");
    NodeList nodes = (NodeList)exp.evaluate(doc, XPathConstants.NODESET);
} finally {
    in.close();
}

Since you can write

<myns:root xml:space="preserve" xmlns:myns="http://www.my.ns/#">
  <myns:element/>
</myns:root>

without declaring the "xml" prefix, it must be there implicitly. It is therefore correct to include the namespace node for this namespace declaration in the //namespace::* location step.
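
A quick way to see the implicit binding is to parse that document with a namespace-aware parser and ask the xml:space attribute for its namespace. This is only a sketch; the class and method names (XmlPrefixDemo, xmlSpaceNamespace) are made up for illustration.

```java
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Attr;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

class XmlPrefixDemo {
    // Returns the namespace URI the parser assigned to xml:space,
    // although the document never declares the "xml" prefix.
    static String xmlSpaceNamespace() throws Exception {
        String xml =
            "<myns:root xml:space=\"preserve\" xmlns:myns=\"http://www.my.ns/#\">"
            + "<myns:element/>"
            + "</myns:root>";

        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        dbf.setNamespaceAware(true);
        Document doc = dbf.newDocumentBuilder()
                          .parse(new InputSource(new StringReader(xml)));

        Attr space = doc.getDocumentElement().getAttributeNode("xml:space");
        return space.getNamespaceURI();
    }

    public static void main(String[] args) throws Exception {
        String ns = xmlSpaceNamespace();
        System.out.println(ns);
        // XMLConstants.XML_NS_URI is http://www.w3.org/XML/1998/namespace
        System.out.println(XMLConstants.XML_NS_URI.equals(ns));
    }
}
```

The document parses without a namespace error and the attribute lands in the XML namespace, so the binding exists even though no xmlns:xml attribute was written.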

So,

a) you are wrong, you need it (well, depending on the purpose of your code)

b) see above

c) no, but I've seen other namespace corner cases where things went haywire (e.g. "Problem with conversion of org.dom4j.Document to org.w3c.dom.Document and XML Signature").
