
Get a node's inner XML as String in Java DOM

I have an XML org.w3c.dom.Node that looks like this:

<variable name="variableName">
    <br /><strong>foo</strong> bar
</variable>

How do I get the <br /><strong>foo</strong> bar part as a String?


Same problem. To solve it I wrote this helper function:

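import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.w3c.dom.ls.DOMImplementationLS;
import org.w3c.dom.ls.LSSerializer;

public String innerXml(Node node) {
    DOMImplementationLS lsImpl = (DOMImplementationLS)node.getOwnerDocument().getImplementation().getFeature("LS", "3.0");
    LSSerializer lsSerializer = lsImpl.createLSSerializer();
    // Otherwise each child element is prefixed with its own <?xml ...?> declaration
    lsSerializer.getDomConfig().setParameter("xml-declaration", false);
    NodeList childNodes = node.getChildNodes();
    StringBuilder sb = new StringBuilder();
    // Serialize every child (elements, text, comments, ...) and concatenate the results
    for (int i = 0; i < childNodes.getLength(); i++) {
       sb.append(lsSerializer.writeToString(childNodes.item(i)));
    }
    return sb.toString();
}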
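A runnable sketch of the same LSSerializer approach, applied to the question's exact input. The class name LsInnerXmlDemo and the parse() helper are hypothetical, for illustration only. Because the question's XML is indented, the first and last children of <variable> are whitespace-only text nodes, so they end up in the result; call trim() if you don't want them. The exact output can vary with the DOM implementation in use.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.w3c.dom.ls.DOMImplementationLS;
import org.w3c.dom.ls.LSSerializer;
import org.xml.sax.InputSource;

public class LsInnerXmlDemo {
    // Serializes each child of `node` with an LSSerializer and concatenates the results
    public static String innerXml(Node node) {
        DOMImplementationLS lsImpl = (DOMImplementationLS) node.getOwnerDocument()
                .getImplementation().getFeature("LS", "3.0");
        LSSerializer serializer = lsImpl.createLSSerializer();
        // Suppress the <?xml ...?> declaration that would otherwise precede each element
        serializer.getDomConfig().setParameter("xml-declaration", false);
        StringBuilder sb = new StringBuilder();
        NodeList children = node.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            sb.append(serializer.writeToString(children.item(i)));
        }
        return sb.toString();
    }

    // Hypothetical helper: parses an XML string, wrapping checked exceptions
    public static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        // The question's input, including its indentation
        String xml = "<variable name=\"variableName\">\n"
                + "    <br /><strong>foo</strong> bar\n"
                + "</variable>";
        String inner = innerXml(parse(xml).getDocumentElement());
        // The brackets make the leading/trailing whitespace text nodes visible
        System.out.println("[" + inner + "]");
        System.out.println("[" + inner.trim() + "]");
    }
}
```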


There is no simple method on org.w3c.dom.Node for this. getTextContent() returns the text of all descendant nodes concatenated together, without any markup. getNodeValue() gives you the value of the current node only if it is an Attr, CDATA section, Text or Comment node; for an element it returns null. So you would need to serialize the node yourself, walking getChildNodes() and using getNodeName() and getNodeValue() to build the string.
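The manual approach described above can be sketched as a small recursive serializer. This is a minimal sketch, assuming you only need elements, attributes, text, CDATA sections and comments: namespace declarations are simply copied as ordinary attributes, and processing instructions and entity references are dropped. The class name ManualInnerXml and the parse() helper are hypothetical. The main method also contrasts the result with getTextContent().

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class ManualInnerXml {
    // Serializes the children of `node` by walking the DOM tree by hand
    public static String innerXml(Node node) {
        StringBuilder sb = new StringBuilder();
        NodeList children = node.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            write(children.item(i), sb);
        }
        return sb.toString();
    }

    private static void write(Node n, StringBuilder sb) {
        switch (n.getNodeType()) {
            case Node.ELEMENT_NODE:
                sb.append('<').append(n.getNodeName());
                NamedNodeMap attrs = n.getAttributes();
                for (int i = 0; i < attrs.getLength(); i++) {
                    Node a = attrs.item(i);
                    sb.append(' ').append(a.getNodeName()).append("=\"")
                      .append(escape(a.getNodeValue())).append('"');
                }
                if (!n.hasChildNodes()) {
                    sb.append("/>"); // empty element, e.g. <br/>
                    return;
                }
                sb.append('>');
                NodeList kids = n.getChildNodes();
                for (int i = 0; i < kids.getLength(); i++) {
                    write(kids.item(i), sb);
                }
                sb.append("</").append(n.getNodeName()).append('>');
                break;
            case Node.TEXT_NODE:
                sb.append(escape(n.getNodeValue()));
                break;
            case Node.CDATA_SECTION_NODE:
                sb.append("<![CDATA[").append(n.getNodeValue()).append("]]>");
                break;
            case Node.COMMENT_NODE:
                sb.append("<!--").append(n.getNodeValue()).append("-->");
                break;
            default:
                // Processing instructions, entity references etc. are skipped in this sketch
                break;
        }
    }

    // Escapes &, <, > and " so the value is safe in text and in double-quoted attributes
    private static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;")
                .replace(">", "&gt;").replace("\"", "&quot;");
    }

    // Hypothetical helper: parses an XML string, wrapping checked exceptions
    public static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        Document doc = parse("<variable name=\"variableName\"><br /><strong>foo</strong> bar</variable>");
        Node variable = doc.getDocumentElement();
        System.out.println(variable.getTextContent()); // foo bar
        System.out.println(innerXml(variable));        // <br/><strong>foo</strong> bar
    }
}
```

Writing it by hand shows why a serializer is usually preferable: escaping, namespaces and the less common node types all have to be handled explicitly.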

You can also use one of the various XML serialization libraries, for example XStream or JAXB. This is discussed in the question "XML serialization in Java?".


If you're using jOOX, you can wrap your node in its jQuery-like syntax and just call toString() on it:

$(node).toString();

It uses an identity transformer internally, like this:

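import java.io.StringWriter;
import javax.xml.transform.*;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;

// Writing to a StringWriter avoids decoding bytes with the platform default charset
StringWriter out = new StringWriter();
Transformer transformer = TransformerFactory.newInstance().newTransformer();
transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
Source source = new DOMSource(node);
Result target = new StreamResult(out);
transformer.transform(source, target);
return out.toString();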


Building on Andrey M's answer, I had to modify the code slightly to get the complete DOM document. If you just iterate over

 NodeList childNodes = node.getChildNodes();

the root element itself is not included. To include the root element (and get the complete XML document), I serialized the node itself:

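 import org.w3c.dom.Document;
 import org.w3c.dom.Node;
 import org.w3c.dom.ls.DOMImplementationLS;
 import org.w3c.dom.ls.LSSerializer;

 // Serializes the node itself, including its own tags (its "outer" XML)
 public String outerXml(Node node) {
     // A Document has no owner document (getOwnerDocument() returns null), so use it directly
     Document doc = node.getNodeType() == Node.DOCUMENT_NODE ? (Document) node : node.getOwnerDocument();
     DOMImplementationLS lsImpl = (DOMImplementationLS) doc.getImplementation().getFeature("LS", "3.0");
     LSSerializer lsSerializer = lsImpl.createLSSerializer();
     lsSerializer.getDomConfig().setParameter("xml-declaration", false);
     return lsSerializer.writeToString(node);
 }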


If you don't want to resort to external libraries, the following solution might come in handy. If you have a node <parent><child name="Nina"/></parent> and want to extract the children of the parent element, proceed as follows:

    import java.io.StringWriter;
    import javax.xml.transform.*;
    import javax.xml.transform.dom.DOMSource;
    import javax.xml.transform.stream.StreamResult;
    import org.w3c.dom.Node;
    import org.w3c.dom.NodeList;

    static String innerXml(Node parent) {
        // Get all children of the given parent node
        NodeList children = parent.getChildNodes();
        // A single writer collects the serialized output of every child
        StringWriter stringWriter = new StringWriter();
        try {
            // Set up the output transformer
            TransformerFactory transfac = TransformerFactory.newInstance();
            Transformer trans = transfac.newTransformer();
            trans.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            trans.setOutputProperty(OutputKeys.INDENT, "yes");

            for (int index = 0; index < children.getLength(); index++) {
                Node child = children.item(index);

                // Serialize the DOM node; the output is appended to stringWriter
                DOMSource source = new DOMSource(child);
                trans.transform(source, new StreamResult(stringWriter));
            }
        } catch (TransformerException e) {
            // Error handling goes here
        }
        // Read the writer once at the end (reading it inside the loop would duplicate earlier children)
        return stringWriter.toString();
    }


With the answer that builds on Lukas Eder's solution, I had the problem that the method nodeToStream() was undefined, so here is my version:

import java.io.StringWriter;
import javax.xml.transform.*;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Node;

public static String toString(Node node) {
    String xmlString = "";
    try {
        Transformer transformer = TransformerFactory.newInstance().newTransformer();
        transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        //transformer.setOutputProperty(OutputKeys.INDENT, "yes");

        Source source = new DOMSource(node);

        StringWriter sw = new StringWriter();
        StreamResult result = new StreamResult(sw);

        transformer.transform(source, result);
        xmlString = sw.toString();

    } catch (Exception ex) {
        ex.printStackTrace();
    }

    return xmlString;
}


I want to extend the very good answer from Andrey M.:

A node may turn out not to be serializable, which results in the following exception on some implementations:

org.w3c.dom.ls.LSException: unable-to-serialize-node: 
            unable-to-serialize-node: The node could not be serialized.

I had this issue with the implementation "org.apache.xml.serialize.DOMSerializerImpl.writeToString(DOMSerializerImpl)" running on WildFly 13.

To work around this, I suggest changing Andrey M.'s code example a little:

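import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.w3c.dom.ls.DOMImplementationLS;
import org.w3c.dom.ls.LSSerializer;

private static String innerXml(Node node) {
    DOMImplementationLS lsImpl = (DOMImplementationLS) node.getOwnerDocument().getImplementation().getFeature("LS", "3.0");
    LSSerializer lsSerializer = lsImpl.createLSSerializer();
    lsSerializer.getDomConfig().setParameter("xml-declaration", false);
    NodeList childNodes = node.getChildNodes();
    StringBuilder sb = new StringBuilder();
    for (int i = 0; i < childNodes.getLength(); i++) {
        Node innerNode = childNodes.item(i);
        if (innerNode.hasChildNodes() || innerNode.getNodeType() == Node.ELEMENT_NODE) {
            // Elements, including empty ones such as <br/>, go through the serializer
            sb.append(lsSerializer.writeToString(innerNode));
        } else if (innerNode.getNodeValue() != null) {
            // Leaf nodes such as text bypass the serializer; their value is appended
            // as-is, so markup characters in it are not re-escaped
            sb.append(innerNode.getNodeValue());
        }
    }
    return sb.toString();
}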

I also included the change suggested in Nyerguds' comment. This works for me on WildFly 13.


The best solution so far, Andrey M's, depends on a specific DOM Level 3 Load and Save implementation, which can cause issues in the future. Here is the same approach, but using whatever serializer the JDK's TransformerFactory is configured to provide.

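import java.io.StringWriter;
import javax.xml.transform.*;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

public static String innerXml(Node node) throws TransformerException
{
    StringWriter writer = new StringWriter();
    Transformer transformer = TransformerFactory.newInstance().newTransformer();
    transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");

    // For an element; for a whole Document, use the root element's children instead
    NodeList childNodes = node.getChildNodes();
    for (int i = 0; i < childNodes.getLength(); i++) {
        // Each transform appends its output to the same writer
        transformer.transform(new DOMSource(childNodes.item(i)), new StreamResult(writer));
    }
    return writer.toString();
}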

If you're processing a document rather than an element, you must go one level deeper and use node.getFirstChild().getChildNodes(). But to make it more robust, you should look for the first Element instead of assuming the document has only one child node. An XML document must have a single root element, but it can also have other top-level nodes, such as comments, processing instructions and a DOCTYPE.

        Node rootElement = docRootNode.getFirstChild();
        while (rootElement != null && rootElement.getNodeType() != Node.ELEMENT_NODE)
            rootElement = rootElement.getNextSibling();
        if (rootElement == null)
            throw new RuntimeException("No root element found in given document node.");

        NodeList childNodes = rootElement.getChildNodes();
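For a Document, the standard DOM API already performs this lookup: Document.getDocumentElement() returns the root element directly, skipping any comments, processing instructions and DOCTYPE. A small sketch that accepts either a document or an element (the class name RootElementLookup and the parse() helper are hypothetical):

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

public class RootElementLookup {
    // Returns the root element of a Document, or the node itself if it is already an element
    public static Element rootElement(Node node) {
        if (node.getNodeType() == Node.DOCUMENT_NODE) {
            // Skips comments, processing instructions and the DOCTYPE
            return ((Document) node).getDocumentElement();
        }
        if (node.getNodeType() == Node.ELEMENT_NODE) {
            return (Element) node;
        }
        throw new IllegalArgumentException("Expected a Document or Element node");
    }

    // Hypothetical helper: parses an XML string, wrapping checked exceptions
    public static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        Document doc = parse("<!-- leading comment --><?pi data?><variable><b>foo</b></variable>");
        // The first child of the document is the comment, not the root element
        System.out.println(doc.getFirstChild().getNodeType() == Node.COMMENT_NODE); // true
        System.out.println(rootElement(doc).getNodeName());                         // variable
    }
}
```

The children of rootElement(doc) can then be passed to any of the innerXml variants in this thread.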

And if I were to recommend a library for this, try jsoup, which is primarily for HTML but works with XML too. I haven't tested that, though.

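Document doc = Jsoup.parse(xml, "", Parser.xmlParser()); // org.jsoup.nodes.Document, not org.w3c.dom
doc.outputSettings().prettyPrint(false); // keep the original whitespace
Element variable = doc.select("variable").first();
String inner = variable.html();
// versus: variable.outerHtml(), which includes the <variable> tag itself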


Here is an alternative way to extract the content of an org.w3c.dom.Node. It also works if the node contains only text and no XML tags:

import java.io.StringWriter;
import javax.xml.transform.*;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Node;

private static String innerXml(Node node) throws TransformerException {
    StringWriter writer = new StringWriter();
    Transformer transformer = TransformerFactory.newInstance().newTransformer();
    transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
    transformer.transform(new DOMSource(node), new StreamResult(writer));
    String xml = writer.toString();
    // An empty element is serialized as <tag/>: no end tag, no inner content
    if (xml.lastIndexOf("</") < 0) {
        return "";
    }
    // Now remove the outer start and end tags
    return xml.substring(xml.indexOf(">") + 1, xml.lastIndexOf("</"));
}


Building on Lukas Eder's solution, we can extract the inner XML like .NET's InnerXml property:

import java.io.StringWriter;
import java.util.regex.Pattern;
import javax.xml.transform.*;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Node;

public static String innerXml(Node node, String tag) {
    String xmlString = toString(node);
    // Remove the outer start tag (with any attributes) and the matching end tag
    xmlString = xmlString.replaceFirst("^<" + Pattern.quote(tag) + "(\\s[^>]*)?>", "");
    xmlString = xmlString.replaceFirst("</" + Pattern.quote(tag) + ">$", "");
    return xmlString;
}

public static String toString(Node node) {
    String xmlString = "";
    try {
        Transformer transformer = TransformerFactory.newInstance().newTransformer();
        transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        //transformer.setOutputProperty(OutputKeys.INDENT, "yes");
        StringWriter writer = new StringWriter();
        // Serialize the node (including its own tags) into the StringWriter
        transformer.transform(new DOMSource(node), new StreamResult(writer));
        xmlString = writer.toString();
    } catch (TransformerException e) {
        e.printStackTrace();
    }
    return xmlString;
}

Example:

If the Node name has the string representation "<Name><em>Chris</em>tian<em>Bale</em></Name>":

String innerXml = innerXml(name, "Name"); // returns "<em>Chris</em>tian<em>Bale</em>"
