How to replace text in an XML document using Java
How do I replace text in an XML document using Java?
Source:
<body>
<title>Home Owners Agreement</title>
<p>The <b>good</b> thing about a Home Owners Agreement is that...</p>
</body>
Desired output:
<body>
<title>Home Owners Agreement</title>
<p>The <b>good</b> thing about a HOA is that...</p>
</body>
I only want text in <p> tags to be replaced. I tried the following:
static void replaceText(String term, String replaceWith, org.w3c.dom.Node p) {
    // setTextContent replaces all children of p with a single text node
    p.setTextContent(p.getTextContent().replace(term, replaceWith));
}
The problem with the above code is that all the child nodes of p get lost.
Okay, I figured out the solution.
The key to this is that you don't want to replace the text of the element node itself. The text actually lives in child text nodes, so it is enough to replace the content of those. I was able to accomplish what I needed with this code:
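private static void replace(Node root) {
    // Only text nodes are modified, so element children such as <b> are preserved.
    if (root.getNodeType() == Node.TEXT_NODE) {
        root.setTextContent(root.getTextContent().replace("Home Owners Agreement", "HOA"));
    }
    // Recurse into the children; pass a <p> element as root to limit the replacement to it.
    for (int i = 0; i < root.getChildNodes().getLength(); i++) {
        replace(root.getChildNodes().item(i));
    }
}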
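To limit the replacement to <p> elements, as the question asks, the same recursion can be started from each <p> found with getElementsByTagName. What follows is only a minimal sketch, not the poster's exact code: the class name ParagraphTextReplacer and both method names are made up for illustration, the XML is assumed to arrive as a String, and a term that is split across two text nodes (for example by a <b> tag in the middle of it) would not be found.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper class, named for this example only.
class ParagraphTextReplacer {

    // Replaces term in every text node at or below node; element children are left in place.
    static void replaceInTextNodes(Node node, String term, String replaceWith) {
        if (node.getNodeType() == Node.TEXT_NODE) {
            node.setNodeValue(node.getNodeValue().replace(term, replaceWith));
            return;
        }
        NodeList children = node.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            replaceInTextNodes(children.item(i), term, replaceWith);
        }
    }

    // Parses xml, rewrites text inside <p> elements only, and serializes the result.
    static String replaceInParagraphs(String xml, String term, String replaceWith) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            NodeList paragraphs = doc.getElementsByTagName("p");
            for (int i = 0; i < paragraphs.getLength(); i++) {
                replaceInTextNodes(paragraphs.item(i), term, replaceWith);
            }
            Transformer transformer = TransformerFactory.newInstance().newTransformer();
            transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            transformer.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            // Assumption for this sketch: wrap parser/transformer errors in an unchecked exception.
            throw new IllegalStateException("Could not process XML", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<body><title>Home Owners Agreement</title>"
                + "<p>The <b>good</b> thing about a Home Owners Agreement is that...</p></body>";
        System.out.println(replaceInParagraphs(xml, "Home Owners Agreement", "HOA"));
    }
}
```

Because only the text nodes under <p> are touched, the <title> keeps its original wording and the <b> element survives.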
The problem here is that you actually want to replace the node, not only its text. You can traverse the children of the current node, add them to a new node, and then swap the old node for the new one.
But that requires a lot of work and is very sensitive to your document structure. For example, if somebody wraps your <p> tag in a div, you will have to change your parsing.
Moreover, this approach is inefficient in terms of CPU and memory: you have to parse the whole document to change a couple of words in it.
My suggestion is the following: try regular expressions. In most cases they are powerful enough. For example, code like
xml.replaceFirst("(<p>.*?</p>)", "<p>The <b>good</b> thing about a HOA is that...</p>")
will work (at least in your case, where the paragraph sits on a single line, since . does not match line breaks by default).
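If the paragraph text is not known in advance, the same regex idea can be used to locate each <p>...</p> span and swap the term only inside it. This is a rough sketch under several assumptions: the class and method names are invented for the example, the <p> tags carry no attributes and are not nested, and the term does not also occur in tag or attribute names inside the paragraph. It is string matching, not real XML parsing.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, named for this example only.
class RegexParagraphReplacer {

    // DOTALL lets '.' match line breaks; the reluctant .*? stops at the first </p>.
    private static final Pattern PARAGRAPH = Pattern.compile("<p>.*?</p>", Pattern.DOTALL);

    static String replaceInParagraphs(String xml, String term, String replaceWith) {
        Matcher m = PARAGRAPH.matcher(xml);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            // Plain (non-regex) replacement inside the matched paragraph only.
            String replaced = m.group().replace(term, replaceWith);
            // quoteReplacement keeps '$' and '\' in the text from being treated as group references.
            m.appendReplacement(out, Matcher.quoteReplacement(replaced));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        String xml = "<body><title>Home Owners Agreement</title>"
                + "<p>The <b>good</b> thing about a Home Owners Agreement is that...</p></body>";
        System.out.println(replaceInParagraphs(xml, "Home Owners Agreement", "HOA"));
    }
}
```

Everything outside the matched paragraphs, including the <title>, is copied through unchanged.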