Java DOM messing with my XML headers and adding attributes on its own
I'm having an issue with a small program that I wrote. It does what I intended it to do (add/remove/modify attributes) very well - I'm super excited about that part. But when I output the file, my headers change and some elements have attributes added to them automatically.
Here's what I start with:
<!DOCTYPE TEI SYSTEM "teilite-ur.dtd">
<TEI xmlns="http://www.tei-c.org/ns/1.0">
<teiHeader>
<fileDesc>
...
<availability>
...
After transforming each element node to contain an additional attribute (name=test, value=working), here's what I end up with:
<TEI xmlns="http://www.tei-c.org/ns/1.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" test="working">
<teiHeader test="working" type="text">
<fileDesc test="working">
...
<availability default="false" status="unknown" test="working">
...
So, short overview:
- !DOCTYPE line was removed
- xmlns:xsi... was added
- type="text", default="false", status="unknown", and anchored="true" attributes were added automatically (there may be others, but those are the ones that popped out at me).
I read in here [http://stackoverflow.com/questions/2133395/remove-xml-declaration-from-the-generated-xml-document-using-java] how to prevent the XML declaration from being added to the top. But, I'm not sure how to disable the rest of the additions.
Thanks!
Here's some self-contained code that does basically what I want it to (little more customization in the real program, but that shouldn't be relevant) and the relevant IBM tutorial that I used to help build it:
package xml_attrib_test;

import java.io.*;
import javax.xml.parsers.*;
import javax.xml.transform.*;
import javax.xml.transform.dom.*;
import javax.xml.transform.stream.*;
import javax.xml.xpath.*;
import org.w3c.dom.*;

public class Main {
    public static void main(String[] args) {
        //Input
        File whichFile = new File("C:\\Users\\mw2xx\\Desktop\\proceedings.vol1.xml");
        DocumentBuilderFactory domFactory;
        DocumentBuilder builder;
        Document doc;
        XPathFactory factory;
        XPath xpath;
        XPathExpression expr;
        NodeList nodes;
        try {
            domFactory = DocumentBuilderFactory.newInstance();
            domFactory.setSchema(null);
            domFactory.setValidating(false);
            domFactory.setNamespaceAware(true);
            domFactory.setExpandEntityReferences(false);
            builder = domFactory.newDocumentBuilder();
            doc = builder.parse(whichFile);
            factory = XPathFactory.newInstance();
            xpath = factory.newXPath();
            expr = xpath.compile("//*");
            Object result = expr.evaluate(doc, XPathConstants.NODESET);
            nodes = (NodeList) result;
        } catch (Exception ex) {
            System.out.println("Error in parser.");
            return;
        }
        // Do Stuff With the XML Doc
        String attributeTag = "test";
        String attrValue = "working";
        for (int j = 0; j < nodes.getLength(); j++) {
            Node n = nodes.item(j);
            if (n.getNodeType() == Node.ELEMENT_NODE) {
                Element e = (Element) n;
                e.setAttribute(attributeTag, attrValue);
            } else if (n.getNodeType() == Node.ATTRIBUTE_NODE) {
                Attr a = (Attr) n;
                if (a.getName().equals(attributeTag)) {
                    a.setValue(attrValue);
                }
            }
        }
        // Output
        TransformerFactory tFactory;
        Transformer transformer;
        DOMSource source;
        File resultFile;
        StreamResult result;
        try {
            tFactory = TransformerFactory.newInstance();
            transformer = tFactory.newTransformer();
            transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            source = new DOMSource(doc);
            resultFile = new File("$$$$$.tmp");
            result = new StreamResult(resultFile);
            transformer.transform(source, result);
        } catch (Exception ex) {
            System.out.println("Error in transformer.");
            return;
        }
        whichFile.delete();
        resultFile.renameTo(whichFile);
        System.out.println("Success!");
    }
}
After a few more days of googling and searching Stack Overflow, I found a similar question ("Java change and move non-standard XML file") which provided the setting I needed:
domFactory.setFeature("http://apache.org/xml/features/nonvalidating/load-external-dtd", false);
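For anyone who wants to see the effect of that setting in isolation, here is a minimal sketch. It assumes the JDK's built-in (Xerces-based) parser, which is where this feature name comes from; other parser implementations may reject it. The class name, the throwaway demo.dtd / demo.xml files, and the "status" attribute are made up for the demo and are not from my real TEI files.

```java
import java.io.File;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

public class DtdDefaultsDemo {
    // Xerces-specific feature name (the same one used in the post above).
    static final String LOAD_EXTERNAL_DTD =
            "http://apache.org/xml/features/nonvalidating/load-external-dtd";

    // Parse without validation, choosing whether the external DTD is read.
    static Document parse(File xml, boolean loadExternalDtd) throws Exception {
        DocumentBuilderFactory f = DocumentBuilderFactory.newInstance();
        f.setValidating(false);
        f.setNamespaceAware(true);
        f.setFeature(LOAD_EXTERNAL_DTD, loadExternalDtd);
        return f.newDocumentBuilder().parse(xml);
    }

    // Hypothetical sample: an external DTD that declares a default attribute,
    // and a document that references it but does not set the attribute.
    static File writeSample() throws Exception {
        Path dir = Files.createTempDirectory("dtd-demo");
        String dtd = "<!ELEMENT doc EMPTY>\n"
                   + "<!ATTLIST doc status CDATA \"unknown\">\n";
        String xml = "<!DOCTYPE doc SYSTEM \"demo.dtd\">\n<doc/>\n";
        Files.write(dir.resolve("demo.dtd"), dtd.getBytes(StandardCharsets.UTF_8));
        Path xmlPath = dir.resolve("demo.xml");
        Files.write(xmlPath, xml.getBytes(StandardCharsets.UTF_8));
        return xmlPath.toFile();
    }

    public static void main(String[] args) throws Exception {
        File xml = writeSample();
        boolean withDtd = parse(xml, true).getDocumentElement().hasAttribute("status");
        boolean withoutDtd = parse(xml, false).getDocumentElement().hasAttribute("status");
        System.out.println("external DTD loaded:  status attribute present = " + withDtd);
        System.out.println("external DTD skipped: status attribute present = " + withoutDtd);
    }
}
```

The point of the comparison is that the extra attributes come from defaults declared in the DTD, which a non-validating parser still applies when it reads the external subset. One caveat: with the DTD skipped, any entities that are only declared there can no longer be resolved.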
Are you using the standard JDK DOM parser?
If yes, then I'm guessing that you're using a schema for validation, and that schema specifies default attribute values. This would explain:
- Removing the DOCTYPE, because it's not used with schema validation. You could try calling setValidating(true), but then you'll probably need to add an EntityResolver.
- Setting default values as described in the XSD, and inserting attributes that support schema validation.
The answer, if all you care about is updating the XML, is to avoid the schema. Or parse once with the schema to validate, then once again to update.
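On the missing DOCTYPE line specifically: the identity Transformer generally does not write the DOM's DocumentType node back out by itself, so one common workaround is to copy the identifiers onto the output properties. The following is a rough sketch under that assumption, not a tested fix for the TEI files in the question. The class and method names are hypothetical, and the load-external-dtd feature is the Xerces-specific one, used here only so the sample parses without the DTD file being present.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.DocumentType;
import org.xml.sax.InputSource;

public class KeepDoctypeDemo {
    // Parse the XML text, then serialize it again while re-emitting the DOCTYPE.
    static String roundTrip(String xml) throws Exception {
        DocumentBuilderFactory f = DocumentBuilderFactory.newInstance();
        f.setNamespaceAware(true);
        // Xerces-specific: do not fetch the external DTD (it does not exist here).
        f.setFeature("http://apache.org/xml/features/nonvalidating/load-external-dtd", false);
        Document doc = f.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));

        Transformer t = TransformerFactory.newInstance().newTransformer();
        t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");

        // Copy the DOCTYPE identifiers from the parsed document, if any.
        DocumentType dt = doc.getDoctype();
        if (dt != null && dt.getSystemId() != null) {
            t.setOutputProperty(OutputKeys.DOCTYPE_SYSTEM, dt.getSystemId());
            if (dt.getPublicId() != null) {
                t.setOutputProperty(OutputKeys.DOCTYPE_PUBLIC, dt.getPublicId());
            }
        }

        StringWriter out = new StringWriter();
        t.transform(new DOMSource(doc), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        String xml = "<!DOCTYPE TEI SYSTEM \"teilite-ur.dtd\">"
                   + "<TEI xmlns=\"http://www.tei-c.org/ns/1.0\"><teiHeader/></TEI>";
        System.out.println(roundTrip(xml));
    }
}
```

Note that this only restores the DOCTYPE declaration with its system (and public) identifier; an internal DTD subset, if the file had one, would not be carried over this way.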