Dom4J preserve whitespace when writing file
I'm working on a program that uses Dom4J to write XML files. The database schema I am writing to has a handy XML validation and import schema. Dom4J is working great, but I can't figure out how to set the 'preserve' field in Dom4J's XMLWriter class. I have a particular element where I need the encoded '\n' characters preserved.
The Javadoc for this class is a little sparse: http://dom4j.sourceforge.net/dom4j-1.6.1/apidocs/org/dom4j/io/XMLWriter.html
I've tried playing around with an OutputFormat object, but no dice.
Can anyone tell me how to ensure that an XMLWriter object preserves the whitespace of a dom4j tree's elements when writing to a file?
Thanks,
Donald
Say I am starting with:
DocumentFactory factory = DocumentFactory.getInstance();
Element accession = factory.createElement("title");
List<String> AUT = new ArrayList<String>();
AUT.add("author1");
AUT.add("author2");
String title = "Title";
I would like to have an output similar to:
<title>author1
author2
Title</title>
That is, with the line breaks encoded (as &#10;) inside the title element.
DefaultEntity e = new DefaultEntity("#10");
if(AUT.size() > 1) {
for(String a : AUT) {
accession.addText(a);
accession.add(e);
}
accession.addText(title);
}
This does not work; it throws an IllegalAddException.
First of all, the "preserve" property has nothing to do with preserving the encoding of a previously encoded character; it is about preserving the white space contained in an element. That behavior is usually controlled by the xml:space="preserve" attribute.
However, if your use case is that you have an encoded newline in your input that you want preserved in the output, you're in trouble. DOM4J decodes all entities and character references to their corresponding Java characters (UTF-16). This is partially controllable by configuring the underlying XMLReader, but as far as I know, no XMLReader reports the start and end of character references; they are silently replaced by their corresponding character values.
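To see this decoding in isolation, here is a small sketch that uses the JDK's built-in DOM parser (javax.xml.parsers) instead of dom4j, on the assumption that dom4j's SAX-based reader treats character references the same way. After parsing, the text simply contains a newline character, and nothing records that it was written as &#10; in the source. The class and method names are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

final class CharRefDecoding {
    // Parses a small XML string and returns the text content of its root element.
    static String parseTitleText(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            return doc.getDocumentElement().getTextContent();
        } catch (Exception e) {
            throw new IllegalStateException("parse failed", e);
        }
    }

    public static void main(String[] args) {
        String text = parseTitleText("<title>author1&#10;author2&#10;Title</title>");
        // The reference is gone: the string holds real '\n' characters (code point 10).
        System.out.println(text.equals("author1\nauthor2\nTitle"));
    }
}
```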
On output, XMLWriter encodes only those characters that must be encoded, either because of the XML rules or because of the encoding used when serializing (e.g. UTF-8 or ISO-8859-1).
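The "escape only what is required" rule is common to XML serializers in general. As a hedged illustration using the JDK's own serializer (javax.xml.transform) rather than dom4j's XMLWriter, '<' and '&' in element text are escaped, while a newline is written as a plain line break. The helper name is hypothetical.

```java
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

final class MinimalEscaping {
    // Builds <title>text</title> as a DOM and serializes it with the JDK's default serializer.
    static String serializeTitle(String text) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
            Element title = doc.createElement("title");
            title.appendChild(doc.createTextNode(text));
            doc.appendChild(title);

            Transformer t = TransformerFactory.newInstance().newTransformer();
            t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            t.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException("serialization failed", e);
        }
    }

    public static void main(String[] args) {
        String xml = serializeTitle("a < b & c\nd");
        System.out.println(xml);
        // '<' and '&' are escaped; the newline is not turned into a character reference.
        System.out.println(xml.contains("&lt;") && xml.contains("&amp;") && !xml.contains("&#10;"));
    }
}
```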
In this case, you have basically two options.
1) Subclass XMLWriter and completely replace the characters() method, since the handling of white space is intrinsic to this method. There is no other way to intercept the writing of tab, newline or carriage return. Here, you must somehow keep track of where you are and recognize that you're processing the correct newline character.
2) Identify the newline character that you want to be "re-escaped" and replace it with a DefaultEntity("#10") node, while setting the resolveEntityRefs property of XMLWriter to false. This option implies splitting an existing Text node in two and inserting the entity node in between.
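For option 1, the replacement method has to emit newline, carriage return and tab as numeric character references instead of raw characters. The core of that could look like the helper below. The class and method names are hypothetical, and exactly which XMLWriter method you override and how you limit it to the one element are left as assumptions, since they depend on the dom4j version. Note that the helper encodes every line break it is given; picking out "the correct newline" is still up to the caller.

```java
final class NewlineEscaper {
    // Escapes element text: the usual XML escapes plus '\n', '\r' and '\t'
    // as numeric character references, so they survive as &#10; etc. in the file.
    static String escapeElementText(String text) {
        StringBuilder sb = new StringBuilder(text.length() + 16);
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            switch (c) {
                case '<': sb.append("&lt;"); break;
                case '>': sb.append("&gt;"); break;
                case '&': sb.append("&amp;"); break;
                case '\n': sb.append("&#10;"); break;
                case '\r': sb.append("&#13;"); break;
                case '\t': sb.append("&#9;"); break;
                default: sb.append(c);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // prints: author1&#10;author2&#10;Title
        System.out.println(escapeElementText("author1\nauthor2\nTitle"));
    }
}
```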
Option 2 seems to involve less work, though it is still cumbersome.
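The split-and-insert idea of option 2 can be sketched without dom4j using the JDK's StAX writer (javax.xml.stream), which is a swapped-in API here, not what the answer uses. writeEntityRef("#10") plays the role of a DefaultEntity("#10") node written with resolveEntityRefs set to false: the name is placed between '&' and ';' unchanged. This relies on the writer not validating entity names, which is true of the JDK's default implementation as far as I know but should be treated as an assumption. The class and method names are hypothetical.

```java
import java.io.StringWriter;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamWriter;

final class SplitAndInsert {
    // Writes <name>text</name>, splitting the text at each '\n' and emitting
    // an "entity" named #10 (i.e. &#10;) between the pieces instead of the raw newline.
    static String writeWithEncodedNewlines(String name, String text) {
        StringWriter out = new StringWriter();
        try {
            XMLStreamWriter w = XMLOutputFactory.newInstance().createXMLStreamWriter(out);
            w.writeStartElement(name);
            String[] pieces = text.split("\n", -1); // -1 keeps trailing empty pieces
            for (int i = 0; i < pieces.length; i++) {
                if (i > 0) {
                    w.writeEntityRef("#10"); // same trick as DefaultEntity("#10")
                }
                w.writeCharacters(pieces[i]);
            }
            w.writeEndElement();
            w.flush();
            w.close(); // does not close the underlying StringWriter
        } catch (XMLStreamException e) {
            throw new IllegalStateException(e);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(writeWithEncodedNewlines("title", "author1\nauthor2\nTitle"));
    }
}
```

In dom4j terms, the loop body would correspond to accession.addText(piece) and accession.add(new DefaultEntity("#10")), creating a fresh entity for each line break.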
UPDATE:
OK, it seems that you cannot add the same entity instance twice. If you add a new entity instance every time, it works. However, your case could be fixed by adding xml:space="preserve" to your element:
if (AUT.size() > 1) {
for (String a : AUT) {
accession.addText(a);
accession.addText("\n");
}
accession.addText(title);
}
and then
accession.addAttribute(QName.get("space", Namespace.XML_NAMESPACE),
"preserve");
In this case, your explicitly added line breaks should be preserved, regardless of the output format used when writing the XML.
Sorry for the confusion.