Parse XML and get a DOM tree without binding namespaces - Java
I have an XML-like file:
<p>something</p>
<ac:image>
<ri:attachment ri:filename="IMAGE.PNG" />
</ac:image>
<ac:macro ac:name="screenshot">
<ac:default-parameter>IMAGE.ss</ac:default-parameter>
</ac:macro>
<p>something</p>
I need to transform it with an XSLT template: I want to replace all <ac:image> with <ac:macro ac:name="screenshot">. Generally, it's very easy to parse and transform well-formed, well-known XML. My case is rather different.
As you can see, it has no root element and no XML prolog. That's not a problem: I can add <?xml version="1.0"?> and wrap the content in an arbitrary element such as <root> to avoid an exception:
Caused by: org.jdom.input.JDOMParseException: Error on line 1: Content is not allowed in prolog.
The example XML uses three namespaces: the default one, ac and ri. Since the code is going to run on customer-specified content, there may be other namespaces that I am not aware of. I cannot bind all namespaces before parsing the XML, so I encounter an exception:
Caused by: org.xml.sax.SAXParseException: Content is not allowed in prolog.
I found somewhere on the Internet that a SAX parser can parse XML in a mode where it doesn't resolve namespaces. In the default mode you get namespace=ac and element=macro, whereas in non-namespace mode you get no namespace and element=ac:macro. This is what I want. All you need to do is set two SAX features on the parser: namespaces=false and namespace-prefixes=true.
final XMLReader sax = XMLReaderFactory.createXMLReader("org.apache.xerces.parsers.SAXParser");
sax.setFeature("http://xml.org/sax/features/validation", false);
sax.setFeature("http://xml.org/sax/features/namespaces", false);
sax.setFeature("http://xml.org/sax/features/namespace-prefixes", true);
sax.parse(new InputSource(new StringReader(content))); // parse returns void
It doesn't throw any exception, so it looks like the XML is parsed without an error. However, I need a DOM tree so that I can transform it with XSLT. Let's use JDOM then:
// all classes are org.jdom.*
final SAXBuilder sax = new SAXBuilder(false); // validate=false
sax.setFeature("http://xml.org/sax/features/namespaces", false);
sax.setFeature("http://xml.org/sax/features/namespace-prefixes", true);
final Document document = sax.build(new StringInputStream(content));
Unfortunately, I get an exception:
Caused by: org.jdom.IllegalNameException: The name "" is not legal for JDOM/XML elements: XML names cannot be null or empty.
at org.jdom.Element.setName(Element.java:206)
at org.jdom.Element.<init>(Element.java:140)
at org.jdom.Element.<init>(Element.java:152)
at org.jdom.DefaultJDOMFactory.element(DefaultJDOMFactory.java:138)
at org.jdom.input.SAXHandler.startElement(SAXHandler.java:511)
at org.apache.xerces.parsers.AbstractSAXParser.startElement(Unknown Source)
at org.apache.xerces.impl.dtd.XMLDTDValidator.startElement(Unknown Source)
at org.apache.xerces.impl.XMLDocumentFragmentScannerImpl.scanStartElement(Unknown Source)
at org.apache.xerces.impl.XMLDocumentScannerImpl$ContentDispatcher.scanRootElementHook(Unknown Source)
at org.apache.xerces.impl.XMLDocumentFragmentScannerImpl$FragmentContentDispatcher.dispatch(Unknown Source)
at org.apache.xerces.impl.XMLDocumentFragmentScannerImpl.scanDocument(Unknown Source)
at org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source)
at org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source)
at org.apache.xerces.parsers.XMLParser.parse(Unknown Source)
at org.apache.xerces.parsers.AbstractSAXParser.parse(Unknown Source)
at org.apache.xerces.jaxp.SAXParserImpl$JAXPSAXParser.parse(Unknown Source)
at org.jdom.input.SAXBuilder.build(SAXBuilder.java:453)
at org.jdom.input.SAXBuilder.build(SAXBuilder.java:770)
at com.screensnipe.confluence.macro.XhtmlImageMacroReplacer.replaceImageMacroInText(XhtmlImageMacroReplacer.java:118)
JDOM complains about an illegal tag name <>. Of course I don't have such a tag. With namespace processing switched off, SAX reports an empty localName and puts the full name in qName, so it looks like JDOM has a bug in SAXHandler.java:511: element = factory.element(localName); should be element = factory.element(qName);.
I also tried XOM. XOM does not work with the "namespaces" feature set to false.
I also tried the TagSoup library. I don't like it because it mangles the output XML. Adding an XML prolog and a root element is not a problem. Messing with the namespaces is.
<?xml version="1.0"?>
<html xmlns="http://www.w3.org/1999/xhtml">
<body>
<p>something</p>
<ac:image xmlns:ac="urn:x-prefix:ac"> <!-- :( -->
<ri:attachment xmlns:ri="urn:x-prefix:ri" ri:filename="IMAGE.PNG" />
</ac:image>
...
The question is: how do I get a DOM tree from my XML in Java, without writing my own version of JDOM? I would appreciate a working solution. Just parse and get the DOM tree, a tree where the namespaces are not broken as they are with the TagSoup library.
Or, as a more goal-focused question: how do I replace <ac:image> with <ac:macro ac:name="screenshot"> without touching other tags? (Java) All other tags, namespaces or whatever else should be unaffected. (Please don't suggest regexps.)
If you're willing to do pre-processing like adding a surrounding root element, you might as well also look through the XML file for namespace prefixes, and add dummy declarations for each of them to the root element you're adding.
Then you won't need a parser that can be told not to resolve namespace prefixes.
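A minimal sketch of that idea, using only the JDK's built-in DOM parser. The class name PrefixBindingSketch, the wrapper element name root, and the placeholder URIs of the form urn:x-prefix:NAME are assumptions made for illustration; they are not part of the original content. The prefix scan is a plain text heuristic, so it may pick up prefix-like text that is not really an element or attribute name. That is harmless here, because an unused declaration on the wrapper changes nothing.

```java
import java.io.StringReader;
import java.util.Set;
import java.util.TreeSet;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

class PrefixBindingSketch {
    // Heuristic: a prefix followed by ':' and a name, appearing right after
    // '<', '</' or whitespace (so both element and attribute names are seen).
    private static final Pattern PREFIXED_NAME = Pattern.compile(
            "(?:</?|\\s)([A-Za-z_][\\w.-]*):[A-Za-z_][\\w.-]*(?=[\\s=/>])");

    // Collects every prefix that appears to be used in the fragment.
    static Set<String> collectPrefixes(String fragment) {
        Set<String> prefixes = new TreeSet<>();
        Matcher m = PREFIXED_NAME.matcher(fragment);
        while (m.find()) {
            String prefix = m.group(1);
            // "xml" is always bound and "xmlns" must never be declared.
            if (!prefix.equals("xml") && !prefix.equals("xmlns")) {
                prefixes.add(prefix);
            }
        }
        return prefixes;
    }

    // Wraps the fragment in a root element that declares a dummy namespace
    // for each collected prefix (placeholder URIs, an assumption).
    static String wrap(String fragment) {
        StringBuilder sb = new StringBuilder("<root");
        for (String prefix : collectPrefixes(fragment)) {
            sb.append(" xmlns:").append(prefix)
              .append("=\"urn:x-prefix:").append(prefix).append('"');
        }
        return sb.append('>').append(fragment).append("</root>").toString();
    }

    // Parses the wrapped fragment with an ordinary namespace-aware parser.
    static Document parse(String fragment) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            return factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(wrap(fragment))));
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse fragment", e);
        }
    }

    public static void main(String[] args) {
        String fragment = "<p>something</p>\n<ac:image>\n"
                + "<ri:attachment ri:filename=\"IMAGE.PNG\" />\n</ac:image>";
        Document doc = parse(fragment);
        System.out.println(doc.getDocumentElement().getNodeName());
        System.out.println(doc.getElementsByTagName("ac:image").getLength());
    }
}
```

Because the dummy declarations live only on the wrapper, serializing the children of root (rather than root itself) after the transformation should leave the original prefixed names untouched, although how a particular serializer handles the inherited declarations is worth checking.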