开发者

Java XML Unmarshalling fails on ampersand (&) using JAXB

I have the following XML:

<?xml version="1.0" encoding="UTF-8"?>
<details>
  ...
  <address1>Test&amp;Address</address1>
  ...
</details>

When I try to unmarshal it using JAXB, it throws the following exception:

Caused by: org.xml.sax.SAXParseException: The reference to entity "Address" must end with the ';' delimiter.
        at org.apache.xerces.util.ErrorHandlerWrapper.createSAXParseException(Unknown Source)
        at org.apache.xerces.util.ErrorHandlerWrapper.fatalError(Unknown Source)
        at org.apache.xerces.impl.XMLErrorReporter.reportError(Unknown Source)
        at org.apache.xerces.impl.XMLErrorReporter.reportError(Unknown Source)
        at org.apache.xerces.impl.XMLScanner.reportFatalError(Unknown Source)
        at org.apache.xerces.impl.XMLDocumentFragmentScannerImpl.scanEntityReference(Unknown Source)
        at org.apache.xerces.impl.XMLDocumentFragmentScannerImpl$FragmentContentDispatcher.dispatch(Unknown Source)
        at org.apache.xerces.impl.XMLDocumentFragmentScannerImpl.scanDocument(Unknown Source)
        at org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source)
        at org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source)
        at org.apache.xerces.parsers.XMLParser.parse(Unknown Source)
        at org.apache.xerces.parsers.AbstractSAXParser.parse(Unknown Source)
        at org.apache.xerces.jaxp.SAXParserImpl$JAXPSAXParser.parse(Unknown Source)
        at com.sun.xml.bind.v2.runtime.unmarshaller.UnmarshallerImpl.unmarshal0(UnmarshallerImpl.java:194)

But when I changed the &amp; in the XML to &apos;, it works. Looks like the problem is only with ampersand &amp; and I cannot understand why.

The code to unmarshal is:

JAXBContext context = JAXBContext.newInstance("some.package.name", this.getClass().getClassLoader());
Unmarshaller unmarshaller = context.createUnmarshaller();
obj = unmarshaller.unmarshal(new StringReader(xml));

Anyone have some insight?

EDIT: I tried the solution suggested by @abhin4v below (ie, add a space after &amp;), but it doesn't seem to work too. Here's the stacktrace:

Caused by: org.xml.sax.SAXParseException: The entity name must immediately follow the '&' in the entity reference.
        at org.apache.xerces.util.ErrorHandlerWrapper.createSAXParseException(Unknown Source)
        at org.apache.xerces.util.ErrorHandlerWrapper.fatalError(Unknown Source)
        at org.apache.xerces.impl.XMLErrorReporter.reportError(Unknown Source)
        at org.apache.xerces.impl.XMLErrorReporter.reportError(Unknown Source)
        at org.apache.xerces.impl.XMLScanner.reportFatalError(Unknown Source)
        at org.apache.xerces.impl.XMLDocumentFragmentScannerImpl.scanEntityReference(Unknown Source)
        at org.apache.xerces.impl.XMLDocumentFr开发者_StackOverflow中文版agmentScannerImpl$FragmentContentDispatcher.dispatch(Unknown Source)
        at org.apache.xerces.impl.XMLDocumentFragmentScannerImpl.scanDocument(Unknown Source)
        at org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source)
        at org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source)
        at org.apache.xerces.parsers.XMLParser.parse(Unknown Source)
        at org.apache.xerces.parsers.AbstractSAXParser.parse(Unknown Source)
        at org.apache.xerces.jaxp.SAXParserImpl$JAXPSAXParser.parse(Unknown Source)
        at com.sun.xml.bind.v2.runtime.unmarshaller.UnmarshallerImpl.unmarshal0(UnmarshallerImpl.java:194)


I've run into this too. First pass I simply replaced the &amp to a token string (AMPERSAND_TOKEN), sent it through JAXB, then re-replaced the ampersand. Not ideal, but it was a quick fix.

Second pass I made a lot of significant changes, so I'm not sure what exactly solved the problem. I suspect that providing JAXB access to the html dtds made it much happier, but that's only a guess and could be specific to my project.

HTH


Xerces converts &amp; to & and then tries to resolve &Address which fails because it does not end with ;. Put a space between & and Address and it should work. Putting a space will not work as Xerces will now try to resolve & and throw the second error given in OP. You can wrap the test in a CDATA section and Xerces will not try to resolve the entities.


It turns out that the problem is because of the framework I'm using (Mentawai framework). The said XML comes from the POST body of an HTTP request.

Apparently, the framework converts the character entities in the XML body, therefore, &amp; becomes & and the unmarshaller fails to unmarshal the XML.

0

上一篇:

下一篇:

精彩评论

暂无评论...
验证码 换一张
取 消

最新问答

问答排行榜