How to locate error after MalformedByteSequenceException thrown by XML parser
I'm getting a MalformedByteSequenceException when parsing an XML file.
My app allows external customers to submit XML files. They can use any supported encoding, but most specify encoding="UTF-8" at the top of the file, as per the examples that were provided to them. Some of them then actually encode their data as windows-1252, which causes a MalformedByteSequenceException on non-ASCII characters.
I want to use the XML parser to identify the file encoding and decode the file so I don't want to have a preliminary step of testing the encoding or of converting the InputStream to a Reader. I feel that the XML parser should handle that step.
Even though I have registered a ValidationEventHandler, it is not called when a MalformedByteSequenceException is thrown.
Is there any way of getting the Unmarshaller to report the location in the file where the error occurs?
Here is my Java code:
InputStream input = ...
JAXBContext jc = JAXBContext.newInstance(MyClass.class.getPackage().getName());
Unmarshaller unmarshaller = jc.createUnmarshaller();
SchemaFactory sf = SchemaFactory.newInstance(javax.xml.XMLConstants.W3C_XML_SCHEMA_NS_URI);
Source source = new StreamSource(getClass().getResource("my.xsd").toExternalForm());
Schema schema = sf.newSchema(source);
unmarshaller.setSchema(schema);
ValidationEventHandler handler = new MyValidationEventHandler();
unmarshaller.setEventHandler(handler);
MyClass myClass = (MyClass) unmarshaller.unmarshal(input);
And the resulting stack trace:
javax.xml.bind.UnmarshalException
- with linked exception:
[com.sun.org.apache.xerces.internal.impl.io.MalformedByteSequenceException: Invalid byte 2 of 4-byte UTF-8 sequence.]
at com.sun.xml.internal.bind.v2.runtime.unmarshaller.UnmarshallerImpl.unmarshal0(UnmarshallerImpl.java:202)
at com.sun.xml.internal.bind.v2.runtime.unmarshaller.UnmarshallerImpl.unmarshal(UnmarshallerImpl.java:173)
at javax.xml.bind.helpers.AbstractUnmarshallerImpl.unmarshal(AbstractUnmarshallerImpl.java:137)
at javax.xml.bind.helpers.AbstractUnmarshallerImpl.unmarshal(AbstractUnmarshallerImpl.java:184)
at (my code)
Caused by: com.sun.org.apache.xerces.internal.impl.io.MalformedByteSequenceException: Invalid byte 2 of 4-byte UTF-8 sequence.
at com.sun.org.apache.xerces.internal.impl.io.UTF8Reader.invalidByte(UTF8Reader.java:684)
at com.sun.org.apache.xerces.internal.impl.io.UTF8Reader.read(UTF8Reader.java:470)
at com.sun.org.apache.xerces.internal.impl.XMLEntityScanner.load(XMLEntityScanner.java:1742)
at com.sun.org.apache.xerces.internal.impl.XMLEntityScanner.scanContent(XMLEntityScanner.java:916)
at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl$FragmentContentDriver.next(XMLDocumentFragmentScannerImpl.java:2788)
at com.sun.org.apache.xerces.internal.impl.XMLDocumentScannerImpl.next(XMLDocumentScannerImpl.java:648)
at com.sun.org.apache.xerces.internal.impl.XMLNSDocumentScannerImpl.next(XMLNSDocumentScannerImpl.java:140)
at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl.scanDocument(XMLDocumentFragmentScannerImpl.java:511)
at com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(XML11Configuration.java:808)
at com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(XML11Configuration.java:737)
at com.sun.org.apache.xerces.internal.parsers.XMLParser.parse(XMLParser.java:119)
at com.sun.org.apache.xerces.internal.parsers.AbstractSAXParser.parse(AbstractSAXParser.java:1205)
at com.sun.org.apache.xerces.internal.jaxp.SAXParserImpl$JAXPSAXParser.parse(SAXParserImpl.java:522)
at com.sun.xml.internal.bind.v2.runtime.unmarshaller.UnmarshallerImpl.unmarshal0(UnmarshallerImpl.java:200)
... 51 more
I haven't tested but I would
- use a SAXSource (javax.xml.transform.sax.SAXSource) instead of a StreamSource
- associate to the SAXSource my own implementation of org.xml.sax.ErrorHandler (SAXSource.getXMLReader().setErrorHandler)
That way I would be notified of the SAXParseException, which carries the location (line and column number) of the parsing error.
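A rough sketch of that idea, using a plain SAX XMLReader without JAXB so that it stays self-contained. The names EncodingErrorLocator, LocatingHandler and locate are made up for illustration. One assumption to flag: depending on the parser version, a malformed byte may surface as a SAXParseException (which has an exact line and column) or as a plain IOException, as in the stack trace above. For the second case the sketch falls back to the Locator the parser handed to the ContentHandler, which only gives an approximate position.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.Locator;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper, not part of JAXB or JAXP.
final class EncodingErrorLocator {

    /** Keeps the parser's Locator; DefaultHandler.fatalError rethrows the error. */
    static final class LocatingHandler extends DefaultHandler {
        Locator locator;

        @Override
        public void setDocumentLocator(Locator locator) {
            this.locator = locator;
        }
    }

    /** Returns null if the bytes parse cleanly, otherwise a description with a position. */
    static String locate(byte[] xml) {
        LocatingHandler handler = new LocatingHandler();
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setNamespaceAware(true);
            XMLReader reader = factory.newSAXParser().getXMLReader();
            reader.setContentHandler(handler);
            reader.setErrorHandler(handler);
            // Passing a byte stream lets the parser pick the encoding itself.
            reader.parse(new InputSource(new ByteArrayInputStream(xml)));
            return null;
        } catch (SAXParseException e) {
            // Exact position as reported by the parser.
            return "line " + e.getLineNumber() + ", column " + e.getColumnNumber()
                    + ": " + e.getMessage();
        } catch (SAXException | IOException e) {
            // No position in the exception: use the last position the Locator saw.
            Locator l = handler.locator;
            String where = (l == null)
                    ? "unknown position"
                    : "near line " + l.getLineNumber() + ", column " + l.getColumnNumber();
            return where + ": " + e.getMessage();
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException("Could not create a SAX parser", e);
        }
    }
}
```

Calling EncodingErrorLocator.locate(bytes) on a rejected upload would give a message to show to the customer. To hook this into the unmarshalling itself, the same reader could be wrapped as new SAXSource(reader, inputSource) and passed to unmarshaller.unmarshal(...), but I have not checked whether the JAXB implementation keeps an ErrorHandler that was set on the reader beforehand, so running it as a separate diagnostic pass after a failure may be the safer option.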