Parsing XML containing a character reference
The XML I'm trying to parse contains the control character 0x2 inside a CDATA section. I tried to replace it with a character reference, which led to the content looking like:
CDATA section----charcter reference----CDATA section
Now if I try to parse it, I get an error message saying: org.xml.sax.SAXParseException: Content is not allowed in prolog.
The original XML looked like:
<?xml version="1.1" encoding="UTF-16"?><CELL><![CDATA[ABCDEFGH]]></CELL>
I modified it to:
<?xml version="1.1" encoding="UTF-16"?><CELL><![CDATA[ABCD]]><![CDATA[EFGH]]></CELL>
Character and entity references are not recognized inside CDATA sections, which is why your original example does not work. That the modified example does not work either looks like a SAX parser bug, in my opinion. Perhaps the SAX parser does not accept an invisible byte order mark (BOM) before the XML prolog that starts with <?, although it should.
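To illustrate the point about CDATA, here is a minimal sketch using the JDK's built-in SAX parser. The class name CdataDemo and the helper textOf are made up for this example, and it uses the printable reference &#65; instead of your control character so the difference is easy to see.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical demo class, not part of the question's code.
public class CdataDemo {

    /** Parses the given XML string and returns all character data concatenated. */
    public static String textOf(String xml) throws Exception {
        StringBuilder text = new StringBuilder();
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void characters(char[] ch, int start, int length) {
                // May be called several times per element, so append each chunk.
                text.append(ch, start, length);
            }
        };
        SAXParserFactory.newInstance()
                .newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        return text.toString();
    }

    public static void main(String[] args) throws Exception {
        // Inside CDATA the reference is kept as literal text.
        System.out.println(textOf("<c><![CDATA[&#65;]]></c>"));
        // Outside CDATA the reference is expanded to the character it names.
        System.out.println(textOf("<c>&#65;</c>"));
    }
}
```

That is why a reference has to sit between two CDATA sections, as in your modified document, rather than inside one.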
To help the SAX parser, the following workaround might do: consume the BOM before you feed the parser. You could use a markable stream for this purpose, i.e. mark the stream, read the BOM, and reset the stream to its mark if there was no BOM. I didn't try it; it's just a guess.
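The mark/read/reset idea could be sketched roughly as follows. This is untested against your actual file, and the names BomSkipper and openSkippingBom are invented for the example. It assumes only the UTF-8 and UTF-16 BOMs matter, and it returns a Reader rather than a raw stream so that the byte order learned from the BOM is not lost once the BOM is consumed.

```java
import java.io.BufferedInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of any library.
public class BomSkipper {

    /**
     * Returns a Reader positioned after a leading UTF-8 or UTF-16 BOM, if any.
     * Without a BOM the stream is reset to its start and decoded with the fallback charset.
     */
    public static Reader openSkippingBom(InputStream raw, Charset fallback) throws IOException {
        InputStream in = raw.markSupported() ? raw : new BufferedInputStream(raw);
        in.mark(3);

        // Read up to three bytes; a single read() call may return fewer.
        byte[] head = new byte[3];
        int n = 0;
        while (n < head.length) {
            int r = in.read(head, n, head.length - n);
            if (r < 0) {
                break;
            }
            n += r;
        }

        Charset charset = fallback;
        int bomLength = 0;
        if (n >= 3 && head[0] == (byte) 0xEF && head[1] == (byte) 0xBB && head[2] == (byte) 0xBF) {
            charset = StandardCharsets.UTF_8;
            bomLength = 3;
        } else if (n >= 2 && head[0] == (byte) 0xFE && head[1] == (byte) 0xFF) {
            charset = StandardCharsets.UTF_16BE;
            bomLength = 2;
        } else if (n >= 2 && head[0] == (byte) 0xFF && head[1] == (byte) 0xFE) {
            charset = StandardCharsets.UTF_16LE;
            bomLength = 2;
        }

        // Go back to the mark, then step over exactly the BOM bytes (none if no BOM was found).
        in.reset();
        for (int i = 0; i < bomLength; i++) {
            in.read();
        }
        return new InputStreamReader(in, charset);
    }
}
```

The resulting Reader can be wrapped in an org.xml.sax.InputSource and handed to the parser in place of the original stream. Whether this actually removes the "Content is not allowed in prolog" error in your case is still a guess.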
BTW: Your question would be received better if you fixed the typo in the intro: write "character reference" instead of "charcter reference". At first I thought the missing "a" was related to your question.