Parsing with SAX and handling character entities
I am parsing a MathML expression with SAX (although the fact that it's MathML may not be completely relevant). An example input string is:
<math xmlns='http://www.w3.org/1998/Math/MathML'>
<mrow>
<mo>&lambda;</mo>
</mrow>
</math>
In order for the SAX parser to accept this string, I expand it a bit:
<?xml version="1.0"?>
<!DOCTYPE doc_type [
<!ENTITY nbsp "&#160;">
<!ENTITY amp "&#38;#38;">
]>
<body>
<math xmlns='http://www.w3.org/1998/Math/MathML'>
<mrow>
<mo>&lambda;</mo>
</mrow>
</math>
</body>
Now, when I run the SAX parser on this, I get an exception:
[Fatal Error] :5:86: The entity "lambda" was referenced, but not declared.
org.xml.sax.SAXParseException: The entity "lambda" was referenced, but not declared.
at org.apache.xerces.parsers.AbstractSAXParser.parse(Unknown Source)
However, I know how to fix that. I simply add this line to the string being parsed:
<!ENTITY lambda "&#955;">
This gives me:
<?xml version="1.0"?>
<!DOCTYPE doc_type [
<!ENTITY nbsp "&#160;">
<!ENTITY amp "&#38;#38;">
<!ENTITY lambda "&#955;">
]>
<body>
<math xmlns='http://www.w3.org/1998/Math/MathML'>
<mrow>
<mo>&lambda;</mo>
</mrow>
</math>
</body>
Now, it parses just fine, thank you.
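A minimal Java sketch of the workaround above, assuming the JDK's built-in SAX parser (`javax.xml.parsers`) rather than a separately installed Xerces. The class and method names (`InternalSubsetWorkaround`, `wrap`, `text`) are hypothetical. The sketch wraps a fragment in a prolog whose internal DTD subset declares `lambda`, and it also shows that an entity left out of that subset, such as `&sum;`, still causes a fatal error.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

public class InternalSubsetWorkaround {
    // Wraps a MathML fragment in a prolog whose internal DTD subset declares
    // only the entities listed here; any other named entity is still fatal.
    public static String wrap(String mathml) {
        return "<?xml version=\"1.0\"?>\n"
             + "<!DOCTYPE body [\n"
             + "<!ENTITY nbsp \"&#160;\">\n"
             + "<!ENTITY lambda \"&#955;\">\n"
             + "]>\n"
             + "<body>" + mathml + "</body>";
    }

    // Parses the document and returns all character data, with entities expanded.
    public static String text(String xml) {
        StringBuilder sb = new StringBuilder();
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void characters(char[] ch, int start, int length) {
                sb.append(ch, start, length);
            }
        };
        try {
            SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException(e.getMessage(), e);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String fragment = "<math xmlns='http://www.w3.org/1998/Math/MathML'>"
            + "<mrow><mo>&lambda;</mo></mrow></math>";
        // prints the expanded character U+03BB
        System.out.println(text(wrap(fragment)));
        // &sum; is not in the internal subset, so parsing fails
        try {
            text(wrap("<mo>&sum;</mo>"));
        } catch (IllegalStateException e) {
            System.out.println("failed: " + e.getMessage());
        }
    }
}
```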
However, the problem is that I can't add an ENTITY declaration for every possible character entity that might be used in MathML (for example, "part", "notin", and "sum").
How can I rewrite this string so that it parses no matter which character entities it contains?
Use a DOCTYPE declaration that refers to the MathML DTD, which declares the MathML character entities:
<!DOCTYPE math
PUBLIC "-//W3C//DTD MathML 3.0//EN"
"http://www.w3.org/Math/DTD/mathml3/mathml3.dtd">
or to a local copy of the same DTD.
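With that DOCTYPE in place, a non-validating Xerces parser loads the external DTD by default, which means fetching it from the W3C URL on every parse unless an `EntityResolver` redirects the request. The sketch below assumes the JDK's built-in SAX parser. The class name, the `parseMoText` helper and the `LOCAL_DTD` string are hypothetical; `LOCAL_DTD` is a four-entity stand-in for a real local copy of mathml3.dtd, used only so the example runs offline. The resolver matches on the public identifier and returns the local content instead of the URL.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

public class MathMLDtdResolver {
    static final String MATHML3_PUBLIC_ID = "-//W3C//DTD MathML 3.0//EN";

    // Hypothetical stand-in for a local copy of mathml3.dtd (only a few entities).
    static final String LOCAL_DTD =
          "<!ENTITY lambda \"&#955;\">\n"
        + "<!ENTITY part \"&#8706;\">\n"
        + "<!ENTITY notin \"&#8713;\">\n"
        + "<!ENTITY sum \"&#8721;\">\n";

    static class Handler extends DefaultHandler {
        final StringBuilder moText = new StringBuilder();
        private boolean inMo;

        // Called for the DOCTYPE's external subset: serve the local copy
        // instead of fetching the W3C URL over the network.
        @Override
        public InputSource resolveEntity(String publicId, String systemId) {
            if (MATHML3_PUBLIC_ID.equals(publicId)) {
                return new InputSource(new StringReader(LOCAL_DTD));
            }
            return null; // anything else: default resolution
        }

        @Override
        public void startElement(String uri, String localName, String qName, Attributes atts) {
            inMo = qName.equals("mo");
        }

        @Override
        public void endElement(String uri, String localName, String qName) {
            inMo = false;
        }

        // Collects the (entity-expanded) text of <mo> elements only.
        @Override
        public void characters(char[] ch, int start, int length) {
            if (inMo) moText.append(ch, start, length);
        }
    }

    public static String parseMoText(String xml) {
        Handler handler = new Handler();
        try {
            SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException(e.getMessage(), e);
        }
        return handler.moText.toString();
    }

    public static void main(String[] args) {
        String doc = "<?xml version=\"1.0\"?>\n"
            + "<!DOCTYPE math PUBLIC \"-//W3C//DTD MathML 3.0//EN\"\n"
            + "  \"http://www.w3.org/Math/DTD/mathml3/mathml3.dtd\">\n"
            + "<math xmlns='http://www.w3.org/1998/Math/MathML'>\n"
            + "<mrow><mo>&lambda;</mo><mo>&sum;</mo></mrow>\n"
            + "</math>\n";
        System.out.println(parseMoText(doc));
    }
}
```

In real code the resolver would return the downloaded DTD instead of the stub string, for example from a file or a classpath resource. If that local copy pulls in further files by relative path, calling `setSystemId` on the returned `InputSource` gives the parser a base against which those paths can be resolved.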