Parsing an XML file while preserving line number information
I am creating a tool that analyzes some XML files (XHTML files, to be precise). The purpose of this tool is not only to validate the XML structure, but also to check the values of some attributes.
So I created my own org.xml.sax.helpers.DefaultHandler to handle events during the XML parsing. One of my requirements is to have information about the current line number, so I decided to add an org.xml.sax.helpers.LocatorImpl to my own DefaultHandler. This solves almost all my problems, except one regarding the XML attributes.
Let's take an example:
<rootNode>
<foo att1="val1"/>
<bar att2="val2"
answerToEverything="43"
att3="val3"/>
</rootNode>
One of my rules states that if the attribute answerToEverything is defined on the node bar, its value must be 42.
When encountering such XML, my tool should detect an error. As I want to give a precise error message to the user, such as:
Error in file "foo.xhtml", line #4: answerToEverything only allows "42" as value.
my parser must be able to keep track of the line number during parsing, even for attributes. Consider the following implementation of my own DefaultHandler class:
public void startElement(String uri, String localName, String qName, Attributes attributes) throws SAXException {
    // Position reported by the locator when the start-element event fires
    String position = locator.getLineNumber() + ":" + locator.getColumnNumber();
    System.out.println("Start element <" + qName + "> at " + position);
    for (int i = 0; i < attributes.getLength(); i++) {
        System.out.println("Att '" + attributes.getQName(i) + "' = '" + attributes.getValue(i) + "' at " + position);
    }
}
then for the node <bar>, it will display the following output:
Start element <bar> at 5:23
Att 'att2' = 'val2' at 5:23
Att 'answerToEverything' = '43' at 5:23
Att 'att3' = 'val3' at 5:23
As you can see, the line number is wrong for the attributes, because the parser treats the whole node, including its attributes, as one block.
Ideally, if the ContentHandler interface defined startAttribute and startElementBeforeReadingAttributes methods, I wouldn't have any problem here :o)
So my question is: how can I solve this problem?
For information, I am using Java 6.
PS: Maybe another title for this question could be "Java SAX parsing with attribute parsing events", or something like that...
I think the only way to implement this is to create your own InputStream (or Reader) that counts lines and somehow communicates with your SAX handler. I have not tried to implement this myself, but I believe it is possible. Good luck; I would be glad if you succeed in doing this and post your results here.
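To make the idea of pairing the raw text with the SAX handler more concrete, here is a minimal sketch. It is a variation rather than a counting Reader: a parser may read ahead in blocks, so a count kept inside the Reader can run past the event being reported. Instead, the handler keeps the whole document as a String and, on each startElement, walks the start tag in that text to find the line of each attribute name. The class name AttributeLineSketch, the scan helper, and the "element@attribute" key format are hypothetical. The sketch assumes '\n' line endings, a document small enough to hold in memory, and a parser (such as the JDK's built-in one) whose Locator points just past the closing '>' of the start tag.

```java
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.Locator;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler: maps "element@attribute" to the 1-based line of the attribute name.
class AttributeLineSketch extends DefaultHandler {
    private final String text;      // whole document; assumed to use '\n' line endings
    private Locator locator;
    final Map<String, Integer> attributeLines = new LinkedHashMap<String, Integer>();

    AttributeLineSketch(String text) { this.text = text; }

    @Override
    public void setDocumentLocator(Locator locator) { this.locator = locator; }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        // Assumption: the locator points just past the '>' that closes the start tag.
        int end = offsetOf(locator.getLineNumber(), locator.getColumnNumber());
        // '<' cannot occur inside attribute values, so the nearest '<' opens this tag.
        int start = text.lastIndexOf('<', end - 1);
        if (start < 0) return;      // assumption violated; give up on this element
        int pos = start + 1 + qName.length();
        while (true) {
            while (Character.isWhitespace(text.charAt(pos))) pos++;
            char c = text.charAt(pos);
            if (c == '>' || c == '/') break;              // end of the start tag
            int nameStart = pos;
            while (text.charAt(pos) != '=' && !Character.isWhitespace(text.charAt(pos))) pos++;
            attributeLines.put(qName + "@" + text.substring(nameStart, pos), lineOf(nameStart));
            pos = text.indexOf('=', pos) + 1;
            while (Character.isWhitespace(text.charAt(pos))) pos++;
            pos = text.indexOf(text.charAt(pos), pos + 1) + 1;  // skip the quoted value
        }
    }

    // Converts a 1-based (line, column) pair into a 0-based offset in the text.
    private int offsetOf(int line, int column) {
        int offset = 0;
        for (int i = 1; i < line; i++) offset = text.indexOf('\n', offset) + 1;
        return Math.min(text.length(), offset + column - 1);
    }

    // Returns the 1-based line number that contains the given offset.
    private int lineOf(int offset) {
        int line = 1;
        for (int i = 0; i < offset; i++) if (text.charAt(i) == '\n') line++;
        return line;
    }

    // Parses the given XML text and returns the attribute-to-line map.
    static Map<String, Integer> scan(String xml) throws Exception {
        AttributeLineSketch handler = new AttributeLineSketch(xml);
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        return handler.attributeLines;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<rootNode>\n<foo att1=\"val1\"/>\n<bar att2=\"val2\"\n"
                + "     answerToEverything=\"43\"\n     att3=\"val3\"/>\n</rootNode>";
        for (Map.Entry<String, Integer> e : scan(xml).entrySet()) {
            System.out.println(e.getKey() + " is on line " + e.getValue());
        }
    }
}
```

For the sample document from the question, the intent is that bar@answerToEverything maps to line 4, which is what the error message needs. A real tool would probably key the map by element occurrence rather than by name, since the same element name can appear many times.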
Look for an open source XML editor, its parser might have this information.
Editors don't use the same kind of parser that an application that just uses XML for data would use. Editors need more information, such as the line numbers you mention, and I would think also information about whitespace characters. A parser for an editor should not lose any information about the characters in the file. That is how an editor can implement, for example, a format function or "select enclosing element" (Alt-Shift-Up in Eclipse).
In both XmlBeans and JAXB it is possible to preserve line number information. You could consider using one of these tools (it is easier in XmlBeans).