Is there a way to parse XML via SAX/DOM with line numbers available per node?
I have already written a DOM parser for a large XML document format that contains a number of items that can be used to automatically generate Java code. This is limited to small expressions that are then merged into a dynamically generated Java source file.
So far - so good. Everything works.
BUT - I wish to be able to embed the line number of the XML node where the Java code was included from (so that if the configuration contains uncompilable code, each method will have a pointer to the source XML document and the line number for ease of debugging). I don't require the line number at parse-time and I don't need to validate the XML Source Document and throw an error at a particular line number. I need to be able to access the line number for each node and attribute in my DOM or per SAX event.
Any suggestions on how I might be able to achieve this?
P.S. Also, I read that StAX has a method to obtain the line number whilst parsing, but ideally I would like to achieve the same result with regular SAX/DOM processing in Java 4/5 rather than become a Java 6+ application or take on extra .jar files.
I know this thread is a little old (sorry), but it has taken me so long to crack this nut I had to share the solution with someone...
You only seem to be able to obtain the line numbers with SAX which doesn't build a DOM. The DOM parser does not give the line numbers, and neither does it let you near the SAX parser it is using. My solution is to do an empty XSLT transformation using a SAX source and a DOM result, but even then someone has done their best to hide this. See the code below.
I add the location information to each element as an attribute with my own namespace, so I can find elements using XPath and report where the data came from.
Hope this helps:
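import java.io.FileReader;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMResult;
import javax.xml.transform.sax.SAXSource;
import org.w3c.dom.Node;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.Locator;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.ext.Attributes2Impl;
import org.xml.sax.helpers.XMLFilterImpl;
import org.xml.sax.helpers.XMLReaderFactory;

// The file to parse.
String systemId = "myxml.xml";

/*
 * Create transformer SAX source that adds current element position to
 * the element as attributes.
 */
XMLReader xmlReader = XMLReaderFactory.createXMLReader();
LocationFilter locationFilter = new LocationFilter(xmlReader);
// Note: FileReader decodes using the JVM default charset, not the encoding declared in the XML.
InputSource inputSource = new InputSource(new FileReader(systemId));
// Do this so that XPath function document() can take relative URI.
inputSource.setSystemId(systemId);
SAXSource saxSource = new SAXSource(locationFilter, inputSource);

/*
 * Perform an empty transformation from SAX source to DOM result.
 */
TransformerFactory transformerFactory = TransformerFactory.newInstance();
Transformer transformer = transformerFactory.newTransformer();
DOMResult domResult = new DOMResult();
transformer.transform(saxSource, domResult);
Node root = domResult.getNode();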
...
class LocationFilter extends XMLFilterImpl {

    LocationFilter(XMLReader xmlReader) {
        super(xmlReader);
    }

    private Locator locator = null;

    @Override
    public void setDocumentLocator(Locator locator) {
        super.setDocumentLocator(locator);
        this.locator = locator;
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes attributes) throws SAXException {
        // Add extra attribute to elements to hold location
        String location = locator.getSystemId() + ':' + locator.getLineNumber() + ':' + locator.getColumnNumber();
        Attributes2Impl attrs = new Attributes2Impl(attributes);
        attrs.addAttribute("http://myNamespace", "location", "myns:location", "CDATA", location);
        super.startElement(uri, localName, qName, attrs);
    }
}
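To read the location back out of the resulting DOM, something along these lines should work. This is only a sketch and not part of the code above: the names LocationReader and locationOf are made up for illustration, and it assumes the transformer stores the attribute under the http://myNamespace namespace (it falls back to the plain qualified name myns:location if not).

```java
import org.w3c.dom.Element;

// Hypothetical helper, not part of the original answer.
final class LocationReader {

    // Must match the namespace URI used in LocationFilter.
    static final String LOCATION_NS = "http://myNamespace";

    /** Returns the "systemId:line:column" string for an element, or null if none was recorded. */
    static String locationOf(Element element) {
        // Namespace-aware lookup first.
        String location = element.getAttributeNS(LOCATION_NS, "location");
        if (location == null || location.isEmpty()) {
            // Fallback in case the DOM was built without namespace information.
            location = element.getAttribute("myns:location");
        }
        return (location == null || location.isEmpty()) ? null : location;
    }
}
```

Calling LocationReader.locationOf(element) on any element found via XPath then gives the string to put in an error message.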
I ran into this issue recently and I thought I'd share a ready-made utility class for handling it. Works with Java 11, whereas some of Reg Whitton's code uses now-deprecated classes.
Mostly based on this article with a few tweaks. Notably, storing the line number as the node's user data rather than setting it as an attribute.
import java.io.IOException;
import java.io.InputStream;
import java.util.ArrayDeque;
import java.util.Deque;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.Attributes;
import org.xml.sax.Locator;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

public class XmlDom {

    public static Document readXML(InputStream is, final String lineNumAttribName) throws IOException, SAXException {
        final Document doc;
        SAXParser parser;
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            parser = factory.newSAXParser();
            DocumentBuilderFactory docBuilderFactory = DocumentBuilderFactory.newInstance();
            DocumentBuilder docBuilder = docBuilderFactory.newDocumentBuilder();
            doc = docBuilder.newDocument();
        } catch (ParserConfigurationException e) {
            throw new RuntimeException("Can't create SAX parser / DOM builder.", e);
        }

        final Deque<Element> elementStack = new ArrayDeque<>();
        final StringBuilder textBuffer = new StringBuilder();

        DefaultHandler handler = new DefaultHandler() {
            private Locator locator;

            @Override
            public void setDocumentLocator(Locator locator) {
                // Save the locator, so that it can be used later for line tracking when traversing nodes.
                this.locator = locator;
            }

            @Override
            public void startElement(String uri, String localName, String qName, Attributes attributes) throws SAXException {
                addTextIfNeeded();
                Element el = doc.createElement(qName);
                for (int i = 0; i < attributes.getLength(); i++) {
                    el.setAttribute(attributes.getQName(i), attributes.getValue(i));
                }
                el.setUserData(lineNumAttribName, String.valueOf(locator.getLineNumber()), null);
                elementStack.push(el);
            }

            @Override
            public void endElement(String uri, String localName, String qName) {
                addTextIfNeeded();
                Element closedEl = elementStack.pop();
                if (elementStack.isEmpty()) { // Is this the root element?
                    doc.appendChild(closedEl);
                } else {
                    Element parentEl = elementStack.peek();
                    parentEl.appendChild(closedEl);
                }
            }

            @Override
            public void characters(char[] ch, int start, int length) throws SAXException {
                textBuffer.append(ch, start, length);
            }

            // Outputs text accumulated under the current node
            private void addTextIfNeeded() {
                if (textBuffer.length() > 0) {
                    Element el = elementStack.peek();
                    Node textNode = doc.createTextNode(textBuffer.toString());
                    el.appendChild(textNode);
                    textBuffer.delete(0, textBuffer.length());
                }
            }
        };

        parser.parse(is, handler);
        return doc;
    }
}
Access the line number (stored as a String) with:
String line = (String) node.getUserData(lineNumAttribName);
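To tie this back to the original question, the stored value can be turned into a pointer for the generated Java source. A rough sketch follows; SourcePointer, describe, asComment and the fileName parameter are hypothetical names of mine, not part of the utility class. Note that the class above only sets user data on elements, so text nodes (and attributes) will have none.

```java
import org.w3c.dom.Node;

// Hypothetical helper, not part of the XmlDom utility.
final class SourcePointer {

    /** Returns e.g. "config.xml:12", or just the file name if no line number was stored on the node. */
    static String describe(Node node, String fileName, String lineNumAttribName) {
        Object line = node.getUserData(lineNumAttribName);
        return line == null ? fileName : fileName + ":" + line;
    }

    /** Formats the pointer as a single-line Java comment to emit above generated code. */
    static String asComment(Node node, String fileName, String lineNumAttribName) {
        return "// generated from " + describe(node, fileName, lineNumAttribName);
    }
}
```

Emitting asComment(...) above each generated method means a compiler error in the generated file sits right next to the XML file name and line it came from.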