XPath, XML Namespaces and Java
I've spent the past day attempting to extract a one XML node out of the following document and am unable to grasp the nuances of XML开发者_如何学Python Namespaces to make it work.
The XML file is to large to post in total so here is the portion that concerns me:
<?xml version="1.0" encoding="ISO-8859-1" standalone="no"?>
<XFDL xmlns="http://www.PureEdge.com/XFDL/6.5" xmlns:custom="http://www.PureEdge.com/XFDL/Custom" xmlns:designer="http://www.PureEdge.com/Designer/6.1" xmlns:pecs="http://www.PureEdge.com/PECustomerService" xmlns:xfdl="http://www.PureEdge.com/XFDL/6.5">
<globalpage sid="global">
<global sid="global">
<xmlmodel xmlns:xforms="http://www.w3.org/2003/xforms">
<instances>
<xforms:instance id="metadata">
<form_metadata>
<metadataver version="1.0"/>
<metadataverdate>
<date day="05" month="Jul" year="2005"/>
</metadataverdate>
<title>
<documentnbr number="2062" prefix.army="DA" scope="army" suffix=""/>
<longtitle>HAND RECEIPT/ANNEX NUMBER </longtitle>
</title>
The document continues and is well formed all the way down. I am attempting to extract the "number" attribute from the "documentnbr" tag (three from the bottom).
The code that I'm using to do this looks like this:
/***
* Locates the Document Number information in the file and returns the form number.
* @return File's self-declared number.
* @throws InvalidFormException Thrown when XPath cannot find the "documentnbr" element in the file.
*/
public String getFormNumber() throws InvalidFormException
{
try{
XPath xPath = XPathFactory.newInstance().newXPath();
xPath.setNamespaceContext(new XFDLNamespaceContext());
Node result = (Node)xPath.evaluate(QUERY_FORM_NUMBER, doc, XPathConstants.NODE);
if(result != null) {
return result.getNodeValue();
} else {
throw new InvalidFormException("Unable to identify form.");
}
} catch (XPathExpressionException err) {
throw new InvalidFormException("Unable to find form number in file.");
}
}
Where QUERY_FORM_NUMBER is my XPath expression, and XFDLNamespaceContext implements NamespaceContext and looks like this:
public class XFDLNamespaceContext implements NamespaceContext {
@Override
public String getNamespaceURI(String prefix) {
if (prefix == null) throw new NullPointerException("Invalid Namespace Prefix");
else if (prefix.equals(XMLConstants.DEFAULT_NS_PREFIX))
return "http://www.PureEdge.com/XFDL/6.5";
else if ("custom".equals(prefix))
return "http://www.PureEdge.com/XFDL/Custom";
else if ("designer".equals(prefix))
return "http://www.PureEdge.com/Designer/6.1";
else if ("pecs".equals(prefix))
return "http://www.PureEdge.com/PECustomerService";
else if ("xfdl".equals(prefix))
return "http://www.PureEdge.com/XFDL/6.5";
else if ("xforms".equals(prefix))
return "http://www.w3.org/2003/xforms";
else
return XMLConstants.NULL_NS_URI;
}
@Override
public String getPrefix(String arg0) {
// TODO Auto-generated method stub
return null;
}
@Override
public Iterator getPrefixes(String arg0) {
// TODO Auto-generated method stub
return null;
}
}
I've tried many different XPath queries but I keep feeling like this should work:
protected static final String QUERY_FORM_NUMBER =
"/globalpage/global/xmlmodel/xforms:instances/instance" +
"/form_metadata/title/documentnbr[number]";
Unfortunately it does not work and I continually get a null return.
I've done a fair amount of reading here, here, and here, but nothing has proved sufficiently illuminating to help me get this working.
I'm almost positive that I'm going to face-palm when I figure this out but I'm really at wit's end as to what I'm missing.
Thank you for reading through all of this and thanks in advance for the help.
-Andy
Aha, I tried to debug your expression + got it to work. You missed a few things. This XPath expression should do it:
/XFDL/globalpage/global/xmlmodel/instances/instance/form_metadata/title/documentnbr/@number
- You need to include the root element (XFDL in this case)
- I didn't end up needing to use any namespaces in the expression for some reason. Not sure why. If this is the case, then the NamespaceContext.getNamespaceURI() never gets called. If I replace
instance
withxforms:instance
then getNamespaceURI() gets called once withxforms
as the input argument, but the program throws an exception. - The syntax for attribute values is
@attr
, not[attr]
.
My complete sample code:
import java.io.File;
import java.io.IOException;
import java.util.Collections;
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;
import javax.xml.XMLConstants;
import javax.xml.namespace.NamespaceContext;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.SAXException;
public class XPathNamespaceExample {
static public class MyNamespaceContext implements NamespaceContext {
final private Map<String, String> prefixMap;
MyNamespaceContext(Map<String, String> prefixMap)
{
if (prefixMap != null)
{
this.prefixMap = Collections.unmodifiableMap(new HashMap<String, String>(prefixMap));
}
else
{
this.prefixMap = Collections.emptyMap();
}
}
public String getPrefix(String namespaceURI) {
// TODO Auto-generated method stub
return null;
}
public Iterator getPrefixes(String namespaceURI) {
// TODO Auto-generated method stub
return null;
}
public String getNamespaceURI(String prefix) {
if (prefix == null) throw new NullPointerException("Invalid Namespace Prefix");
else if (prefix.equals(XMLConstants.DEFAULT_NS_PREFIX))
return "http://www.PureEdge.com/XFDL/6.5";
else if ("custom".equals(prefix))
return "http://www.PureEdge.com/XFDL/Custom";
else if ("designer".equals(prefix))
return "http://www.PureEdge.com/Designer/6.1";
else if ("pecs".equals(prefix))
return "http://www.PureEdge.com/PECustomerService";
else if ("xfdl".equals(prefix))
return "http://www.PureEdge.com/XFDL/6.5";
else if ("xforms".equals(prefix))
return "http://www.w3.org/2003/xforms";
else
return XMLConstants.NULL_NS_URI;
}
}
protected static final String QUERY_FORM_NUMBER =
"/XFDL/globalpage/global/xmlmodel/xforms:instances/instance" +
"/form_metadata/title/documentnbr[number]";
public static void main(String[] args) {
try
{
DocumentBuilderFactory dbfac = DocumentBuilderFactory.newInstance();
DocumentBuilder docBuilder = dbfac.newDocumentBuilder();
Document doc = docBuilder.parse(new File(args[0]));
System.out.println(extractNodeValue(doc, "/XFDL/globalpage/@sid"));
System.out.println(extractNodeValue(doc, "/XFDL/globalpage/global/xmlmodel/instances/instance/@id" ));
System.out.println(extractNodeValue(doc, "/XFDL/globalpage/global/xmlmodel/instances/instance/form_metadata/title/documentnbr/@number" ));
} catch (SAXException e) {
e.printStackTrace();
} catch (IOException e) {
e.printStackTrace();
} catch (ParserConfigurationException e) {
e.printStackTrace();
}
}
private static String extractNodeValue(Document doc, String expression) {
try{
XPath xPath = XPathFactory.newInstance().newXPath();
xPath.setNamespaceContext(new MyNamespaceContext(null));
Node result = (Node)xPath.evaluate(expression, doc, XPathConstants.NODE);
if(result != null) {
return result.getNodeValue();
} else {
throw new RuntimeException("can't find expression");
}
} catch (XPathExpressionException err) {
throw new RuntimeException(err);
}
}
}
SAX (alternative to XPath) version:
SAXParser saxParser = SAXParserFactory.newInstance().newSAXParser();
final String[] number = new String[1];
DefaultHandler handler = new DefaultHandler()
{
@Override
public void startElement(String uri, String localName, String qName,
Attributes attributes) throws SAXException
{
if (qName.equals("documentnbr"))
number[0] = attributes.getValue("number");
}
};
saxParser.parse("input.xml", handler);
System.out.println(number[0]);
I see it's more complicated to use XPath with namespaces as it should be (my opinion). Here is my (simple) code:
XPath xpath = XPathFactory.newInstance().newXPath();
NamespaceContextMap contextMap = new NamespaceContextMap();
contextMap.put("custom", "http://www.PureEdge.com/XFDL/Custom");
contextMap.put("designer", "http://www.PureEdge.com/Designer/6.1");
contextMap.put("pecs", "http://www.PureEdge.com/PECustomerService");
contextMap.put("xfdl", "http://www.PureEdge.com/XFDL/6.5");
contextMap.put("xforms", "http://www.w3.org/2003/xforms");
contextMap.put("", "http://www.PureEdge.com/XFDL/6.5");
xpath.setNamespaceContext(contextMap);
String expression = "//:documentnbr/@number";
InputSource inputSource = new InputSource("input.xml");
String number;
number = (String) xpath.evaluate(expression, inputSource, XPathConstants.STRING);
System.out.println(number);
You can get NamespaceContextMap class (not mine) from here (GPL license). There is also 6376058 bug.
Have a look at the XPathAPI library. It is a simpler way to use XPath without messing with the low-level Java API, especially when dealing with namespaces.
The code to get the number
attribute would be:
String num = XPathAPI.selectSingleNodeAsString(doc, '//documentnbr/@number');
Namespaces are automatically extracted from the root node (doc
in this case). In case you need to explicitly define additional namespaces you can use this:
Map<String, String> nsMap = new HashMap<String, String>();
nsMap.put("xforms", "http://www.w3.org/2003/xforms");
String num =
XPathAPI.selectSingleNodeAsString(doc, '//documentnbr/@number', nsMap);
(Disclaimer: I'm the author of the library.)
精彩评论