How to parse advanced XML files in Java
I've seen numerous examples of how to read XML files in Java, but they only show simple XML files. For example, they show how to extract first and last names from an XML file. However, I need to extract data from a COLLADA XML file like this:
<library_visual_scenes>
    <visual_scene id="ID1">
        <node name="SketchUp">
            <instance_geometry url="#ID2">
                <bind_material>
                    <technique_common>
                        <instance_material symbol="Material2" target="#ID3">
                            <bind_vertex_input semantic="UVSET0" input_semantic="TEXCOORD" input_set="0" />
                        </instance_material>
                    </technique_common>
                </bind_material>
            </instance_geometry>
        </node>
    </visual_scene>
</library_visual_scenes>
This is only a small part of a COLLADA file. Here I need to extract the id of visual_scene, then the url of instance_geometry, and finally the target of instance_material. Of course I need to extract much more, but I don't really understand how to do it, and this is a place to start.
I have this code so far:
DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
DocumentBuilder builder = null;
try {
    builder = factory.newDocumentBuilder();
}
catch( ParserConfigurationException error ) {
    Log.e( "Collada", error.getMessage() );
    return;
}
Document document = null;
try {
    document = builder.parse( string );
}
catch( IOException error ) {
    Log.e( "Collada", error.getMessage() );
    return;
}
catch( SAXException error ) {
    Log.e( "Collada", error.getMessage() );
    return;
}
NodeList library_visual_scenes = document.getElementsByTagName( "library_visual_scenes" );
It seems like most examples on the web are similar to this one: http://www.easywayserver.com/blog/java-how-to-read-xml-file/
I need help figuring out what to do when I want to extract deeper tags or find a good tutorial on reading/parsing XML files.
Really, your parsing per se is already done when you call builder.parse(string). What you need to know now is how to select/query information from the parsed XML document.
I would agree with @khachik regarding how to do that. Elaborating a little (since no one else has posted an answer):
XPath is the most convenient way to extract information, and if your input document is not huge, XPath is fast enough. Here is a good starting tutorial on XPath in Java. XPath is also recommended if you need random access to the XML data (i.e. if you have to go back and forth extracting data from the tree in a different order than it appears in the source document), since SAX is designed for linear access.
Some sample XPath expressions:
- extract the id of visual_scene:
/*/visual_scene/@id
- the url of instance_geometry:
/*/visual_scene/node/instance_geometry/@url
- the url of instance_geometry for the node whose name is SketchUp (XPath string comparisons are case-sensitive, so the value must match the attribute exactly):
/*/visual_scene/node[@name = 'SketchUp']/instance_geometry/@url
- the target of instance_material:
/*/visual_scene/node/instance_geometry/bind_material/technique_common/instance_material/@target
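As a rough sketch of how expressions like these could be evaluated from Java, the standard javax.xml.xpath API can be run against the Document you already get from builder.parse(). The class name ColladaXPathSketch, the SAMPLE constant (a trimmed copy of the fragment from the question) and the helper names parse and query are made up for illustration and are not part of any library.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

final class ColladaXPathSketch {
    // Hypothetical sample: the fragment from the question, without whitespace.
    static final String SAMPLE =
          "<library_visual_scenes>"
        + "<visual_scene id=\"ID1\">"
        + "<node name=\"SketchUp\">"
        + "<instance_geometry url=\"#ID2\">"
        + "<bind_material><technique_common>"
        + "<instance_material symbol=\"Material2\" target=\"#ID3\"/>"
        + "</technique_common></bind_material>"
        + "</instance_geometry></node></visual_scene>"
        + "</library_visual_scenes>";

    // Parse XML text into a DOM tree, as the code in the question does.
    static Document parse(String xml) {
        try {
            DocumentBuilder builder =
                DocumentBuilderFactory.newInstance().newDocumentBuilder();
            return builder.parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            // Rethrown unchecked only to keep the sketch short.
            throw new IllegalStateException(e);
        }
    }

    // Evaluate an XPath expression and return the string value of the first match.
    static String query(Document doc, String expression) {
        try {
            XPath xpath = XPathFactory.newInstance().newXPath();
            return xpath.evaluate(expression, doc);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Document doc = parse(SAMPLE);
        System.out.println(query(doc, "/*/visual_scene/@id"));  // prints ID1
        System.out.println(query(doc,
            "/*/visual_scene/node[@name = 'SketchUp']/instance_geometry/@url"));  // prints #ID2
        System.out.println(query(doc,
            "/*/visual_scene/node/instance_geometry/bind_material"
            + "/technique_common/instance_material/@target"));  // prints #ID3
    }
}
```

These paths match the fragment as posted, where library_visual_scenes is the root. In a complete COLLADA file the root element is COLLADA and a default namespace is declared, so the paths would probably need an extra step (and possibly namespace handling) before they match.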
Since COLLADA models can be really large, you might need to do a SAX-based filter, which will allow you to process the document in stream mode without having to keep it all in memory at once. But if your existing code to parse the XML is already performing well enough, you may not need SAX. SAX is more complicated to use for extracting specific data than XPath.
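For comparison, a minimal sketch of the SAX approach might look like the following. It assumes you only want the three attributes mentioned in the question and collects them as the parser streams through the input; the class name ColladaSaxSketch and the method collect are hypothetical names chosen for this example.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

final class ColladaSaxSketch {
    // Stream through the XML and record the attributes of interest in document order.
    static List<String> collect(String xml) {
        final List<String> found = new ArrayList<String>();
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName,
                                     String qName, Attributes attrs) {
                // The default factory is not namespace-aware, so qName is the tag name.
                if ("visual_scene".equals(qName)) {
                    found.add("id=" + attrs.getValue("id"));
                } else if ("instance_geometry".equals(qName)) {
                    found.add("url=" + attrs.getValue("url"));
                } else if ("instance_material".equals(qName)) {
                    found.add("target=" + attrs.getValue("target"));
                }
            }
        };
        try {
            SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            // Rethrown unchecked only to keep the sketch short.
            throw new IllegalStateException(e);
        }
        return found;
    }

    public static void main(String[] args) {
        String xml = "<library_visual_scenes><visual_scene id=\"ID1\">"
            + "<node name=\"SketchUp\"><instance_geometry url=\"#ID2\">"
            + "<bind_material><technique_common>"
            + "<instance_material symbol=\"Material2\" target=\"#ID3\"/>"
            + "</technique_common></bind_material>"
            + "</instance_geometry></node></visual_scene></library_visual_scenes>";
        System.out.println(collect(xml));  // prints [id=ID1, url=#ID2, target=#ID3]
    }
}
```

Note that the handler has to track context itself (which element it is inside, which values belong together), which is where the extra complexity compared to XPath comes from.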
You are using DOM in your code.
DOM creates a tree structure of the xml file it parsed, and you have to traverse the tree to get the information in various nodes.
In your code all you did is create the tree representation. I.e.
document = builder.parse( string );//document is loaded in memory as tree
Now you should consult the DOM API to see how to get the information you need.
NodeList library_visual_scenes = document.getElementsByTagName( "library_visual_scenes" );
For instance this method returns a NodeList of all elements with the specified name.
Now you should loop over the NodeList
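for (int i = 0; i < library_visual_scenes.getLength(); i++) {
    Element element = (Element) library_visual_scenes.item(i);
    // getFirstChild() may return a whitespace text node, so scan all children for elements
    for (Node child = element.getFirstChild(); child != null; child = child.getNextSibling()) {
        if (child.getNodeType() == Node.ELEMENT_NODE) {
            // the attribute name is passed as a string
            String id = ((Element) child).getAttribute("id");
            System.out.println("id=" + id);
        }
    }
}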
DISCLAIMER: This is sample code that I have not compiled. It shows you the concept. You should look into the DOM API.
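To reach the deeper tags from the question with plain DOM, one possible approach is to call getElementsByTagName on each Element rather than on the whole Document, which restricts the search to that element's descendants. The sketch below assumes the structure of the fragment in the question; ColladaDomSketch, parse and describe are hypothetical names used only for this illustration.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

final class ColladaDomSketch {
    // Parse XML text into a DOM tree.
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            // Rethrown unchecked only to keep the sketch short.
            throw new IllegalStateException(e);
        }
    }

    // Walk scene -> geometry -> material and record one line per element found.
    static List<String> describe(Document doc) {
        List<String> out = new ArrayList<String>();
        NodeList scenes = doc.getElementsByTagName("visual_scene");
        for (int i = 0; i < scenes.getLength(); i++) {
            Element scene = (Element) scenes.item(i);
            out.add("visual_scene id=" + scene.getAttribute("id"));
            // Searching from the element only looks at its own descendants.
            NodeList geometries = scene.getElementsByTagName("instance_geometry");
            for (int j = 0; j < geometries.getLength(); j++) {
                Element geometry = (Element) geometries.item(j);
                out.add("instance_geometry url=" + geometry.getAttribute("url"));
                NodeList materials = geometry.getElementsByTagName("instance_material");
                for (int k = 0; k < materials.getLength(); k++) {
                    Element material = (Element) materials.item(k);
                    out.add("instance_material target=" + material.getAttribute("target"));
                }
            }
        }
        return out;
    }

    public static void main(String[] args) {
        String xml = "<library_visual_scenes><visual_scene id=\"ID1\">"
            + "<node name=\"SketchUp\"><instance_geometry url=\"#ID2\">"
            + "<bind_material><technique_common>"
            + "<instance_material symbol=\"Material2\" target=\"#ID3\"/>"
            + "</technique_common></bind_material>"
            + "</instance_geometry></node></visual_scene></library_visual_scenes>";
        for (String line : describe(parse(xml))) {
            System.out.println(line);
        }
    }
}
```

Because each inner search starts from the enclosing element, every url and target stays associated with the visual_scene it belongs to, even when the file contains several scenes.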
EclipseLink JAXB (MOXy) has a useful @XmlPath extension for leveraging XPath to populate an object. It may be what you are looking for. Note: I am the MOXy tech lead.
The following example maps a simple address object to Google's representation of geocode information:
package blog.geocode;

import javax.xml.bind.annotation.XmlRootElement;
import javax.xml.bind.annotation.XmlType;
import org.eclipse.persistence.oxm.annotations.XmlPath;

@XmlRootElement(name="kml")
@XmlType(propOrder={"country", "state", "city", "street", "postalCode"})
public class Address {

    @XmlPath("Response/Placemark/ns:AddressDetails/ns:Country/ns:AdministrativeArea/ns:SubAdministrativeArea/ns:Locality/ns:Thoroughfare/ns:ThoroughfareName/text()")
    private String street;

    @XmlPath("Response/Placemark/ns:AddressDetails/ns:Country/ns:AdministrativeArea/ns:SubAdministrativeArea/ns:Locality/ns:LocalityName/text()")
    private String city;

    @XmlPath("Response/Placemark/ns:AddressDetails/ns:Country/ns:AdministrativeArea/ns:AdministrativeAreaName/text()")
    private String state;

    @XmlPath("Response/Placemark/ns:AddressDetails/ns:Country/ns:CountryNameCode/text()")
    private String country;

    @XmlPath("Response/Placemark/ns:AddressDetails/ns:Country/ns:AdministrativeArea/ns:SubAdministrativeArea/ns:Locality/ns:PostalCode/ns:PostalCodeNumber/text()")
    private String postalCode;
}
For the rest of the example see:
- http://bdoughan.blogspot.com/2010/09/xpath-based-mapping-geocode-example.html
Nowadays, several Java RAD tools can generate Java code from a given DTD, so you could use one of those.