How to parse advanced XML files in Java

Posted 2023-01-26 08:54

I've seen numerous examples of how to read XML files in Java, but they only show simple XML files. For example, they show how to extract first and last names from an XML file. However, I need to extract data from a COLLADA XML file like this:

<library_visual_scenes>
    <visual_scene id="ID1">
        <node name="SketchUp">
            <instance_geometry url="#ID2">
                <bind_material>
                    <technique_common>
                        <instance_material symbol="Material2" target="#ID3">
                            <bind_vertex_input semantic="UVSET0" input_semantic="TEXCOORD" input_set="0" />
                        </instance_material>
                    </technique_common>
                </bind_material>
            </instance_geometry>
        </node>
    </visual_scene>
</library_visual_scenes>

This is only a small part of a COLLADA file. Here I need to extract the id of visual_scene, then the url of instance_geometry, and finally the target of instance_material. Of course I need to extract much more, but I don't really understand how to go about it, and this is a place to start.

I have this code so far:

DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
DocumentBuilder builder = null;
try {
    builder = factory.newDocumentBuilder();
}
catch( ParserConfigurationException error ) {
    Log.e( "Collada", error.getMessage() ); return;
}
Document document = null;
try {
    document = builder.parse( string );
}
catch( IOException error ) {
    Log.e( "Collada", error.getMessage() ); return;
}
catch( SAXException error ) {
    Log.e( "Collada", error.getMessage() ); return;
}
NodeList library_visual_scenes = document.getElementsByTagName( "library_visual_scenes" );

It seems like most examples on the web are similar to this one: http://www.easywayserver.com/blog/java-how-to-read-xml-file/

I need help figuring out how to extract more deeply nested tags, or a pointer to a good tutorial on reading/parsing XML files.

Really, your parsing per se is already done when you call builder.parse(string). What you need to know now is how to select/query information from the parsed XML document.

I would agree with @khachik regarding how to do that. Elaborating a little (since no one else has posted an answer):

XPath is the most convenient way to extract information, and if your input document is not huge, XPath is fast enough. Here is a good starting tutorial on XPath in Java. XPath is also recommended if you need random access to the XML data (i.e. if you have to go back and forth extracting data from the tree in a different order than it appears in the source document), since SAX is designed for linear access.

Some sample XPath expressions:

extract the id of visual_scene: /*/visual_scene/@id
the url of instance_geometry: /*/visual_scene/node/instance_geometry/@url
the url of instance_geometry for the node whose name is SketchUp (XPath string comparison is case-sensitive): /*/visual_scene/node[@name = 'SketchUp']/instance_geometry/@url
the target of instance_material: /*/visual_scene/node/instance_geometry/bind_material/technique_common/instance_material/@target
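
To make the expressions above concrete, here is a minimal sketch that evaluates them with the standard javax.xml.xpath API. The class name ColladaXPathSketch and the helper evaluate are made up for illustration, and the XML is the fragment from the question embedded as a string with library_visual_scenes as the document element. A complete COLLADA file usually nests this element under a root element and declares a default namespace, so the paths would probably need adjusting there.

```java
import java.io.StringReader;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.xml.sax.InputSource;

public class ColladaXPathSketch {

    // The fragment from the question (assumption: it is the whole document).
    static final String XML =
          "<library_visual_scenes>"
        + "<visual_scene id=\"ID1\">"
        + "<node name=\"SketchUp\">"
        + "<instance_geometry url=\"#ID2\">"
        + "<bind_material><technique_common>"
        + "<instance_material symbol=\"Material2\" target=\"#ID3\"/>"
        + "</technique_common></bind_material>"
        + "</instance_geometry>"
        + "</node>"
        + "</visual_scene>"
        + "</library_visual_scenes>";

    // Hypothetical helper: returns the string value of the first match, or "" if nothing matches.
    static String evaluate(String expression) {
        try {
            XPath xpath = XPathFactory.newInstance().newXPath();
            return xpath.evaluate(expression, new InputSource(new StringReader(XML)));
        } catch (XPathExpressionException e) {
            throw new IllegalStateException("Bad XPath or unparsable XML: " + expression, e);
        }
    }

    public static void main(String[] args) {
        System.out.println(evaluate("/*/visual_scene/@id"));                        // ID1
        System.out.println(evaluate(
            "/*/visual_scene/node[@name = 'SketchUp']/instance_geometry/@url"));    // #ID2
        System.out.println(evaluate(
            "/*/visual_scene/node/instance_geometry/bind_material"
            + "/technique_common/instance_material/@target"));                      // #ID3
    }
}
```

This helper re-parses the XML on every call to keep the sketch short. If you already have a DOM Document from builder.parse, you can pass that Document to xpath.evaluate instead of an InputSource and parse only once.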

Since COLLADA models can be really large, you might need to do a SAX-based filter, which will allow you to process the document in stream mode without having to keep it all in memory at once. But if your existing code to parse the XML is already performing well enough, you may not need SAX. SAX is more complicated to use for extracting specific data than XPath.
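
For comparison, a rough sketch of the streaming approach: a SAX handler is called back once per start tag and picks out the attributes it cares about, so the document is never held in memory as a tree. The names ColladaSaxSketch, SceneHandler and extract are made up for illustration, and the handler assumes the default, non-namespace-aware parser so that qName is the plain tag name.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class ColladaSaxSketch {

    // Collects the attributes of interest in document order as the parser streams through.
    static class SceneHandler extends DefaultHandler {
        final List<String> found = new ArrayList<>();

        @Override
        public void startElement(String uri, String localName, String qName, Attributes atts) {
            switch (qName) {
                case "visual_scene":
                    found.add("id=" + atts.getValue("id"));
                    break;
                case "instance_geometry":
                    found.add("url=" + atts.getValue("url"));
                    break;
                case "instance_material":
                    found.add("target=" + atts.getValue("target"));
                    break;
                default:
                    break; // every other element is ignored
            }
        }
    }

    // Hypothetical helper: parses the given XML text and returns what the handler collected.
    static List<String> extract(String xml) {
        try {
            SceneHandler handler = new SceneHandler();
            SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
            return handler.found;
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        String xml =
              "<library_visual_scenes><visual_scene id=\"ID1\"><node name=\"SketchUp\">"
            + "<instance_geometry url=\"#ID2\"><bind_material><technique_common>"
            + "<instance_material symbol=\"Material2\" target=\"#ID3\"/>"
            + "</technique_common></bind_material></instance_geometry>"
            + "</node></visual_scene></library_visual_scenes>";
        System.out.println(extract(xml)); // [id=ID1, url=#ID2, target=#ID3]
    }
}
```

The handler only sees one element at a time, so relating a target back to the enclosing visual_scene means tracking that state yourself. That bookkeeping is the extra complexity compared with XPath.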

You are using DOM in your code.
DOM creates a tree structure of the XML file it parsed, and you have to traverse that tree to get the information held in its nodes.
So far, all your code does is create the tree representation, i.e.

document = builder.parse( string );//document is loaded in memory as tree

Now you should consult the DOM API to see how to get the information you need.

NodeList library_visual_scenes = document.getElementsByTagName( "library_visual_scenes" );

For instance, this method returns a NodeList of all elements with the specified name.
Now you should loop over the NodeList:

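for (int i = 0; i < library_visual_scenes.getLength(); i++) {
    Element element = (Element) library_visual_scenes.item(i);
    // The first child is usually a whitespace text node, so walk all children
    // and only look at the ones that are elements (the visual_scene tags).
    for (Node child = element.getFirstChild(); child != null; child = child.getNextSibling()) {
        if (child.getNodeType() == Node.ELEMENT_NODE) {
            String id = ((Element) child).getAttribute("id");
            System.out.println("id=" + id);
        }
    }
}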

DISCLAIMER: This is sample code that I have not compiled. It only shows the concept; you should look into the DOM API.
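
To reach the deeper tags with the DOM API, one option is Element.getElementsByTagName, which searches all descendants of a given element rather than only its direct children. The following is a minimal sketch under that approach. The names ColladaDomSketch, firstAttribute and describe are made up for illustration, and it assumes you only want the first matching descendant inside each visual_scene.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class ColladaDomSketch {

    // Hypothetical helper: attribute of the first descendant of scope with the given tag, or "" if none.
    static String firstAttribute(Element scope, String tag, String attribute) {
        NodeList matches = scope.getElementsByTagName(tag);
        if (matches.getLength() == 0) {
            return "";
        }
        return ((Element) matches.item(0)).getAttribute(attribute);
    }

    // Builds one line per visual_scene: "<id> <geometry url> <material target>".
    static String describe(String xml) {
        try {
            Document document = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
            StringBuilder out = new StringBuilder();
            NodeList scenes = document.getElementsByTagName("visual_scene");
            for (int i = 0; i < scenes.getLength(); i++) {
                Element scene = (Element) scenes.item(i);
                out.append(scene.getAttribute("id")).append(' ')
                   .append(firstAttribute(scene, "instance_geometry", "url")).append(' ')
                   .append(firstAttribute(scene, "instance_material", "target")).append('\n');
            }
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        String xml =
              "<library_visual_scenes><visual_scene id=\"ID1\"><node name=\"SketchUp\">"
            + "<instance_geometry url=\"#ID2\"><bind_material><technique_common>"
            + "<instance_material symbol=\"Material2\" target=\"#ID3\"/>"
            + "</technique_common></bind_material></instance_geometry>"
            + "</node></visual_scene></library_visual_scenes>";
        System.out.print(describe(xml)); // ID1 #ID2 #ID3
    }
}
```

Note that DocumentBuilder.parse(String) treats its argument as a URI, which is why this sketch wraps the XML text in an InputSource instead.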

EclipseLink JAXB (MOXy) has a useful @XmlPath extension for leveraging XPath to populate an object. It may be what you are looking for. Note: I am the MOXy tech lead.

The following example maps a simple address object to Google's representation of geocode information:

package blog.geocode;

import javax.xml.bind.annotation.XmlRootElement;
import javax.xml.bind.annotation.XmlType;

import org.eclipse.persistence.oxm.annotations.XmlPath;

@XmlRootElement(name="kml")
@XmlType(propOrder={"country", "state", "city", "street", "postalCode"})
public class Address {

    @XmlPath("Response/Placemark/ns:AddressDetails/ns:Country/ns:AdministrativeArea/ns:SubAdministrativeArea/ns:Locality/ns:Thoroughfare/ns:ThoroughfareName/text()")
    private String street;

    @XmlPath("Response/Placemark/ns:AddressDetails/ns:Country/ns:AdministrativeArea/ns:SubAdministrativeArea/ns:Locality/ns:LocalityName/text()")
    private String city;

    @XmlPath("Response/Placemark/ns:AddressDetails/ns:Country/ns:AdministrativeArea/ns:AdministrativeAreaName/text()")
    private String state;

    @XmlPath("Response/Placemark/ns:AddressDetails/ns:Country/ns:CountryNameCode/text()")
    private String country;

    @XmlPath("Response/Placemark/ns:AddressDetails/ns:Country/ns:AdministrativeArea/ns:SubAdministrativeArea/ns:Locality/ns:PostalCode/ns:PostalCodeNumber/text()")
    private String postalCode;

}

For the rest of the example see:

http://bdoughan.blogspot.com/2010/09/xpath-based-mapping-geocode-example.html

Nowadays, several Java RAD tools include Java code generators that work from a given DTD, so you could use one of them.
