How does this Java Program Run?

I have read about DOMParser and SAXParser in Java. I have no doubts about DOMParser, and I understand that people prefer SAXParser to DOMParser because it uses less memory. However, although I understand the concept of SAXParser, I am not able to understand this code:

import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

public class ReadXMLFileSAX {

 public static void main(String args[]) {

  try {

     SAXParserFactory factory = SAXParserFactory.newInstance();
     SAXParser saxParser = factory.newSAXParser();

     DefaultHandler handler = new DefaultHandler() {

     boolean bfname = false;
     boolean blname = false;
     boolean bnname = false;
     boolean bsalary = false;

     public void startElement(String uri, String localName,
        String qName, Attributes attributes)
        throws SAXException {

        System.out.println("Start Element :" + qName);

        if (qName.equalsIgnoreCase("FIRSTNAME")) {
           bfname = true;
        }

        if (qName.equalsIgnoreCase("LASTNAME")) {
           blname = true;
        }

        if (qName.equalsIgnoreCase("NICKNAME")) {
           bnname = true;
        }

        if (qName.equalsIgnoreCase("SALARY")) {
           bsalary = true;
        }

     }

     public void endElement(String uri, String localName,
          String qName)
          throws SAXException {

          System.out.println("End Element :" + qName);

     }

     public void characters(char ch[], int start, int length)
         throws SAXException {

         if (bfname) {
            System.out.println("First Name : "
                + new String(ch, start, length));
            bfname = false;
          }

          if (blname) {
              System.out.println("Last Name : "
                  + new String(ch, start, length));
              blname = false;
           }

          if (bnname) {
              System.out.println("Nick Name : "
                  + new String(ch, start, length));
              bnname = false;
           }

          if (bsalary) {
              System.out.println("Salary : "
                  + new String(ch, start, length));
              bsalary = false;
           }

        }

      };

      saxParser.parse("/home/anto/Groovy/Java/file.xml", handler);

    } catch (Exception e) {
      e.printStackTrace();
    }
  }

}

And the .xml file is :

<?xml version="1.0"?>
<company>
    <staff>
        <firstname>yong</firstname>
        <lastname>mook kim</lastname>
        <nickname>mkyong</nickname>
        <salary>100000</salary>
    </staff>
    <staff>
        <firstname>low</firstname>
        <lastname>yin fong</lastname>
        <nickname>fong fong</nickname>
        <salary>200000</salary>
    </staff>
</company>

And when I run the program I get output like this:

Start Element :company
Start Element :staff
Start Element :firstname
First Name : yong
End Element :firstname
Start Element :lastname
Last Name : mook kim
End Element :lastname
Start Element :nickname
Nick Name : mkyong
End Element :nickname
Start Element :salary
Salary : 100000
End Element :salary
End Element :staff
Start Element :staff
Start Element :firstname
First Name : low
End Element :firstname
Start Element :lastname
Last Name : yin fong
End Element :lastname
Start Element :nickname
Nick Name : fong fong
End Element :nickname
Start Element :salary
Salary : 200000
End Element :salary
End Element :staff
End Element :company

The output looks fine, but I'm confused by it! How is the order of the output determined? What handles this? Since this is the first time I have read about SAX and DOM, I could not figure it out. Kindly help me.


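SAX is event-based. Each time the parser sees a start tag, characters within a tag, an end tag, and so on, it calls the corresponding method of the handler. (Attributes are not a separate event; they are passed to startElement together with the start tag.)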

So the flow here is:

  1. See the company tag, call startElement for it
  2. See the staff tag, call startElement for it
  3. See the firstname tag, call startElement for it (which sets a boolean)
  4. See characters ("yong"), call the characters function for them (which sees which boolean is set and prints the appropriate message and clears the flag)
  5. See the closing firstname tag, call the endElement function

...
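The flow above can be made visible by recording every callback instead of printing from inside it. The following is only a minimal sketch: the class name SaxEventTrace and the method trace are made up for illustration, and the XML is parsed from a string rather than from the file in the question.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper (not part of the question's code): records SAX callbacks in call order.
class SaxEventTrace {

    // Parses the XML text and returns one entry per callback, in the order the parser made them.
    static List<String> trace(String xml) {
        List<String> events = new ArrayList<>();
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName, String qName, Attributes attributes) {
                events.add("start:" + qName);
            }

            @Override
            public void endElement(String uri, String localName, String qName) {
                events.add("end:" + qName);
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                // Whitespace between tags is also reported as characters; skip it here.
                String text = new String(ch, start, length).trim();
                if (!text.isEmpty()) {
                    events.add("chars:" + text);
                }
            }
        };
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new IllegalStateException("could not parse XML", e);
        }
        return events;
    }

    public static void main(String[] args) {
        String xml = "<company><staff><firstname>yong</firstname></staff></company>";
        trace(xml).forEach(System.out::println);
    }
}
```

The handler never decides the order itself. The list simply fills up in the order the tags and text appear in the document, which is the same order the question's println calls ran in.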


By calling saxParser.parse("/home/anto/Groovy/Java/file.xml", handler);, you tell the SAX parser to use the DefaultHandler you implemented (the handler passed as the second parameter) while it parses the XML.

SAX is event-based: events are raised as the parser traverses your XML document. When the SAX parser encounters the start of an element, for example <firstname>, it calls the startElement method. It then moves on to the body of the element and sees yong. Since that is not enclosed in a <> tag, it is treated as character data, so the parser calls the characters method. If there were another XML element there instead, it would call startElement again for the new element.

Finally, the SAX parser continues until it sees the end tag </firstname> and calls the endElement method.

All three of these methods (startElement, characters, and endElement) are implemented by the developer (in this case, you).

Don't forget that SAX only traverses your XML document. It doesn't keep a record of which node is the parent or child of which node.
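Because the parser keeps no parent/child record, a handler that needs one has to build it itself. One common way, shown here only as a sketch, is to push each element name onto a stack in startElement and pop it in endElement. The class name SaxPathTracker and the method paths are assumptions made for this example.

```java
import java.io.StringReader;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper: reconstructs each element's path from root using a stack.
class SaxPathTracker {

    // Returns the path of every element, in the order the start tags are seen.
    static List<String> paths(String xml) {
        List<String> paths = new ArrayList<>();
        Deque<String> open = new ArrayDeque<>(); // elements that are currently open, outermost first
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName, String qName, Attributes attributes) {
                open.addLast(qName);
                paths.add("/" + String.join("/", open));
            }

            @Override
            public void endElement(String uri, String localName, String qName) {
                open.removeLast(); // leaving the innermost open element
            }
        };
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new IllegalStateException("could not parse XML", e);
        }
        return paths;
    }

    public static void main(String[] args) {
        String xml = "<company><staff><firstname>yong</firstname><salary>100000</salary></staff></company>";
        paths(xml).forEach(System.out::println);
    }
}
```

At any moment the last entry on the stack is the element the parser is inside, and the entry before it is that element's parent.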

Hope this helps!


The power of a SAX parser is its events. All you need to do is override the appropriate methods; the onus is on the parsing library to call them in document order.


The order looks fine to me. What's the issue?

If you're talking about the start and end elements, that just shows the XML tag nesting. You see that "company" comes before "staff", and "staff" before "firstname".

Finally, you have the data itself, inside the individual tags. That's why the last three lines are:

End Element :salary
End Element :staff
End Element :company

Because the parser is leaving salary, which is the last element of staff, and that is the final staff of the company.


As the parser reads the input XML, it calls startElement on every opening tag and endElement on every closing tag. When the parser meets the contents of a tag, such as yong, it calls characters.

The code you posted tracks which tag is currently being parsed using the state variables bfname, bsalary, etc. When characters is called, your code knows which entity it is being called for (first name, last name, nickname, or salary), so it can interpret the raw character string properly.

So, when writing a SAX handler, you are in fact writing callbacks that track the parser's state inside the XML, that is, which part of the XML it is currently reading.
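A variation on the same state-tracking idea, sketched below, keeps a text buffer instead of one boolean per tag and reads the buffer in endElement. This is not the code from the question. It is an alternative that also copes with the fact that a SAX parser is allowed to deliver the text of one element in several characters calls. The class name StaffFieldCollector and the method fields are invented for the example.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Set;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper: collects "name=value" pairs for the leaf elements used in the question.
class StaffFieldCollector {

    private static final Set<String> WANTED = Set.of("firstname", "lastname", "nickname", "salary");

    static List<String> fields(String xml) {
        List<String> out = new ArrayList<>();
        DefaultHandler handler = new DefaultHandler() {
            // State kept between callbacks: the text seen since the last tag.
            private final StringBuilder text = new StringBuilder();

            @Override
            public void startElement(String uri, String localName, String qName, Attributes attributes) {
                text.setLength(0); // a new element starts, forget earlier text
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                text.append(ch, start, length); // may be called more than once per element
            }

            @Override
            public void endElement(String uri, String localName, String qName) {
                if (WANTED.contains(qName.toLowerCase(Locale.ROOT))) {
                    out.add(qName + "=" + text.toString().trim());
                }
                text.setLength(0);
            }
        };
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new IllegalStateException("could not parse XML", e);
        }
        return out;
    }

    public static void main(String[] args) {
        String xml = "<staff><firstname>yong</firstname><salary>100000</salary></staff>";
        fields(xml).forEach(System.out::println);
    }
}
```

The trade-off is that values are reported when the closing tag is reached rather than as soon as text arrives, in exchange for not losing any part of a value that the parser happens to split.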

In contrast, when using a DOM parser, you get the whole XML document converted to a tree, so you can navigate from its root to the nodes, or backwards from the nodes to the root, in any manner you like.
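For comparison, here is a small sketch of that kind of navigation using the JDK's DOM API. The class name DomNavigation and its two methods are hypothetical and exist only to show moving from a node up to its parent and reading its text after the whole document has been loaded.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical helper: shows upward and downward navigation in a DOM tree.
class DomNavigation {

    // Builds the full in-memory tree for the given XML text.
    private static Document load(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("could not parse XML", e);
        }
    }

    // Name of the parent of the first element with the given tag name.
    static String parentOfFirst(String xml, String tag) {
        Node node = load(xml).getElementsByTagName(tag).item(0);
        return node.getParentNode().getNodeName();
    }

    // Text content of the first element with the given tag name.
    static String textOfFirst(String xml, String tag) {
        Node node = load(xml).getElementsByTagName(tag).item(0);
        return node.getTextContent();
    }

    public static void main(String[] args) {
        String xml = "<company><staff><salary>100000</salary></staff></company>";
        System.out.println(parentOfFirst(xml, "salary"));
        System.out.println(textOfFirst(xml, "salary"));
    }
}
```

No flags or stacks are needed here because the tree already stores the relationships. The cost is that the whole document sits in memory, which is the reason given in the question for preferring SAX.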


Near the end, you'll notice that the saxParser.parse() method is given handler as a parameter. The handler is an instance of DefaultHandler that was defined earlier in the code. The SAXParser calls the appropriate method on the handler as it parses the XML document. See the Javadoc for DefaultHandler and SAXParser (in particular the documentation on the parse methods). As the XML document is parsed and each method in the handler is called in turn, the handler method prints out the values that were processed.


A SAX parser just iterates through a document, one character at a time. The parse() method of the parser takes a handler object. The parser calls various methods of this object when it encounters certain constructs in the document (an "event"). So every time the parser encounters a start tag, it calls the startElement method of the handler; when it encounters an end tag, it calls the endElement method; and so on. These methods in DefaultHandler are empty. It is up to you to subclass this class and provide your own implementation of these methods (in your code example above, DefaultHandler has been anonymously subclassed).

Unlike a DOM parser, a SAX parser does not construct elements. It just fires the various handler methods as it encounters start tags, end tags, and content characters. It is up to you to provide, within these methods, the logic that relates the content to the tag it belongs to, which is what the conditional statements in the startElement and characters methods are doing. The fields blname etc. just keep track of which element the parser is currently in, so that you know what the characters passed into the characters() method relate to.
