开发者

Extracting Text Nodes From XML File Using SAX Parser in JAVA

So I am currently using SAX to try and extract some information from a a number of xml documents I am working from. Thus far, it is really easy to extract the attribute values. However, I have no clue how to go about extracting actual values from a text node.

For example, in the given XML document:

<w:rStyle w:val="Highlight" /> 
  </w:rPr>
  </w:pPr>
- <w:r>
  <w:t>Text to Extract</w:t> 
  </w:r>
  </w:p>
- <w:p w:rsidR="00B41602" w:rsidRDefault="00B41602" w:rsidP="007C3A42">
- <w:pPr>
  <w:pStyle w:val="Copy" /> 

I can extract "Highlight" no problem by getting the value from val. But I have no idea how to get into that text node and get out "Text to Extract".

Here is my Java code thus far to pull out the attribute values...

private static final class SaxHandler extends DefaultHandler 
    {
        // invoked when document-parsing is started:
        public void startDocument() throws SAXException 
        {
            System.out.println("Document processing starting:");
        }

        // notifies about finish of parsing:
        public void endDocument() throws SAXException 
        {
            System.out.println("Document processing finished. \n");
        }

        // we enter to element 'qName':
        public void startElement(String uri, String localName, 
                String qName, Attributes attrs) throws SAXException 
        {
            if(qName.equalsIgnoreCase("Relationships"))
            {
                // do nothing
            }
            else if(qName.equalsIgnoreCase("Relationship"))
            {
                // goes into the element and if the attribute is equal to "Target"...
                String val = attrs.getValue("Target");
                // ...and the value is not null
                if(val != null)
                {
                    // ...and if the value contains "image" in it...
                    if (val.contains("image"))
                    {
                        // ...then get the id value
开发者_如何学编程                        String id = attrs.getValue("Id");
                        // ...and use the substring method to isolate and print out only the image & number
                        int begIndex = val.lastIndexOf("/");
                        int endIndex = val.lastIndexOf(".");
                        System.out.println("Id: " + id + " & Target: " + val.substring(begIndex+1, endIndex));
                    }
                }
            }
            else 
            {
                throw new IllegalArgumentException("Element '" + 
                        qName + "' is not allowed here");
            }
        }

        // we leave element 'qName' without any actions:
        public void endElement(String uri, String localName, String qName) throws SAXException 
        {
            // do nothing;
        }
     }

But I have no clue where to start to get into that text node and pull out the values inside. Anyone have some ideas?


Here's some pseudo-code:

private boolean insideElementContainingTextNode;
private StringBuilder textBuilder;

public void startElement(String uri, String localName, String qName, Attributes attrs) {
    if ("w:t".equals(qName)) { // or is it localName?
        insideElementContainingTextNode = true;
        textBuilder = new StringBuilder();
    }
}

public void characters(char[] ch, int start, int length) {
    if (insideElementContainingTextNode) {
        textBuilder.append(ch, start, length);
    }
}

public void endElement(String uri, String localName, String qName) {
    if ("w:t".equals(qName)) { // or is it localName?
        insideElementContainingTextNode = false;
        String theCompleteText = this.textBuilder.toString();
        this.textBuilder = null;
    }
}
0

上一篇:

下一篇:

精彩评论

暂无评论...
验证码 换一张
取 消

最新问答

问答排行榜