开发者

Why does SaxParser fail at random?

I'm using SAX parser in my Android application to read a few feeds a time. The script is executed as follows.

                     // Begin FeedLezer
                    try {

                        /** Handling XML **/
                        SAXParserFactory spf = SAXParserFactory.newInstance();
                        SAXParser sp = spf.newSAXParser();
                        XMLReader xr = sp.getXMLReader();

                        /** Send URL to parse XML Tags **/
                        URL sourceUrl = new URL(
                            BronFeeds[i]);

                        /** Create handler to handle XML Tags ( extends DefaultHandler ) **/
                        Feed_XMLHandler myXMLHandler = new Feed_XMLHandler();
   开发者_开发知识库                     xr.setContentHandler(myXMLHandler);
                        xr.parse(new InputSource(sourceUrl.openStream()));

                    } catch (Exception e) {
                        System.out.println("XML Pasing Excpetion = " + e);
                    }
                     sitesList = Feed_XMLHandler.sitesList;

                    String titels = sitesList.getMergedTitles();

And here are Feed_XMLHandler.java and Feed_XMLList.java, which I basically both just took from the web.

However, this code fails at times. I'll show some examples.

http://imm.io/media/2I/2IAs.jpg It goes very well here. It even recognizes and displays apostrophes. Even when clicking the articles open, almost all of the text shows, so that's all good. The source feed is here. I can't control the feed.

http://imm.io/media/2I/2IB1.jpg Here, it doesn't go so well. It does display the ï, but it chokes on the apostrophe (there's supposed to be 'NORAD' after the Waarom). Here

http://imm.io/media/2I/2IBQ.jpg This is the worst one. As you can see, the title only displays an apostrophe, whilst it is supposed to be a 'blablabla'. Also, the text ends in the middle of the line, without any special characters in the quote. The feed is here

In all cases, I have no control over the feed. I think the script does choke on special characters. How can I make sure SAX fetches all the strings correctly?

If anyone knows an answer to this, you really help me out a LOT :D

Thanks in advance.


This is from the FAQ of Xerces.

Why does the SAX parser lose some character data or why is the data split into several chunks? If you read the SAX documentation, you will find that SAX may deliver contiguous text as multiple calls to characters, for reasons having to do with parser efficiency and input buffering. It is the programmer's responsibility to deal with that appropriately, e.g. by accumulating text until the next non-characters event.

You're code is very well adapted from one of many XML Parsing tutorials (like this one here) Now, the tutorial is good and all, but they fail to mention something very important...

Notice this part here...

    public void characters(char[] ch, int start, int length)
            throws SAXException
    {
              if(in_ThisTag){
                     myobj.setName(new String(ch,start,length))
              }
    }

I bet at this point you're checking up booleans to mark which tag you're under and then setting a value in some kind of class you made? or something like that....

But the problem is, the SAX parser (which is buffered) will not necesarily get you all the characters between a tag at one go....say if <tag> Lorem Ipsum...really long sentence...</tag> so your SAX parser calls characters function is chunks....

So the trick here, is to keep appending the values to a string variable and the actually set (or commit) it to your structure when the tag ends...(ie in endElement)

Example

@Override
public void endElement(String uri, String localName, String qName)
        throws SAXException {

    currentElement = false;

    /** set value */
    if (localName.equalsIgnoreCase("tag"))
            {
        sitesList.setName(currentValue);
                    currentValue = ""; //reset the currentValue
            }

}

@Override
public void characters(char[] ch, int start, int length)
        throws SAXException {

    if (in_Tag) {
        currentValue += new String(ch, start, length); //keep appending string, don't set it right here....maybe there's more to come.
    }

}

Also, it would be better if you use StringBuilder for the appending, since that'll be more efficient....

Hope it makes sense! If it didn't check this and here

0

上一篇:

下一篇:

精彩评论

暂无评论...
验证码 换一张
取 消

最新问答

问答排行榜