Remove invalid characters from String when parsing XML in Java

I've been googling around and reading on SO, but nothing has worked. I have a problem with &#13; characters in an XML feed. I save the value of each tag in a String, but when &#13; occurs, it just stops. I only get the first four or five words in the tag.

So can anyone please help me with a method that can remove it? Or could it be that the text in the tags of the XML feed is too long for a String?

Thanks!

Sample code:

public void characters(char[] ch, int start, int length)
        throws SAXException {

    if (currentElement) {
        currentValue = new String(ch, start, length);
        currentElement = false;
    }

}

public void endElement(String uri, String localName, String qName)
        throws SAXException {

    currentElement = false;

    /** set value */
    if (localName.equalsIgnoreCase("title"))
        sitesList.setTitle(currentValue);
    else if (localName.equalsIgnoreCase("id"))
        sitesList.setId(currentValue);
    else if (localName.equalsIgnoreCase("description"))
        sitesList.setDescription(currentValue);
}

The text in the description tag is quite long, but I only get the first five words before the &#13; characters start appearing.


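You're using a SAX parser to parse the XML string.

The characters() method can be called multiple times while reading a single XML element, because SAX allows the parser to deliver contiguous text in several chunks. This happens, for example, when the parser finds a character reference, as in <desc>blabla bla &#39; bla bla la.</desc>.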
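To see the chunking for yourself, here is a minimal sketch that records every chunk passed to characters(). The class name ChunkDemo and the helper chunks() are made up for this illustration, and it assumes the JDK's built-in JAXP parser. How many chunks you get depends on the parser, so the demo only prints the count. What is reliable is that the chunks joined together give the full text.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

class ChunkDemo {

    /** Parses the XML and returns every chunk passed to characters(), in order. */
    static List<String> chunks(String xml) {
        final List<String> out = new ArrayList<>();
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void characters(char[] ch, int start, int length) {
                // Record each call separately so the splitting becomes visible
                out.add(new String(ch, start, length));
            }
        };
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new IllegalStateException("XML parsing failed", e);
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> parts = chunks("<desc>blabla bla &#39; bla bla la.</desc>");
        // The number of chunks is parser-dependent; only the joined text is guaranteed
        System.out.println(parts.size() + " chunk(s): " + parts);
        System.out.println("Joined: " + String.join("", parts));
    }
}
```

If the handler keeps only the first chunk, as the code in the question does, everything after the first split is lost.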

The solution is to use a StringBuilder: append the characters read in the characters() method, and then reset the StringBuilder in the endElement() method:

private class Handler extends DefaultHandler {

    // Accumulates the text of the current element across characters() calls
    private StringBuilder temp_val;

    public Handler() {
        this.temp_val = new StringBuilder();
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        // Append instead of assigning: the text of one element may arrive in several chunks
        temp_val.append(ch, start, length);
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        System.out.println("Output: " + temp_val.toString());
        // ... Do your stuff
        temp_val.setLength(0); // Reset the StringBuilder
    }

}

The above code works for me, given this XML file:

<?xml version="1.0" encoding="iso-8859-1" ?>
<test>This is some &#13; example-text.</test>

The output is:

Output: This is some
example-text.
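For a feed with several nested elements, as in the question, the same idea could be sketched as follows. This is an illustration under assumptions, not a drop-in replacement: the names FeedTextDemo, TextCollector and parse() are hypothetical, the sample XML is invented, and the element names are taken from the question's endElement(). The buffer is also cleared in startElement() so that whitespace between elements does not end up in a value. Elements are matched on qName because the default SAXParserFactory is not namespace-aware, which can leave localName empty.

```java
import java.io.StringReader;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

class FeedTextDemo {

    // Element names from the question's endElement(); adjust to your feed
    static final List<String> WANTED = Arrays.asList("title", "id", "description");

    static class TextCollector extends DefaultHandler {
        final Map<String, String> values = new LinkedHashMap<>();
        private final StringBuilder buffer = new StringBuilder();

        @Override
        public void startElement(String uri, String localName, String qName, Attributes atts) {
            buffer.setLength(0); // discard whitespace seen between elements
        }

        @Override
        public void characters(char[] ch, int start, int length) {
            buffer.append(ch, start, length); // may run several times per element
        }

        @Override
        public void endElement(String uri, String localName, String qName) {
            if (WANTED.contains(qName)) {
                values.put(qName, buffer.toString()); // the complete text is known only here
            }
            buffer.setLength(0);
        }
    }

    /** Parses the XML and returns the full text of each wanted element. */
    static Map<String, String> parse(String xml) {
        try {
            TextCollector collector = new TextCollector();
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), collector);
            return collector.values;
        } catch (Exception e) {
            throw new IllegalStateException("XML parsing failed", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<site><id>42</id><title>Example</title>"
                + "<description>This is some &#13; long &#39;quoted&#39; text.</description></site>";
        String description = parse(xml).get("description");
        // &#13; arrives as a carriage return; strip it if it is unwanted
        System.out.println(description.replace("\r", ""));
    }
}
```

The String is not too long. The full description is there once all chunks are appended, and the carriage return can be removed afterwards with String.replace() if you do not want it.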
