
How do I download the source code of a web page and then pass it to a SAX parser as a whole?

I just want to download the source as a string, then stick that XML (which is currently a string) into a parser.


Using a SAX parser implies you have an org.xml.sax.ContentHandler that can accept callbacks from your parser. I wonder what that ContentHandler is and what use you intend to make of the callbacks.

You can wrap a StringReader around your string and pass it to a null transform that translates between a StreamSource and a SAXResult like so:

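import java.io.StringReader;
import javax.xml.transform.*;
import javax.xml.transform.sax.SAXResult;
import javax.xml.transform.stream.StreamSource;
import org.xml.sax.ContentHandler;

// Feeds an XML string to a SAX ContentHandler through a null (identity) transform.
void parseStringWithSAX(String xmlString, ContentHandler handler)
        throws TransformerException {
    Source source = new StreamSource(new StringReader(xmlString));
    Result result = new SAXResult(handler);
    // newTransformer() without a stylesheet copies the source to the result unchanged.
    Transformer t = TransformerFactory.newInstance().newTransformer();
    t.transform(source, result);
}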
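
To make the ContentHandler side concrete, here is a minimal sketch of a handler driven by the same null transform. The class name ElementNameCollector and its collect method are hypothetical, invented for illustration; the handler simply records element names in document order, and a real handler would do whatever your application needs in the callbacks.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.transform.Result;
import javax.xml.transform.Source;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.sax.SAXResult;
import javax.xml.transform.stream.StreamSource;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler: remembers the name of every element it sees.
class ElementNameCollector extends DefaultHandler {
    final List<String> names = new ArrayList<>();

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        // Fall back to the local name if the qualified name is not reported.
        names.add(qName.isEmpty() ? localName : qName);
    }

    // Runs the identity transform from the string into the handler.
    static List<String> collect(String xml) throws TransformerException {
        ElementNameCollector handler = new ElementNameCollector();
        Source source = new StreamSource(new StringReader(xml));
        Result result = new SAXResult(handler);
        TransformerFactory.newInstance().newTransformer().transform(source, result);
        return handler.names;
    }
}
```

Extending DefaultHandler instead of implementing ContentHandler directly means you only override the callbacks you care about.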

Ari.


You might want to use Apache Jakarta Commons HttpClient to make the connection, then use a parser such as Xerces (through JAXP or whichever API you prefer) to read the input stream and parse it.

On the HttpClient side, you will generally create a method (e.g., a GetMethod), execute it with an instance of HttpClient, then ask for the method's response body. This tutorial should get you rolling.
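
If you would rather not pull in an extra library for the download step, the same open, read, close flow can be sketched with the JDK alone. Note this is a stand-in using URL.openStream(), not the Commons HttpClient API described above. The names PageFetcher and fetch are hypothetical, and decoding as UTF-8 is an assumption; a real page may declare a different encoding.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.URI;
import java.nio.charset.StandardCharsets;

class PageFetcher {
    // Reads everything the URL returns and decodes it as UTF-8 (assumed encoding).
    static String fetch(String url) throws IOException {
        try (InputStream in = URI.create(url).toURL().openStream()) {
            return new String(in.readAllBytes(), StandardCharsets.UTF_8);
        }
    }
}
```

Since an XML parser can detect the document's encoding from the byte stream itself, handing it the InputStream directly is often more robust than decoding to a String first.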

On the parser side, you may want to read up on the differences between SAX and DOM parsers, as they dictate a bit about the way they are used and have slightly different strengths depending on the size of the data, your memory constraints, and the types of queries you'll want to make against the document.
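
As a rough illustration of that difference, here is the same question (how many elements have a given name?) answered both ways with the JAXP classes that ship with the JDK. The class and method names are hypothetical. The SAX version streams through the input and keeps only a counter; the DOM version builds the whole tree in memory first and then queries it.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.SAXParserFactory;
import org.w3c.dom.Document;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

class SaxVsDom {
    // SAX: callbacks fire as the parser reads; nothing is retained except the counter.
    static int countWithSax(String xml, String name) throws Exception {
        int[] count = {0};
        SAXParserFactory.newInstance().newSAXParser().parse(
                new InputSource(new StringReader(xml)),
                new DefaultHandler() {
                    @Override
                    public void startElement(String uri, String localName,
                                             String qName, Attributes atts) {
                        if (qName.equals(name)) {
                            count[0]++;
                        }
                    }
                });
        return count[0];
    }

    // DOM: the full document is loaded, then it can be queried as often as needed.
    static int countWithDom(String xml, String name) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        return doc.getElementsByTagName(name).getLength();
    }
}
```

For a large page fetched once and scanned once, the SAX style tends to be lighter on memory; if you need to revisit nodes or navigate around the tree, DOM is usually more convenient.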
