Regarding DOM parsers and regex

I am writing an application in Java and need to fetch specific data from a website. I do not know whether to use a regex or a parser. Can anybody please advise me on how to get this done, and which one is preferred?

Thanks


I believe the relevant quote is "Even Jon Skeet cannot parse HTML using regular expressions." Depending on how complex the information you're trying to pull out of the HTML is, you may be better off with some sort of parser. What are you looking to pull, and from where?
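To make the point concrete, here is a minimal sketch of how a regex that works on tidy markup silently misses an equivalent link written slightly differently. The class name RegexLinkDemo, the pattern, and the sample strings are made up for illustration; only java.util.regex from the JDK is used.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class RegexLinkDemo {
    // Naive pattern (illustrative): assumes a lowercase tag, href as the
    // first attribute, no spaces around '=', and double quotes.
    private static final Pattern HREF = Pattern.compile("<a href=\"([^\"]*)\"");

    // Returns every href value the naive pattern manages to match.
    static List<String> extractLinks(String html) {
        List<String> links = new ArrayList<>();
        Matcher m = HREF.matcher(html);
        while (m.find()) {
            links.add(m.group(1));
        }
        return links;
    }

    public static void main(String[] args) {
        String tidy = "<a href=\"/one\">one</a>";
        // Same kind of link, but with uppercase names, an extra attribute,
        // spaces around '=', and single quotes. Browsers accept all of this.
        String messy = "<A class='nav' HREF = '/two'>two</A>";

        System.out.println(extractLinks(tidy));   // prints [/one]
        System.out.println(extractLinks(messy));  // prints []
    }
}
```

Each variation can be patched into the pattern, but real pages tend to keep producing new ones, which is why a parser is usually the less painful route.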


Definitely get an HTML parser.

Here is a comparison of a few Java HTML parsers.

Some of them are shown here:

NekoHTML:

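// Assumes urlIS is an already-open InputStream for the page and that
// document is an org.w3c.dom.Document declared elsewhere.
import java.io.IOException;
import org.cyberneko.html.parsers.DOMParser;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

final DOMParser parser = new DOMParser();
try {
    // NekoHTML repairs malformed markup while it builds the DOM tree.
    parser.parse(new InputSource(urlIS));
    document = parser.getDocument();
} catch (SAXException e) {
    e.printStackTrace();
} catch (IOException e) {
    e.printStackTrace();
}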

TagSoup:

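// Assumes SAX2DOM is the Xalan class below; urlIS and document are
// declared elsewhere, as in the NekoHTML snippet.
import org.apache.xalan.xsltc.trax.SAX2DOM;
import org.ccil.cowan.tagsoup.Parser;
import org.xml.sax.InputSource;

final Parser parser = new Parser();
try {
    // SAX2DOM collects the SAX events emitted by TagSoup into a DOM tree.
    final SAX2DOM sax2dom = new SAX2DOM();
    parser.setContentHandler(sax2dom);
    parser.setFeature(Parser.namespacesFeature, false);
    parser.parse(new InputSource(urlIS));
    // Read the result inside the try block so that a failed setup
    // cannot lead to a NullPointerException afterwards.
    document = sax2dom.getDOM();
} catch (Exception e) {
    e.printStackTrace();
}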
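
Once either parser has produced a DOM document, the specific data can be pulled out with the standard XPath API. The following is only a sketch: it uses the JDK's own DocumentBuilder on a small well-formed snippet as a stand-in for the parser output, and the class name DomQueryDemo, the helper methods, and the sample markup are made up for illustration. Depending on the parser and its settings, element names may come back in a different case, so the XPath expression may need adjusting.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class DomQueryDemo {
    // Collects the href value of every <a> element in the document.
    static List<String> linkTargets(Document document) {
        try {
            XPath xpath = XPathFactory.newInstance().newXPath();
            NodeList hrefs = (NodeList) xpath.evaluate(
                    "//a/@href", document, XPathConstants.NODESET);
            List<String> result = new ArrayList<>();
            for (int i = 0; i < hrefs.getLength(); i++) {
                result.add(hrefs.item(i).getNodeValue());
            }
            return result;
        } catch (Exception e) {
            throw new IllegalStateException("XPath query failed", e);
        }
    }

    // Stand-in for NekoHTML/TagSoup: only handles well-formed markup.
    static Document parseWellFormed(String markup) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(markup)));
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse markup", e);
        }
    }

    public static void main(String[] args) {
        Document document = parseWellFormed(
                "<html><body><a href='/one'>one</a>"
                + "<p><a class='nav' href='/two'>two</a></p></body></html>");
        System.out.println(linkTargets(document)); // prints [/one, /two]
    }
}
```

Unlike a hand-written pattern, the query does not care about attribute order or quoting, because the parser has already normalized those details.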