Regarding parser DOM and REGEX
I am writing an application in Java and need to fetch specific data from a website. I do not know whether to use regex or a parser. Can anybody please advise me how to get this done, and which one is preferred?
Thanks
I believe the choice quote is "Even Jon Skeet cannot parse HTML using regular expressions." Depending on how complex the information you're trying to pull out of the HTML is, you may be better off with some sort of parser. What are you looking to pull, and from where?
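To make the point concrete, here is a minimal sketch of how a regex breaks on perfectly ordinary HTML. The class name, the pattern, and the sample markup are all made up for illustration; the pattern is deliberately the naive kind people tend to write first.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegexFragility {
    // Naive pattern (illustrative): assumes a lower-case tag, href as the
    // first attribute, and double-quoted values.
    private static final Pattern LINK = Pattern.compile("<a href=\"([^\"]*)\">");

    static List<String> extractLinks(String html) {
        List<String> links = new ArrayList<>();
        Matcher m = LINK.matcher(html);
        while (m.find()) {
            links.add(m.group(1));
        }
        return links;
    }

    public static void main(String[] args) {
        // Three valid links, written three different but legal ways.
        String html = "<a href=\"/one\">1</a>"
                + "<a class='x' href=\"/two\">2</a>"
                + "<A HREF='/three'>3</A>";
        // Only the first link matches; the other two are silently missed.
        System.out.println(extractLinks(html));
    }
}
```

You can keep patching the pattern for attribute order, quoting, and case, but real pages will keep finding new ways to break it, which is the usual argument for a parser.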
Definitely get an HTML parser.
Here is a comparison of a few Java HTML parsers.
Some of them are shown here:
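NekoHTML:
// Needs the NekoHTML jar (and Xerces) on the classpath.
import java.io.IOException;
import org.cyberneko.html.parsers.DOMParser;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// urlIS is assumed to be an InputStream opened on the page;
// document is an org.w3c.dom.Document declared elsewhere.
final DOMParser parser = new DOMParser();
try {
    parser.parse(new InputSource(urlIS));
    document = parser.getDocument();
} catch (SAXException e) {
    e.printStackTrace();
} catch (IOException e) {
    e.printStackTrace();
}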
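TagSoup:
// Needs the TagSoup jar; SAX2DOM is assumed to be Xalan's
// org.apache.xalan.xsltc.trax.SAX2DOM.
import org.apache.xalan.xsltc.trax.SAX2DOM;
import org.ccil.cowan.tagsoup.Parser;
import org.xml.sax.InputSource;

// urlIS and document are declared elsewhere, as in the NekoHTML snippet.
final Parser parser = new Parser();
try {
    final SAX2DOM sax2dom = new SAX2DOM();
    parser.setContentHandler(sax2dom);
    parser.setFeature(Parser.namespacesFeature, false);
    parser.parse(new InputSource(urlIS));
    // Read the tree only after a successful parse, so a failure
    // cannot leave us dereferencing a null sax2dom.
    document = sax2dom.getDOM();
} catch (Exception e) {
    e.printStackTrace();
}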
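Once either parser has handed you a DOM, pulling out the specific data is plain org.w3c.dom work. A rough sketch of that step follows, collecting every link target. The class and method names are hypothetical, and to keep it runnable without extra jars it builds the Document from a small well-formed snippet with the JDK's own DocumentBuilder as a stand-in; with real pages you would pass in the document produced by NekoHTML or TagSoup instead.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class DomExtract {
    // Collects the href of every <a> element in the document.
    static List<String> linkTargets(Document document) {
        List<String> hrefs = new ArrayList<>();
        NodeList anchors = document.getElementsByTagName("a");
        for (int i = 0; i < anchors.getLength(); i++) {
            Element a = (Element) anchors.item(i);
            if (a.hasAttribute("href")) {
                hrefs.add(a.getAttribute("href"));
            }
        }
        return hrefs;
    }

    // Stand-in for the HTML parsers: only handles well-formed markup.
    static Document parseWellFormed(String xhtml) throws Exception {
        return DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xhtml)));
    }

    public static void main(String[] args) throws Exception {
        Document doc = parseWellFormed(
                "<html><body><a href=\"/one\">1</a>"
                + "<p><a class=\"x\" href=\"/two\">2</a></p></body></html>");
        System.out.println(linkTargets(doc));
    }
}
```

One caveat: getElementsByTagName is case-sensitive, and as far as I recall NekoHTML reports element names in upper case by default, so with its output you may need "A" rather than "a". Check what your parser actually produces before relying on the tag name.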