Parsing using HTMLParser

Parser parser = new Parser();
parser.setInputHTML("d:/index.html");
parser.setEncoding("UTF-8");
NodeList nl = parser.parse(null);
/*
SimpleNodeIterator sNI = list.elements();
while (sNI.hasMoreNodes()) {
    System.out.println(sNI.nextNode().getText());
}
*/
NodeList trs = nl.extractAllNodesThatMatch(new TagNameFilter("tr"), true);
for (int i = 0; i < trs.size(); i++) {
    NodeList nodes = trs.elementAt(i).getChildren();
    NodeList tds = nodes.extractAllNodesThatMatch(new TagNameFilter("td"), true);
    System.out.println(tds.toString());
}

I am not getting any output; Eclipse just shows javaw.exe as terminated.


Pass the path to the resource into the constructor.

Parser parser = new Parser("index.html");

Parse and print all the divs on this page:

Parser parser = new Parser("http://stackoverflow.com/questions/7293729/parsing-using-htmlparser/");
parser.setEncoding("UTF-8");
NodeList nl = parser.parse(null);
NodeList div = nl.extractAllNodesThatMatch(new TagNameFilter("div"),true);
System.out.println(div.toString());

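parser.setInputHTML(String inputHTML) doesn't do what you think it does. It treats its argument as the HTML markup to parse, not as a path to a file, so the string "d:/index.html" is parsed as plain text that contains no tr tags. To point the parser at an HTML resource (a file or a URL), use the constructor instead.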

Example:

Parser parser = new Parser();
parser.setInputHTML("<div>Foo</div><div>Bar</div>");
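To tie this back to the code in the question, here is a minimal sketch of the same tr/td extraction, run against both kinds of input. The class name TableCellDemo and the helpers cellTexts and cellTextsFromHtml are made up for illustration; the sketch assumes the org.htmlparser jar is on the classpath and that the file path, if any, is passed as the first command-line argument.

```java
import java.util.ArrayList;
import java.util.List;

import org.htmlparser.Parser;
import org.htmlparser.filters.TagNameFilter;
import org.htmlparser.util.NodeList;
import org.htmlparser.util.ParserException;

public class TableCellDemo {

    // Hypothetical helper: collects the plain text of every <td> found under every <tr>.
    static List<String> cellTexts(Parser parser) throws ParserException {
        List<String> cells = new ArrayList<>();
        NodeList all = parser.parse(null);
        NodeList trs = all.extractAllNodesThatMatch(new TagNameFilter("tr"), true);
        for (int i = 0; i < trs.size(); i++) {
            NodeList children = trs.elementAt(i).getChildren();
            if (children == null) {
                continue; // an empty <tr></tr> has no child list
            }
            NodeList tds = children.extractAllNodesThatMatch(new TagNameFilter("td"), true);
            for (int j = 0; j < tds.size(); j++) {
                cells.add(tds.elementAt(j).toPlainTextString().trim());
            }
        }
        return cells;
    }

    // Hypothetical helper: treats the argument as markup, which is what setInputHTML does.
    static List<String> cellTextsFromHtml(String html) throws ParserException {
        Parser parser = new Parser();
        parser.setInputHTML(html);
        return cellTexts(parser);
    }

    public static void main(String[] args) throws ParserException {
        // Real markup: the cells are found.
        System.out.println(cellTextsFromHtml("<table><tr><td>A</td><td>B</td></tr></table>"));

        // A path handed to setInputHTML is parsed as text, so no rows are found.
        System.out.println(cellTextsFromHtml("d:/index.html"));

        // A path (or URL) handed to the constructor is opened and read.
        if (args.length > 0) {
            System.out.println(cellTexts(new Parser(args[0])));
        }
    }
}
```

With the second call the loop body never runs, which matches the "no output" symptom in the question.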