
Reading HTML table data / HTML tags

I have about 50 HTML pages, each with around 100-plus rows of data and all sorts of CSS styling. I want to read each HTML file, extract just the data (Name, Age, Class, Teacher) and store it in a database, but I am not able to read the HTML tags.

For example (the spacing is only there to display it here):

    <table class="table_100">
        <tr>
            <td class="col_1">
                <span class="txt_student">Gauri Singh</span><br>
                <span class="txt_bold">13</span><br>
                <span class="txt_bold">VIII</span><br>
            </td>
            <td class="col_2">
                <span class="txt_teacher">Praveen M</span><br>
                <span class="txt_bold">3494</span><br>
                <span class="txt_bold">3Star</span><br>
            </td>
            <td class="col_3">
            </td>
        </tr>
    </table>


For .NET, you could try the Html Agility Pack. You can use it to "convert" HTML pages to XML documents:

        using HtmlAgilityPack;

        // Load the HTML page, then save it back out as well-formed XML.
        HtmlDocument doc = new HtmlDocument();
        doc.Load(@"..\..\your_page.htm");
        doc.OptionOutputAsXml = true;
        doc.Save("your_page.xml");

Then parse the resulting XML document with any standard XML API.
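Once the page is XML, LINQ to XML (`System.Xml.Linq`, part of .NET) can pull out the fields from the question's table. This is a minimal sketch, assuming the saved XML keeps the question's lowercase element names (`tr`, `td`, `span`) with no namespace and exact `class` values such as `col_1` and `txt_student`. The `StudentRow` record and `XmlTableReader` class are hypothetical names introduced here for illustration, and it assumes the first two `txt_bold` spans in `col_1` are age and class, as in the sample.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

// Hypothetical row type; field names follow the question (Name, Age, Class, Teacher).
public record StudentRow(string Name, string Age, string Class, string Teacher);

public static class XmlTableReader
{
    // True when the element's class attribute equals cssClass exactly.
    static bool HasClass(XElement e, string cssClass) =>
        (string?)e.Attribute("class") == cssClass;

    // Trimmed text of every <span> with the given class under the element.
    static List<string> SpanTexts(XElement? parent, string cssClass) =>
        parent == null
            ? new List<string>()
            : parent.Descendants("span")
                    .Where(s => HasClass(s, cssClass))
                    .Select(s => s.Value.Trim())
                    .ToList();

    public static List<StudentRow> Read(XDocument doc)
    {
        var rows = new List<StudentRow>();
        foreach (var tr in doc.Descendants("tr"))
        {
            var col1 = tr.Elements("td").FirstOrDefault(td => HasClass(td, "col_1"));
            var col2 = tr.Elements("td").FirstOrDefault(td => HasClass(td, "col_2"));

            var name = SpanTexts(col1, "txt_student");
            var bold = SpanTexts(col1, "txt_bold");      // assumed order: [age, class]
            var teacher = SpanTexts(col2, "txt_teacher");

            // Skip rows that do not match the assumed layout.
            if (name.Count == 0 || bold.Count < 2 || teacher.Count == 0) continue;
            rows.Add(new StudentRow(name[0], bold[0], bold[1], teacher[0]));
        }
        return rows;
    }
}
```

Usage would be along the lines of `XmlTableReader.Read(XDocument.Load("your_page.xml"))`, giving one `StudentRow` per table row that you can then insert into your database.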


Use the Html Agility Pack. It provides an intuitive and robust .NET API for parsing and otherwise working with HTML.
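You can also skip the XML step and query the HTML directly with XPath. The sketch below assumes the `HtmlAgilityPack` NuGet package is referenced and relies on its `HtmlDocument`, `SelectNodes`, `SelectSingleNode` and `HtmlEntity.DeEntitize` members. The `StudentRow` record and `HtmlTableReader` class are hypothetical names, and the XPath expressions assume the exact `class` values from the question's sample (an element carrying several classes would not match `@class='...'`).

```csharp
using System;
using System.Collections.Generic;
using HtmlAgilityPack; // NuGet package "HtmlAgilityPack" (assumed referenced)

// Hypothetical row type; field names follow the question (Name, Age, Class, Teacher).
public record StudentRow(string Name, string Age, string Class, string Teacher);

public static class HtmlTableReader
{
    public static List<StudentRow> Read(HtmlDocument doc)
    {
        var rows = new List<StudentRow>();

        // SelectNodes returns null, not an empty collection, when nothing matches.
        var trs = doc.DocumentNode.SelectNodes("//table[@class='table_100']//tr");
        if (trs == null) return rows;

        foreach (var tr in trs)
        {
            var name = tr.SelectSingleNode(".//td[@class='col_1']//span[@class='txt_student']");
            var bold = tr.SelectNodes(".//td[@class='col_1']//span[@class='txt_bold']");
            var teacher = tr.SelectSingleNode(".//td[@class='col_2']//span[@class='txt_teacher']");

            // Skip rows that do not match the assumed layout (age and class in col_1).
            if (name == null || teacher == null || bold == null || bold.Count < 2) continue;
            rows.Add(new StudentRow(Clean(name), Clean(bold[0]), Clean(bold[1]), Clean(teacher)));
        }
        return rows;
    }

    // Decode entities such as &amp; and trim the whitespace around the text.
    static string Clean(HtmlNode node) => HtmlEntity.DeEntitize(node.InnerText).Trim();
}
```

For the 50 pages, you would loop over the files, call `doc.Load(path)` on a fresh `HtmlDocument` for each one, and pass it to `Read`.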

