Reading HTML table data / html tag
I have about 50 HTML pages, each with 100 or more rows of data and all sorts of CSS styling. I want to read each HTML file, extract just the data (Name, Age, Class, Teacher), and store it in a database, but I am not able to read the HTML tags.
For example (I added spaces so the markup displays here):
<table class="table_100">
<tr>
<td class="col_1">
<span class="txt_student">Gauri Singh</span><br>
<span class="txt_bold">13</span><br>
<span class="txt_bold">VIII</span><br>
</td>
<td class="col_2">
<span class="txt_teacher">Praveen M</span><br>
<span class="txt_bold">3494</span><br>
<span class="txt_bold">3Star</span><br>
</td>
<td class="col_3">
</td>
</tr>
</table>
For .NET, you may try Html Agility Pack.
You could "convert" HTML pages to XML documents with this:
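using HtmlAgilityPack;

HtmlDocument doc = new HtmlDocument();
doc.Load(@"..\..\your_page.htm");   // parses the HTML, even if it is not well-formed
doc.OptionOutputAsXml = true;        // makes Save write well-formed XML
doc.Save("your_page.xml");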
Then just parse the resulting XML document.
Use Html Agility Pack. It provides an intuitive and robust .NET API for parsing and otherwise toying with HTML.
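The values can also be read directly through the library's XPath support, without an XML round trip. Below is a minimal sketch, assuming every row follows the markup in the question: the three spans in col_1 are name, age, and class, in that order, and the teacher is the txt_teacher span in col_2. The Student record and the StudentTableReader class are hypothetical names used only for illustration, and the record syntax assumes C# 9 or later.

```csharp
using System;
using System.Collections.Generic;
using HtmlAgilityPack; // NuGet package: HtmlAgilityPack

// Hypothetical container for one table row.
public record Student(string Name, string Age, string Class, string Teacher);

public static class StudentTableReader
{
    public static List<Student> Parse(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        var students = new List<Student>();

        // SelectNodes returns null (not an empty collection) when nothing matches.
        var rows = doc.DocumentNode.SelectNodes("//table[@class='table_100']//tr");
        if (rows == null) return students;

        foreach (var row in rows)
        {
            // Assumed order inside col_1: name, age, class.
            var first = row.SelectNodes("./td[@class='col_1']/span");
            var teacher = row.SelectSingleNode(
                "./td[@class='col_2']/span[@class='txt_teacher']");

            // Skip rows that do not match the assumed layout.
            if (first == null || first.Count < 3 || teacher == null) continue;

            students.Add(new Student(
                Clean(first[0]), Clean(first[1]), Clean(first[2]), Clean(teacher)));
        }
        return students;
    }

    // Decode entities such as &amp; and trim surrounding whitespace.
    private static string Clean(HtmlNode node) =>
        HtmlEntity.DeEntitize(node.InnerText).Trim();
}
```

Calling StudentTableReader.Parse(File.ReadAllText(path)) for each of the 50 files would give a list of rows ready to insert into the database. Age is kept as a string here so the sketch does not fail on unexpected values.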