Extract specific HTML text using HTMLAgilityPack
<table class="result" summary="Summary Description.">
<tbody>
<tr>
<th scope="col" class="firstcol">Column 1</th>
<th scope="col">Column 2</th>
<th scope="col">Column 3</th>
<th scope="col" class="lastcol">Column 4</th>
</tr>
<tr class="even">
<td class="firstcol">Text 1</td>
<td>Text 2</td>
<td>4Text 3</td>
<td class="lastcol">Text 4</td>
</tr>
</tbody></table>
This is the part of the HTML I'm interested in. I want Text 1, Text 2, Text 3 and Text 4. Using HTMLAgilityPack, how can I extract that data? I googled and checked this site but didn't find anything that matched my scenario exactly.
if (htmlDoc.DocumentNode != null)
{
    foreach (HtmlNode text in htmlDoc.DocumentNode.SelectNodes(???))
    {
        ???
    }
}
Try this:
using System;
using HtmlAgilityPack;

var html = @"<table class=""result"" summary=""Summary Description.""> <tbody> <tr> <th scope=""col"" class=""firstcol"">Column 1</th> <th scope=""col"">Column 2</th> <th scope=""col"">Column 3</th> <th scope=""col"" class=""lastcol"">Column 4</th> </tr> <tr class=""even""> <td class=""firstcol"">Text 1</td> <td>Text 2</td> <td>4Text 3</td> <td class=""lastcol"">Text 4</td> </tr> </tbody></table>";
var doc = new HtmlDocument();
doc.LoadHtml(html);
// Select the text nodes of every <td> in rows with class "even".
// SelectNodes returns null (not an empty collection) when nothing matches.
var textNodes = doc.DocumentNode.SelectNodes("//tr[@class='even']/td/text()");
if (textNodes != null)
{
    foreach (var textNode in textNodes)
    {
        Console.WriteLine(textNode.InnerText);
    }
}
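If you want the values back as a list instead of printing them, the same XPath idea can be wrapped in a small helper. This is only a minimal sketch: `TableText` and `ExtractCells` are hypothetical names, not part of HTMLAgilityPack, and it assumes the HtmlAgilityPack NuGet package is referenced. It selects the `<td>` elements themselves, decodes HTML entities such as `&amp;`, and trims surrounding whitespace.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using HtmlAgilityPack; // assumes the HtmlAgilityPack NuGet package

public static class TableText
{
    // Hypothetical helper: returns the decoded, trimmed text of every <td>
    // inside rows with class "even".
    public static List<string> ExtractCells(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // SelectNodes returns null when the XPath matches nothing.
        var cells = doc.DocumentNode.SelectNodes("//tr[@class='even']/td");
        if (cells == null)
        {
            return new List<string>();
        }

        return cells
            .Select(td => HtmlEntity.DeEntitize(td.InnerText).Trim())
            .ToList();
    }
}
```

Calling `TableText.ExtractCells(html)` with the `html` string from the question returns one entry per cell of the `even` row, and an empty list if the page has no such row, so the caller does not need its own null check.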