Parsing HTML and counting tags with C#
Suppose I have a block of HTML in a string:
<div class="nav mainnavs">
<ul>
<li><a id="nav-questions" href="/questions">Questions</a></li>
<li><a id="nav-tags" href="/tags">Tags</a></li>
<li><a id="nav-users" href="/users">Users</a></li>
<li><a id="nav-badges" href="/badges">Badges</a></li>
<li><a id="nav-unanswered" href="/unanswered">Unanswered</a></li>
</ul>
</div>
How can I parse the HTML and count the number of instances of a specific type of tag, such as <div> or <li>?
You can use HtmlAgilityPack for this. The latest version supports LINQ, so this is straightforward:
For a local HTML file:
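// Requires: using System.Linq; using HtmlAgilityPack;
HtmlDocument doc = new HtmlDocument();
doc.Load(@"test.html"); // test.html is assumed to contain the HTML from the question
int liCount = doc.DocumentNode.Descendants("li").Count(); // returns 5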
From the web:
HtmlWeb web = new HtmlWeb();
HtmlDocument doc = web.Load("http://stackoverflow.com");
int liCount = doc.DocumentNode.Descendants("li").Count();
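Since the question has the HTML in a string rather than in a file or at a URL, here is a minimal sketch of the same approach using LoadHtml. The TagCounter class and its CountTags method are hypothetical names made up for this example, not part of HtmlAgilityPack, and the sketch assumes the HtmlAgilityPack NuGet package is referenced:

```csharp
using System;
using System.Linq;
using HtmlAgilityPack; // assumes the HtmlAgilityPack NuGet package is installed

public static class TagCounter
{
    // Hypothetical helper: counts the elements named tagName
    // anywhere in the given HTML string. Pass the tag name in lower case.
    public static int CountTags(string html, string tagName)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html); // parse from a string instead of a file or URL
        return doc.DocumentNode.Descendants(tagName).Count();
    }

    public static void Main()
    {
        // The HTML block from the question, as a verbatim string.
        string html = @"<div class=""nav mainnavs"">
<ul>
<li><a id=""nav-questions"" href=""/questions"">Questions</a></li>
<li><a id=""nav-tags"" href=""/tags"">Tags</a></li>
<li><a id=""nav-users"" href=""/users"">Users</a></li>
<li><a id=""nav-badges"" href=""/badges"">Badges</a></li>
<li><a id=""nav-unanswered"" href=""/unanswered"">Unanswered</a></li>
</ul>
</div>";

        Console.WriteLine(CountTags(html, "li"));  // 5
        Console.WriteLine(CountTags(html, "div")); // 1
    }
}
```

Wrapping the call in a small helper keeps the parsing in one place, so the same string can be checked for several tag names without repeating the setup.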