Getting all the anchor tags of a web page
Given a URL, I want to find all the links on a website, identify the internal ones, and list them.
What I have is this:
WebClient webClient = null;
webClient = new WebClient();
string strUrl = "http://www.anysite.com";
string completeHTMLCode = "";
try
{
completeHTMLCode = webClient.DownloadString(strUrl);
}
catch (Exception)
{
}
Using this I can read the contents of the page, but the only idea I have is to parse this string: search for <a, then href, then the value between the double quotes.
Is this the only way, or is there a better solution?
Use the HTML Agility Pack. Here's a link to a blog post to get you started. Do not use Regex to parse HTML.
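using HtmlAgilityPack;

completeHTMLCode = webClient.DownloadString(strUrl);
HtmlDocument doc = new HtmlDocument();
// LoadHtml parses an HTML string; Load expects a file path or stream.
doc.LoadHtml(completeHTMLCode);
// SelectNodes returns null when nothing matches, so check before iterating.
HtmlNodeCollection links = doc.DocumentNode.SelectNodes("//a[@href]");
if (links != null)
{
    foreach (HtmlNode link in links)
    {
        string href = link.GetAttributeValue("href", ""); // raw href value
    }
}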
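The question also asks which of the links are internal, which the loop above does not address. Here is a minimal sketch of one way to do that with System.Uri alone, assuming "internal" means "same host as the page that was downloaded". LinkFilter and GetInternalLinks are hypothetical names made up for this illustration, and the hrefs are assumed to be the raw attribute values collected in the loop (for example via link.GetAttributeValue("href", "")).

```csharp
using System;
using System.Collections.Generic;

public static class LinkFilter
{
    // Hypothetical helper: resolves each href against the page URL and
    // keeps only http/https links whose host matches the page's host.
    public static List<string> GetInternalLinks(string pageUrl, IEnumerable<string> hrefs)
    {
        var pageUri = new Uri(pageUrl);
        var seen = new HashSet<string>(StringComparer.OrdinalIgnoreCase);
        var result = new List<string>();

        foreach (string href in hrefs)
        {
            if (string.IsNullOrWhiteSpace(href))
                continue;

            // Turn relative hrefs such as "about.html" into absolute URLs.
            if (!Uri.TryCreate(pageUri, href.Trim(), out Uri resolved))
                continue;

            // Skip mailto:, javascript: and other non-web schemes.
            if (resolved.Scheme != Uri.UriSchemeHttp &&
                resolved.Scheme != Uri.UriSchemeHttps)
                continue;

            // Assumption: a link is internal when its host equals the page's host.
            if (!string.Equals(resolved.Host, pageUri.Host, StringComparison.OrdinalIgnoreCase))
                continue;

            // Report each distinct URL once.
            if (seen.Add(resolved.AbsoluteUri))
                result.Add(resolved.AbsoluteUri);
        }

        return result;
    }
}
```

Comparing hosts exactly treats www.anysite.com and anysite.com as different sites. If subdomains should count as internal, the host check would need to be loosened accordingly.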