Html Agility Pack problem retrieving data
I am trying to parse data from the web page http://www.bbb.org/kitchener/accredited-business-directory?letter=a
I want to get all the categories, such as
Accountants - Certified Public (2)
Accounting Services (1)
and so on. The problem is that when I go to the node, the a tags are missing (the result is null), and I do not know why Html Agility Pack does not pick them up. Checking in the Watch window, the div contains only the commented-out line-break tags and not the a tags, although they are there when I view the page source:
doc.DocumentNode.SelectNodes("//tr/td/table/tr/td/div/div")[0].OuterHtml
"<div style=\"font-size: 12px;line-height: 16px;\"><!--<br />-->\r\n<!--<br />-->\r\n</div>"
Here is the start of that div in the page source. Note that I have included only 2 of the a tags from the HTML:
<div style="float: left; width: 305px;">
<h5 style="margin: 0px; margin-bottom: 5px; border-bottom: 1px solid #cccccc; padding-bottom: 5px; font-size: 12px;">Categories Starting with letter 'a'</h5>
<div style="font-size: 12px;line-height: 16px;">
<!--<br />-->
<!--<br />-->
<a class="listingName" href="/kitchener/accredited-business-directory/accountants">Accountants (11)</a><br />
<a class="listingName" href="/kitchener/accredited-business-directory/accountants-certified-public">Accountants - Certified Public (2)</a><br />
</div>
</div>
How can I get the data?
Even selecting every link on the page does not reveal these links:
foreach (var test in doc.DocumentNode.SelectNodes("//a[@href]"))
{
    MessageBox.Show(test.InnerText + "\n" + test.InnerHtml);
}
This worked fine for me using the following sample:
HtmlWeb web = new HtmlWeb();
HtmlAgilityPack.HtmlDocument doc = web.Load("http://www.bbb.org/kitchener/accredited-business-directory?letter=a");
foreach (var link in doc.DocumentNode.SelectNodes("//a[@href]"))
{
Console.WriteLine(link.InnerText);
}
Output (shortened):
BBB
Home
Accredited Business Directory
Accountants (11)
Accountants - Certified Public (2)
Accounting Services (1)
Advertising - Direct Mail (3)
Advertising Agencies & Counselors (3)
Advertising Specialties (3)
...
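To narrow the result down to the category links only, a minimal sketch along the following lines may help. It assumes the links keep the class listingName shown in the question's HTML, and it parses that fragment from a string instead of the live page so it can be tried offline. The class CategoryExtractor and the method ExtractCategories are hypothetical names introduced for illustration, not part of Html Agility Pack.

```csharp
using System;
using System.Collections.Generic;
using HtmlAgilityPack;

public static class CategoryExtractor
{
    // Hypothetical helper: returns the text of every <a class="listingName">
    // element found in the given HTML.
    public static List<string> ExtractCategories(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        var names = new List<string>();

        // By default SelectNodes returns null (not an empty collection)
        // when nothing matches, so guard before iterating.
        var links = doc.DocumentNode.SelectNodes("//a[@class='listingName']");
        if (links == null)
        {
            return names;
        }

        foreach (var link in links)
        {
            // DeEntitize converts entities such as &amp; back to plain characters.
            names.Add(HtmlEntity.DeEntitize(link.InnerText).Trim());
        }
        return names;
    }

    public static void Main()
    {
        // Fragment copied from the question, including the commented-out <br /> tags.
        string html = @"<div style=""font-size: 12px;line-height: 16px;"">
<!--<br />-->
<!--<br />-->
<a class=""listingName"" href=""/kitchener/accredited-business-directory/accountants"">Accountants (11)</a><br />
<a class=""listingName"" href=""/kitchener/accredited-business-directory/accountants-certified-public"">Accountants - Certified Public (2)</a><br />
</div>";

        foreach (var name in ExtractCategories(html))
        {
            Console.WriteLine(name);
        }
    }
}
```

If this finds the links in the fragment but not in the document you loaded, it may be worth comparing doc.DocumentNode.OuterHtml with the page source, since the HTML your program receives could differ from what the browser shows.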