HtmlAgilityPack - how to grab <DIV> data in a large web page
I am trying to grab data from a web page, specifically from <DIV> elements of one particular class, <DIV class="personal_info">. Every page has 10 to 15 such DIVs, all with the same class "personal_info" (as shown in the HTML below), and I want to extract all of them.
<div class="personal_info"><span class="bold">Rama Anand</span><br><br> Mobile: 9916184586<br>rama_asset@hotmail.com<br> Bangalore</div>
To do this I started using HtmlAgilityPack, as someone on Stack Overflow suggested, but I am stuck right at the beginning because I don't know the library well. My C# code so far:
HtmlAgilityPack.HtmlDocument docHtml = new HtmlAgilityPack.HtmlDocument();
HtmlAgilityPack.HtmlWeb docHFile = new HtmlWeb();
docHtml = docHFile.Load("http://127.0.0.1/2.html");
How do I continue from here so that the data from every DIV whose class is "personal_info" can be grabbed? A suggestion with an example would be appreciated.
I can't check this right now, but isn't it:
var infos = from info in docHtml.DocumentNode.SelectNodes("//div[@class='personal_info']") select info;
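One thing to watch with the query above: SelectNodes returns null, not an empty collection, when nothing matches, so it is worth a null check before enumerating. Below is a minimal sketch of the full extraction, assuming the HtmlAgilityPack NuGet package is referenced. The class name PersonalInfoReader and the method name Extract are made up for illustration. It collects the text pieces of each DIV separately, because the <br> tags would otherwise leave the name, mobile number and e-mail run together in InnerText.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using HtmlAgilityPack;

// Hypothetical helper, not part of HtmlAgilityPack.
public static class PersonalInfoReader
{
    // Returns one list of text fragments per <div class="personal_info">.
    public static List<List<string>> Extract(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // Matches only when the class attribute is exactly "personal_info".
        var divs = doc.DocumentNode.SelectNodes("//div[@class='personal_info']");

        // SelectNodes yields null when no node matches the XPath.
        if (divs == null)
            return new List<List<string>>();

        return divs
            .Select(div => div.Descendants()
                // Keep only text nodes; <br> and <span> carry no text themselves.
                .Where(n => n.NodeType == HtmlNodeType.Text)
                .Select(n => HtmlEntity.DeEntitize(n.InnerText).Trim())
                .Where(text => text.Length > 0)
                .ToList())
            .ToList();
    }
}
```

With the page from the question you could pass docHtml.DocumentNode.OuterHtml to Extract, or adapt the method to take the already loaded document instead of a string.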
To load a document from a URL you can do something like:
// needs: using System.Net; using System.Text;
var document = new HtmlAgilityPack.HtmlDocument();
var url = "http://www.google.com";
var request = (HttpWebRequest)WebRequest.Create(url);
// Dispose the response stream once the document has been parsed.
using (var responseStream = request.GetResponse().GetResponseStream())
{
    document.Load(responseStream, Encoding.UTF8);
}
Also note there is a fork that lets you use jQuery selectors with Agility Pack:
IEnumerable<HtmlNode> myList = document.QuerySelectorAll(".personal_info");
http://yosi-havia.blogspot.com/2010/10/using-jquery-selectors-on-server-sidec.html
What happened to Where?
node.DescendantNodes().Where(node_it => node_it.Name=="div");
If you want to start from the top (root) node, use page.DocumentNode as "node".
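The Where approach can also filter on the class attribute, which the line above does not do yet. Here is a rough sketch, assuming a recent HtmlAgilityPack where Descendants(name) and GetAttributeValue(name, default) are available. DivFilter and DivsWithClass are hypothetical names. Unlike an exact XPath comparison on @class, this version also matches DIVs that carry several classes.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using HtmlAgilityPack;

// Hypothetical helper, not part of HtmlAgilityPack.
public static class DivFilter
{
    // Returns every <div> under 'node' whose class list contains 'className'.
    public static List<HtmlNode> DivsWithClass(HtmlNode node, string className)
    {
        return node.Descendants("div")
            .Where(div => div.GetAttributeValue("class", "")
                // The class attribute is a space-separated list of names.
                .Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries)
                .Contains(className))
            .ToList();
    }
}
```

For the question's page this would be called as DivFilter.DivsWithClass(docHtml.DocumentNode, "personal_info").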