How do I parse an HtmlDocument in C#?
I want to get the text of an HTML page using a simple C# application. If there are nested elements, e.g.:
<Table>
  <TR>
    <TD>**ABC**
    </TD>
    <TD>**1**
    </TD>
  </TR>
  <TR>
    <TD>**XYZ**
    </TD>
    <TD>**2**
    </TD>
  </TR>
</Table>
How can I get the (bold) text values directly? I want to save them in my database and also show them in a GridView.
HtmlDocument htmlSnippet = LoadHtmlSnippetFromFile();

private HtmlDocument LoadHtmlSnippetFromFile()
{
    //TextReader reader = File.OpenText(Server.MapPath("~/App_Data/HtmlSnippet.txt"));
    const string strUrl = "http://www.dsebd.org/latest_PE_all2_08.php";
    // Dispose the WebClient and the response stream once the document is loaded
    using (WebClient webClient = new WebClient())
    using (Stream reader = webClient.OpenRead(strUrl))
    {
        HtmlDocument doc = new HtmlDocument();
        doc.Load(reader);
        return doc;
    }
}
From this htmlSnippet, how can I get the values?
I'm not sure what you need. Given your example, do you want a single string like "**ABC****1****XYZ****2**"?
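If so, this should work: htmlSnippet.Body.OuterText (Body and OuterText belong to the System.Windows.Forms HtmlDocument/HtmlElement classes, which the code below also uses.)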
EDIT: OK, here is an attempt at an example that gets the values separately:
HtmlElement tableElement = FindElement(htmlSnippet.Body, "table");
if (tableElement != null)
{
    // The parser may insert an implicit <tbody>, so rows are not always direct
    // children of <table>; GetElementsByTagName searches all descendants.
    foreach (HtmlElement row in tableElement.GetElementsByTagName("tr"))
    {
        // create whatever class you use for a row
        foreach (HtmlElement cell in row.GetElementsByTagName("td"))
        {
            // add a new cell to your row using cell.InnerText
        }
    }
}
// *** snip ***
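private HtmlElement FindElement(HtmlElement element, string name)
{
    // TagName is the tag (e.g. "TABLE"); Name is the element's name attribute
    if (string.Equals(element.TagName, name, StringComparison.OrdinalIgnoreCase))
    {
        return element;
    }
    foreach (HtmlElement child in element.Children)
    {
        // Recurse into each child, not into the not-yet-assigned result
        HtmlElement test = FindElement(child, name);
        if (test != null)
        {
            return test;
        }
    }
    return null;
}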
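The loop above leaves the row container open ("create whatever class you use for a row"). One option is a DataTable, since it can feed both a GridView and a database insert. Below is a minimal sketch under a loud assumption: the markup is well-formed XML, as the sample table is, so it can be read with System.Xml.Linq from the .NET base library instead of either HtmlDocument class. Whole web pages are rarely well-formed, so this is not a general HTML parser. The TableExtractor/ExtractTable names and the Column1..N column names are hypothetical.

```csharp
using System;
using System.Data;
using System.Linq;
using System.Xml.Linq;

// Minimal sketch, not a general HTML parser: assumes the markup is
// well-formed XML (true for the sample table, rarely true for whole pages).
public static class TableExtractor
{
    // Hypothetical helper: copies the text of each <td> in each <tr> into a DataTable.
    public static DataTable ExtractTable(string markup)
    {
        XElement root = XElement.Parse(markup);

        // XML names are case-sensitive, so match "TR"/"tr" and "TD"/"td" explicitly.
        string[][] rows = root.DescendantsAndSelf()
            .Where(e => IsTag(e, "tr"))
            .Select(tr => tr.Elements()
                .Where(td => IsTag(td, "td"))
                // Value concatenates nested text, so <td><b>ABC</b></td> yields "ABC".
                .Select(td => td.Value.Trim())
                .ToArray())
            .ToArray();

        var table = new DataTable("HtmlTable");
        int columnCount = rows.Length == 0 ? 0 : rows.Max(r => r.Length);
        for (int i = 0; i < columnCount; i++)
        {
            // Hypothetical column names; rename them to match your database schema.
            table.Columns.Add("Column" + (i + 1), typeof(string));
        }

        foreach (string[] cells in rows)
        {
            DataRow row = table.NewRow();
            for (int i = 0; i < cells.Length; i++)
            {
                row[i] = cells[i];
            }
            table.Rows.Add(row);
        }
        return table;
    }

    private static bool IsTag(XElement element, string tag) =>
        string.Equals(element.Name.LocalName, tag, StringComparison.OrdinalIgnoreCase);
}
```

In ASP.NET WebForms the result could be bound with `GridView1.DataSource = table; GridView1.DataBind();` (GridView1 being a hypothetical control ID), and `SqlBulkCopy.WriteToServer(table)` is one way to insert the rows into SQL Server. For the live page in the question, you would still need an HTML parser to get at the table markup first.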
Sorry, I don't have Visual Studio here right now to test the code. Good luck ;-)