Parsing data from an HTML page using VB/C#/ASP.NET
I have saved some HTML pages from the web, and now I want to parse specific data from them. That is, I want to retrieve specific parts of an HTML page using VB/C# code. How do I go about it?
Code examples in VB/C#/ASP.NET would help.
UPDATE
I am using this code to read the HTML file:
    Private Sub cmdSubmit_Click(ByVal sender As System.Object, ByVal e As System.EventArgs) Handles cmdSubmit.Click
        Dim oRequest As System.Net.WebRequest
        Dim oResponse As System.Net.WebResponse
        Dim oReader As System.IO.StreamReader
        Dim sResponse As String
        Try
            oRequest = System.Net.WebRequest.Create(txtURI.Text)
            oResponse = oRequest.GetResponse
            oReader = New System.IO.StreamReader(oResponse.GetResponseStream)
            sResponse = oReader.ReadToEnd
        Catch : sResponse = "Could not load page"
        End Try
        txtHTML.Text = sResponse
    End Sub
All I want to do now is save the specifications to the database. 1. How do I select the specifications and display them in a ListBox? 2. How do I save them to the database?
You may take a look at Html Agility Pack. It's a really nice library for parsing HTML streams and extracting whatever information you might need. Here's an example.
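For a page already saved to disk, a minimal sketch might look like the following. The file path is a hypothetical placeholder, and the XPath simply lists every link on the page; adjust both to your own file and markup.

```vb
Imports System
Imports HtmlAgilityPack

Module SavedPageExample
    Sub Main()
        ' Hypothetical path to a page saved from the web.
        Dim savedPagePath = "C:\pages\saved.html"

        Dim doc = New HtmlDocument()
        doc.Load(savedPagePath)

        ' SelectNodes returns Nothing when the XPath matches no nodes.
        Dim links = doc.DocumentNode.SelectNodes("//a[@href]")
        If links Is Nothing Then
            Console.WriteLine("No links found")
            Return
        End If

        For Each link As HtmlNode In links
            ' GetAttributeValue falls back to the default when the attribute is missing.
            Console.WriteLine(link.GetAttributeValue("href", String.Empty))
        Next
    End Sub
End Module
```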
UPDATE:
As requested in the comments section, here's an example of how you could fetch the specifications of the laptop from http://www.sony.co.in/product/vpcea4bgn:
    ' Requires Imports System.Net and Imports HtmlAgilityPack
    Using client = New WebClient()
        client.Headers(HttpRequestHeader.UserAgent) = "Mozilla/5.0 (Windows NT 6.1; WOW64; rv:6.0.2) Gecko/20100101 Firefox/6.0.2"
        Dim doc = New HtmlDocument()
        doc.LoadHtml(client.DownloadString("http://www.sony.co.in/product/vpcea4bgn"))
        Dim specs = doc.DocumentNode.SelectNodes("//ul[@class='featuresList BodyText']/li/text()")
        ' SelectNodes returns Nothing when the XPath matches no nodes
        If specs IsNot Nothing Then
            For Each spec As HtmlNode In specs
                Dim value = spec.InnerText.Trim()
                If Not String.IsNullOrEmpty(value) Then
                    ' TODO: Save the specification to your database or something
                    Console.WriteLine(value)
                End If
            Next
        End If
    End Using
Note, however, that screen scraping is fragile: the day Sony changes its HTML structure, your application will break.
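The TODO in the loop above could be filled in along these lines. This is only a sketch under assumptions that are not in the question: a WinForms ListBox passed in as lstSpecs, a SQL Server database reachable through a placeholder connection string, and a hypothetical table Specifications with a single text column SpecText.

```vb
Imports System.Collections.Generic
Imports System.Data.SqlClient
Imports System.Windows.Forms

Module SpecStorage
    ' Placeholder connection string; replace it with your own.
    Private Const ConnectionString As String = "Data Source=.;Initial Catalog=Products;Integrated Security=True"

    ' Shows each specification in the list box (question 1).
    Public Sub ShowSpecs(ByVal lstSpecs As ListBox, ByVal specs As IEnumerable(Of String))
        lstSpecs.Items.Clear()
        For Each spec In specs
            lstSpecs.Items.Add(spec)
        Next
    End Sub

    ' Inserts each specification as one row (question 2).
    ' Specifications and SpecText are hypothetical table and column names.
    Public Sub SaveSpecs(ByVal specs As IEnumerable(Of String))
        Using connection As New SqlConnection(ConnectionString)
            connection.Open()
            For Each spec In specs
                Using command As New SqlCommand("INSERT INTO Specifications (SpecText) VALUES (@specText)", connection)
                    ' A parameter keeps the scraped text out of the SQL string itself.
                    command.Parameters.AddWithValue("@specText", spec)
                    command.ExecuteNonQuery()
                End Using
            Next
        End Using
    End Sub
End Module
```

One way to use it is to collect the trimmed values into a List(Of String) inside the loop, then call ShowSpecs and SaveSpecs once after the loop finishes. In an ASP.NET Web Forms page the ListBox type would come from System.Web.UI.WebControls instead.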