Parsing data from an HTML page using VB/C#/ASP.NET

I have saved some HTML pages from the web, and now I want to parse specific data from them. That is, I want to retrieve a specific part of an HTML page using VB/C# code. How do I go about it?

Code examples in VB/C#/ASP.NET would help.

UPDATE

I am using this code to read the HTML file:

Private Sub cmdSubmit_Click(ByVal sender As System.Object, ByVal e As System.EventArgs) Handles cmdSubmit.Click
        Dim oRequest As System.Net.WebRequest
        Dim oResponse As System.Net.WebResponse
        Dim oReader As System.IO.StreamReader
        Dim sResponse As String
        Try
            oRequest = System.Net.WebRequest.Create(txtURI.Text)
            oResponse = oRequest.GetResponse
            oReader = New System.IO.StreamReader(oResponse.GetResponseStream)
            sResponse = oReader.ReadToEnd
        Catch : sResponse = "Could not load page"
        End Try
        txtHTML.Text = sResponse
    End Sub

All I want to do now is save the specifications to the database.

1. How do I select the specifications and display them in a ListBox?
2. How do I save them to the database?


You may take a look at Html Agility Pack. It's a really nice library for parsing HTML streams and extracting whatever information you might need. Here's an example.
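Since the question starts from pages already saved to disk, here is a minimal sketch of loading such a file with Html Agility Pack instead of downloading it. The file path, the module name SavedPageExample and the helper ExtractTitle are made up for illustration, and the page title is only a stand-in for whatever part of the page you actually need.

```vbnet
Imports System
Imports HtmlAgilityPack

Module SavedPageExample
    ' Hypothetical helper: returns the trimmed <title> text, or Nothing if the page has none.
    Function ExtractTitle(ByVal doc As HtmlDocument) As String
        ' SelectSingleNode returns Nothing when the XPath matches no element.
        Dim titleNode As HtmlNode = doc.DocumentNode.SelectSingleNode("//title")
        If titleNode Is Nothing Then Return Nothing
        Return titleNode.InnerText.Trim()
    End Function

    Sub Main()
        ' Assumed location of a previously saved page; replace with your own file.
        Dim savedFile As String = "C:\pages\product.html"

        Dim doc As New HtmlDocument()
        doc.Load(savedFile) ' parses the file from disk, no web request involved

        Dim title As String = ExtractTitle(doc)
        If title Is Nothing Then
            Console.WriteLine("No <title> element found")
        Else
            Console.WriteLine(title)
        End If
    End Sub
End Module
```

The same pattern should work for any other element: swap the XPath expression for one that matches the part of the page you are after.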


UPDATE:

As requested in the comments section, here's an example of how you could fetch the laptop specifications from http://www.sony.co.in/product/vpcea4bgn:

' At the top of the file:
Imports System.Net
Imports HtmlAgilityPack

Using client = New WebClient()
    ' Send a browser-like User-Agent, since some sites reject requests without one
    client.Headers(HttpRequestHeader.UserAgent) = "Mozilla/5.0 (Windows NT 6.1; WOW64; rv:6.0.2) Gecko/20100101 Firefox/6.0.2"
    Dim doc = New HtmlDocument()
    doc.LoadHtml(client.DownloadString("http://www.sony.co.in/product/vpcea4bgn"))
    ' SelectNodes returns Nothing (not an empty list) when the XPath matches no nodes
    Dim specs = doc.DocumentNode.SelectNodes("//ul[@class='featuresList BodyText']/li/text()")
    If specs IsNot Nothing Then
        For Each spec As HtmlNode In specs
            Dim value = spec.InnerText.Trim()
            If Not String.IsNullOrEmpty(value) Then
                ' TODO: Save the specification to your database or something
                Console.WriteLine(value)
            End If
        Next
    End If
End Using
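The TODO in the loop above could be filled in roughly as follows. This is only a sketch under assumptions: the table Specifications with a single NVARCHAR(400) column Spec, the connection string, and the names SpecStorage, CleanSpecs and SaveSpecs are all hypothetical, and it targets SQL Server through System.Data.SqlClient. Adapt it to your own schema and database.

```vbnet
Imports System.Collections.Generic
Imports System.Data
Imports System.Data.SqlClient

Module SpecStorage
    ' Trims each raw value and drops the empty ones, like the loop above does.
    Function CleanSpecs(ByVal raw As IEnumerable(Of String)) As List(Of String)
        Dim result As New List(Of String)()
        For Each item As String In raw
            If item Is Nothing Then Continue For
            Dim value As String = item.Trim()
            If value.Length > 0 Then result.Add(value)
        Next
        Return result
    End Function

    ' Inserts one row per specification using a parameterized command.
    ' Assumed table: CREATE TABLE Specifications (Spec NVARCHAR(400))
    Sub SaveSpecs(ByVal connectionString As String, ByVal specs As IEnumerable(Of String))
        Using conn As New SqlConnection(connectionString)
            conn.Open()
            Using cmd As New SqlCommand("INSERT INTO Specifications (Spec) VALUES (@Spec)", conn)
                ' 400 is the assumed column width; the parameter is reused for every row.
                Dim p As SqlParameter = cmd.Parameters.Add("@Spec", SqlDbType.NVarChar, 400)
                For Each spec As String In specs
                    p.Value = spec
                    cmd.ExecuteNonQuery()
                Next
            End Using
        End Using
    End Sub
End Module
```

To show the values in a ListBox, collect them in a list inside the loop, pass them through CleanSpecs, and call lstSpecs.Items.Add(value) for each one, where lstSpecs is whatever your ListBox control is called. The same cleaned list can then be handed to SaveSpecs.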

Note, however, that screen scraping is fragile: the day Sony changes its HTML structure, your application will stop working.
