开发者

Copy html table in Excel

I must write a program that periodically reads a web page and copies certain data from a table on that page to an Excel spreadsheet. I don't know where to start or what programming language is suitable for this project. I know a little C++ and Matlab programming. Can anyone offer advice to point me in the right direction or suggest open source projects which do something similar?

I can use wget(linux) or fget1(matlab) to download the webpages, but I don't know how can I save certain data from sou开发者_如何学JAVArce of this webpages into Excel.


I will assume you have room for learning C#. Since you have to extract the table from a web page, you need a special library/framework to deal with web browsing such as Watin. After getting the table, it's matter of saving into Excel spreadsheet. For convenience, you can write a CSV format (comma separated text) and excel can open the file. Hope it helps


I used the following code vb.net to parse multiple html table from a saved web page to a datatable (the table must have the same structure) (using Html-Agility-Pack) and save it to Xml file:

    Imports System.Net

    Public Sub ParseHtmlTable(byval HtmlFilePath as String)

    Dim webStream As Stream
    Dim webResponse = ""
    Dim req As FileWebRequest
    Dim res As FileWebResponse

    req = WebRequest.Create("file:///" & HtmlFilePath)

    req.Method = "GET" ' Method of sending HTTP Request(GET/POST)

    res = req.GetResponse ' Send Request

    webStream = res.GetResponseStream() ' Get Response

    Dim webStreamReader As New StreamReader(webStream)

    Dim htmldoc As New HtmlAgilityPack.HtmlDocument
    htmldoc.LoadHtml(webStreamReader.ReadToEnd())

    Dim nodes As HtmlAgilityPack.HtmlNodeCollection = htmldoc.DocumentNode.SelectNodes("//table/tr")

    Dim dtTable As New DataTable("Table1")

    Dim Headers As List(Of String) = nodes(0).Elements("th").Select(Function(x) x.InnerText.Trim).ToList

    For Each Hr In Headers

        dtTable.Columns.Add(Hr)

    Next

    For Each node As HtmlAgilityPack.HtmlNode In nodes

        Dim Row = node.Elements("td").Select(Function(x) x.InnerText.Trim).ToArray

        dtTable.Rows.Add(Row)

    Next

    dtTable.WriteXml("G:\1.xml", XmlWriteMode.WriteSchema)

    End Sub

After that import the file to Excel

Read this Article to import XML into excel

Hope it helps

0

上一篇:

下一篇:

精彩评论

暂无评论...
验证码 换一张
取 消

最新问答

问答排行榜