Copy html table in Excel
I must write a program that periodically reads a web page and copies certain data from a table on that page to an Excel spreadsheet. I don't know where to start or what programming language is suitable for this project. I know a little C++ and Matlab programming. Can anyone offer advice to point me in the right direction or suggest open source projects which do something similar?
I can use wget(linux) or fget1(matlab) to download the webpages, but I don't know how can I save certain data from sou开发者_如何学JAVArce of this webpages into Excel.
I will assume you have room for learning C#. Since you have to extract the table from a web page, you need a special library/framework to deal with web browsing such as Watin. After getting the table, it's matter of saving into Excel spreadsheet. For convenience, you can write a CSV format (comma separated text) and excel can open the file. Hope it helps
I used the following code vb.net to parse multiple html table from a saved web page to a datatable (the table must have the same structure) (using Html-Agility-Pack) and save it to Xml file:
Imports System.Net
Public Sub ParseHtmlTable(byval HtmlFilePath as String)
Dim webStream As Stream
Dim webResponse = ""
Dim req As FileWebRequest
Dim res As FileWebResponse
req = WebRequest.Create("file:///" & HtmlFilePath)
req.Method = "GET" ' Method of sending HTTP Request(GET/POST)
res = req.GetResponse ' Send Request
webStream = res.GetResponseStream() ' Get Response
Dim webStreamReader As New StreamReader(webStream)
Dim htmldoc As New HtmlAgilityPack.HtmlDocument
htmldoc.LoadHtml(webStreamReader.ReadToEnd())
Dim nodes As HtmlAgilityPack.HtmlNodeCollection = htmldoc.DocumentNode.SelectNodes("//table/tr")
Dim dtTable As New DataTable("Table1")
Dim Headers As List(Of String) = nodes(0).Elements("th").Select(Function(x) x.InnerText.Trim).ToList
For Each Hr In Headers
dtTable.Columns.Add(Hr)
Next
For Each node As HtmlAgilityPack.HtmlNode In nodes
Dim Row = node.Elements("td").Select(Function(x) x.InnerText.Trim).ToArray
dtTable.Rows.Add(Row)
Next
dtTable.WriteXml("G:\1.xml", XmlWriteMode.WriteSchema)
End Sub
After that import the file to Excel
Read this Article to import XML into excel
Hope it helps
精彩评论