
Create CSV from HTML pages

There is a website that displays a lot of data in HTML tables. They have paged the data, so there are around 500 pages.

What is the most convenient (easy) way of getting the data from those tables and downloading it as a CSV, on Windows?

Basically I need to write a script that does something like this, but it is overkill to write it in C#, and I am looking for other solutions that people with web experience use:

for i = 1 to 500:
    load page from http://x/page_i.html
    parse the source and get the data in the table with id='data'
    append the results to a CSV
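
The loop above can be sketched with just the Python standard library; no third-party parser is strictly required. The URL pattern http://x/page_i.html and the table id 'data' come from the question, while the output file name and the parser class are assumptions for illustration:

```python
# Minimal sketch of the question's loop, standard library only.
# Assumptions: pages really live at http://x/page_<i>.html, the target
# table has id='data', and output goes to data.csv in the current folder.
import csv
import urllib.request
from html.parser import HTMLParser

class DataTableParser(HTMLParser):
    """Collect cell text from the <table id='data'> element on one page."""
    def __init__(self):
        super().__init__()
        self.in_table = False
        self.in_cell = False
        self.row = []
        self.rows = []

    def handle_starttag(self, tag, attrs):
        if tag == "table" and dict(attrs).get("id") == "data":
            self.in_table = True
        elif self.in_table and tag == "tr":
            self.row = []
        elif self.in_table and tag in ("td", "th"):
            self.in_cell = True
            self.row.append("")   # start a new, empty cell

    def handle_endtag(self, tag):
        if tag == "table":
            self.in_table = False
        elif self.in_table and tag == "tr" and self.row:
            self.rows.append(self.row)
        elif tag in ("td", "th"):
            self.in_cell = False

    def handle_data(self, data):
        if self.in_cell:
            self.row[-1] += data.strip()

def scrape(out_path="data.csv", pages=500):
    """Fetch every page, parse its data table, append all rows to one CSV."""
    with open(out_path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        for i in range(1, pages + 1):
            html = urllib.request.urlopen(
                "http://x/page_%d.html" % i).read().decode("utf-8")
            parser = DataTableParser()
            parser.feed(html)
            writer.writerows(parser.rows)
```

This deliberately ignores edge cases such as nested tables or colspans; it is a starting point, not a robust scraper.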

Thanks!


I was doing a screen-scraping application once and found BeautifulSoup to be very useful. You could easily plop that into a Python script and parse across all the tags with the specific id you're looking for.
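A minimal sketch of that approach, assuming BeautifulSoup (and, for fetching, any HTTP client) is installed; the table id 'data' comes from the question, and the function name is made up here:

```python
# Sketch: pull the rows out of the table with id='data' using BeautifulSoup.
# Requires: pip install beautifulsoup4
from bs4 import BeautifulSoup

def table_rows(html):
    """Return the rows of the table with id='data' as lists of cell text."""
    soup = BeautifulSoup(html, "html.parser")
    table = soup.find("table", id="data")
    rows = []
    for tr in table.find_all("tr"):
        cells = [td.get_text(strip=True) for td in tr.find_all(["td", "th"])]
        if cells:
            rows.append(cells)
    return rows
```

Run that over each downloaded page and feed the rows to Python's csv.writer to build the final file.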


The easiest non-C# way I can think of is to use Wget to download the pages, then run HTML Tidy to convert them to XML/XHTML, and then transform the resulting XML to CSV with an XSLT stylesheet (run with MSXSL.exe).

You will have to write some simple batch files and an XSLT with a basic XPath selector.

If you feel it would be easier to just do it in C#, you can use SgmlReader to read the HTML DOM and run an XPath query to extract the data. It should not take more than about 20 lines of code.
