Automatically saving web pages requiring login/HTTPS

2023-02-07 01:23

I'm trying to automate some data scraping from a website. However, because the user has to go through a login screen, a wget cron job won't work, and because I need to make an HTTPS request, a simple Perl script won't work either. I've tried the "DejaClick" add-on for Firefox, which replays a series of browser events (logging into the website, navigating to where the interesting data is, downloading the page, etc.), but for some reason the add-on's developers didn't include saving pages as a feature.

Is there any quick way of accomplishing what I'm trying to do here?

A while back I used mechanize (wwwsearch.sourceforge.net/mechanize) and found it very helpful. It builds on urllib2, so from what I read now it should also work with HTTPS requests. So my comment above will hopefully prove wrong.
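To illustrate the idea of a scripted login followed by a page download, here is a minimal sketch. It does not use mechanize itself; it uses Python 3's standard `urllib.request` and `http.cookiejar` (the successors to urllib2 and cookielib). It assumes the site logs users in with a plain HTML form POST and tracks the session with cookies. The function names, URLs, and form field names are hypothetical, and a site that relies on JavaScript or CSRF tokens would need more work.

```python
import http.cookiejar
import urllib.parse
import urllib.request
from pathlib import Path


def build_opener():
    """Return an opener that keeps session cookies between requests."""
    jar = http.cookiejar.CookieJar()
    return urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))


def encode_form(fields):
    """URL-encode form fields into the bytes body of a POST request."""
    return urllib.parse.urlencode(fields).encode("utf-8")


def login_and_save(opener, login_url, fields, target_url, out_path):
    """Log in with a form POST, then fetch target_url and write it to out_path."""
    # Passing data makes this a POST; cookies set by the response are
    # stored in the opener's cookie jar.
    with opener.open(login_url, data=encode_form(fields), timeout=30) as resp:
        resp.read()
    # The same opener sends the stored cookies back on this request.
    with opener.open(target_url, timeout=30) as resp:
        body = resp.read()
    Path(out_path).write_bytes(body)
    return len(body)


# Hypothetical usage (placeholder URLs and field names, not a real site):
# login_and_save(
#     build_opener(),
#     "https://example.com/login",
#     {"username": "me", "password": "secret"},
#     "https://example.com/reports/latest",
#     "latest.html",
# )
```

Because the opener is passed in as a parameter, the same function can be run from a cron job with a real opener or exercised offline with a stub. HTTPS URLs work with the default opener as long as Python was built with SSL support.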

You can record your actions with the IRobotSoft web scraper. See the demo here: http://irobotsoft.com/help/

Then use the saveFile(filename, TargetPage) function to save the target page.

Related tags: automation, browser, encryption, https, screen-scraping
