Log Into Website and Scrape Streaming Data

I am not really a programmer but am asking this out of general curiosity. I visited a website recently where I logged in, went to a page, and without leaving, data on that page refreshes before my eyes.

Is it possible to mimic a browser (I was using Chrome) and log into the site, navigate to a page, and "scrape" that data that is coming in using Python? I would like to store and analyze it.

If so, taking this one step further, is it possible to interact with the website? Click a button that I know the name of?

Thanks in advance.


If the data "refreshes before your eyes", it is probably AJAX (JavaScript in the page pulling new page data from the server).

There are two ways of approaching this:

  1. Using Selenium, you can drive an actual browser, which will load the page and run the JavaScript; you can then grab bits of the live page.

  2. You can look at what the AJAX in the page is doing (how it asks for updates, what it gets back) and write Python code to emulate that.

Both take a fair bit of time and effort to set up. Selenium is a bit more robust; direct Python queries are a bit more efficient. YMMV.
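As a rough illustration of the second approach, here is a minimal sketch using only the standard library. The URLs, the form field names (`username`, `password`) and the assumption that the update endpoint returns JSON are all hypothetical; you would find the real ones by watching the Network tab of the browser's developer tools while the page updates. Real sites often need more than this (hidden form tokens, extra headers), so treat it as a starting point rather than a working recipe.

```python
import json
import time
import urllib.parse
import urllib.request
from http.cookiejar import CookieJar

# Hypothetical URLs: replace with what the browser's Network tab shows.
LOGIN_URL = "https://example.com/login"
DATA_URL = "https://example.com/api/updates"


def make_opener():
    """Build an opener that stores cookies, so the login session persists."""
    return urllib.request.build_opener(
        urllib.request.HTTPCookieProcessor(CookieJar())
    )


def log_in(opener, username, password):
    """POST the login form; any session cookie is kept by the opener."""
    # Field names are assumptions; check the site's actual login form.
    form = urllib.parse.urlencode({"username": username, "password": password})
    with opener.open(LOGIN_URL, data=form.encode("utf-8")) as resp:
        return resp.status


def fetch_json(opener, url):
    """GET a URL with the logged-in opener and decode the body as JSON."""
    with opener.open(url) as resp:
        return json.loads(resp.read().decode("utf-8"))


def poll(fetch, samples=5, interval=2.0):
    """Call fetch() repeatedly, keeping each payload that differs from the last."""
    results = []
    for _ in range(samples):
        payload = fetch()
        if not results or payload != results[-1]:
            results.append(payload)
        time.sleep(interval)
    return results


def main():
    opener = make_opener()
    log_in(opener, "my_user", "my_password")
    for update in poll(lambda: fetch_json(opener, DATA_URL)):
        print(update)
```

Nothing runs until you call `main()`. `poll` takes any zero-argument function, so the same loop works whether the data comes from a JSON endpoint or from somewhere else.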


To emulate the browser behavior in Python, you can use the mechanize module. The 'streaming' data which you refer to could be Flash or JavaScript. If it is Flash, it is going to be binary and you won't be able to fetch it. If it is JavaScript, mechanize does not execute it, so it has problems dealing with that too.
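Since mechanize does not run JavaScript, driving a real browser with Selenium (mentioned in the other answer) is one way around that, and it also covers the "click a button that I know the name of" part of the question. The following is only a sketch: it assumes Selenium 4 and a Chrome installation it can drive, and the URLs, the field names (`username`, `password`), the button name (`login`) and the element ID (`ticker`) are all made up for illustration.

```python
import time


def collect_text(driver, locator, samples=5, interval=2.0):
    """Read one element's text repeatedly, keeping only values that changed."""
    seen = []
    for _ in range(samples):
        text = driver.find_element(*locator).text
        if not seen or text != seen[-1]:
            seen.append(text)
        time.sleep(interval)
    return seen


def main():
    # Imported here so collect_text() can be used without Selenium installed.
    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    driver = webdriver.Chrome()
    try:
        # Hypothetical login page and form field names.
        driver.get("https://example.com/login")
        driver.find_element(By.NAME, "username").send_keys("my_user")
        driver.find_element(By.NAME, "password").send_keys("my_password")
        # Click a button located by its name attribute.
        driver.find_element(By.NAME, "login").click()

        # Hypothetical page whose content is updated by JavaScript.
        driver.get("https://example.com/live")
        locator = (By.ID, "ticker")
        # Wait up to 10 seconds for the element to appear in the page.
        WebDriverWait(driver, 10).until(EC.presence_of_element_located(locator))
        for value in collect_text(driver, locator):
            print(value)
    finally:
        driver.quit()
```

Calling `main()` opens a browser window, logs in, and prints each new value it sees. `collect_text` only needs an object with a `find_element` method, so you can try the sampling logic with a stand-in before pointing it at a real site.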
