
How to save a web page with Selenium RC

I use Selenium RC to open a URL. How can I then save that page, the way urllib.urlretrieve does? urllib can't execute the JavaScript in the page. One more question: will the saved page contain everything I see when Selenium RC opens it?


It sounds like you are confusing two very different libraries.

urllib:

This module provides a high-level interface for fetching data across the World Wide Web. In particular, the urlopen() function is similar to the built-in function open(), but accepts Universal Resource Locators (URLs) instead of filenames.

You can use Python's urllib library to retrieve the raw markup from a valid URL. It does not run any JavaScript embedded in the page, because it never parses or renders the markup.
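To make the contrast concrete, here is a minimal sketch of the urlretrieve-style approach from the question, using Python 3's urllib.request. The helper name save_raw_markup is hypothetical, and the inline data: URL is an assumption used only so the sketch runs without network access; with a real site you would pass its http(s) URL instead. The point it shows is that what gets saved is the markup exactly as served: the script tag is still there as text and has never been run.

```python
import os
import tempfile
import urllib.parse
import urllib.request


def save_raw_markup(url, path):
    """Hypothetical helper: fetch `url` and write the raw response body to `path`.

    Like urllib.request.urlretrieve, this only downloads bytes.
    It does not parse HTML and does not execute any JavaScript.
    """
    with urllib.request.urlopen(url) as resp:
        body = resp.read()
        # Use the charset from the Content-Type header, falling back to UTF-8.
        charset = resp.info().get_content_charset() or "utf-8"
    with open(path, "wb") as f:
        f.write(body)
    return body.decode(charset)


# In a browser, this page's visible text would be produced by JavaScript.
page = "<html><body><script>document.write('added by JS');</script></body></html>"

# Assumption: a data: URL stands in for a real web page so the example runs offline.
url = "data:text/html;charset=utf-8," + urllib.parse.quote(page)

out_path = os.path.join(tempfile.mkdtemp(), "page.html")
markup = save_raw_markup(url, out_path)
print(markup == page)  # True: the script source is saved unchanged, never executed
```

A browser-driven tool is needed when you want the page as it looks after its scripts have run; urllib only ever gives you the server's response.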

Selenium RC:

Selenium Remote Control (RC) is a test tool that allows you to write automated web application UI tests in any programming language against any HTTP website using any mainstream JavaScript-enabled browser.

Selenium RC is used to automate testing. Your tests execute in a real web browser, driven via JavaScript, but it is a testing suite: what you get back is information about the status of your tests. Apart from the experimental captureEntirePageScreenshot command, which is supported only in some browser configurations, Selenium RC does not provide functionality to save an image of the rendered page.


Unless I've misinterpreted your question, you seem to be looking for a library that will let you retrieve an image of a rendered HTML page (including any DOM manipulation done by JavaScript). If that is the case, I would suggest looking into PyWebShot, which seems to provide exactly that functionality. You can view screenshots of it in action here (along with some additional info about it).

If it doesn't need to be a Python library, there are a number of web services that provide screenshots:

  • IE Web Renderer
  • Browsershots
  • BrowsrCamp
  • BrowserCam
