Automated browsing of complicated web pages
I have a project that will involve heavy automation of complicated web pages.
I realize there are Mechanize and Beautiful Soup, but don't these break when dealing with large amounts of DOM scripting and other weird stuff you find on complicated web pages?
I think I want essentially a barebones running instance of WebKit that allows me to either do "GUI scripting" or access the DOM. Ideas?
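To make the concern concrete, here is a minimal sketch in Python using only the standard library's `html.parser`. The page content and the names `PAGE`, `StaticTagCounter`, and `count_static_items` are hypothetical, made up for illustration. It shows that a parser working on the raw markup (which is roughly what a static scraping tool sees) finds none of the list items that a script would add at load time.

```python
from html.parser import HTMLParser

# Hypothetical page: the list items are created by a script at load time,
# so they never appear in the HTML the server sends.
PAGE = """
<html><body>
<ul id="results"></ul>
<script>
  var ul = document.getElementById("results");
  ["alpha", "beta"].forEach(function (name) {
    var li = document.createElement("li");
    li.textContent = name;
    ul.appendChild(li);
  });
</script>
</body></html>
"""


class StaticTagCounter(HTMLParser):
    """Counts <li> and <script> start tags present in the static markup."""

    def __init__(self):
        super().__init__()
        self.li_count = 0
        self.script_count = 0

    def handle_starttag(self, tag, attrs):
        if tag == "li":
            self.li_count += 1
        elif tag == "script":
            self.script_count += 1


def count_static_items(html):
    """Return (number of <li> tags, number of <script> tags) in the raw HTML."""
    parser = StaticTagCounter()
    parser.feed(html)
    parser.close()
    return parser.li_count, parser.script_count


if __name__ == "__main__":
    li_count, script_count = count_static_items(PAGE)
    print("list items in static markup:", li_count)
    print("script blocks in static markup:", script_count)
```

The parser sees the script block but no list items, because nothing executes the JavaScript. Getting at the two generated items requires something that actually runs the page, which is why a real or headless browser is the usual suggestion for this kind of work.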
Try Sahi with PhantomJS. Sahi is a browser automation tool, and PhantomJS is a headless WebKit browser. You can find setup instructions here: http://sahi.co.in/w/sahi-headless-execution-with-phantomjs
Disclaimer: We created the Sahi product.
What platform are you working on? And what language do you intend to use?
Adobe AIR lets you embed a WebKit instance inside an AIR application and interact with the page's JavaScript (there is two-way communication between the page JS and the AIR runtime).
Otherwise, if you are not bound to WebKit, you could take Mozilla Chromeless for a spin.
My apologies if none of this does what you need. I can't quite figure out exactly what you are trying to do (page scraping? submitting forms?).
For testing/scraping I would try:
- Selenium
- EnvJS
- Windmill
- Watir
- Sahi
- WebTest