How can I get HTML content from a browser that performs HTML correction and runs JavaScript?
I need a way to get HTML content as a browser would render it. When a page is rendered in a browser, its JavaScript runs; otherwise it does not. So HTML libraries such as lxml, BeautifulSoup, and others won't work. I found a project named pywebkitgtk, but its purpose is to create a browser with a front end. Is there any way to pass a URL to a "fake browser", have it render the page and run all of its JavaScript, and save the result to an HTML file? I don't need a front end; a back end alone is fine.
I need to use Python or Java to do that.
selenium-rc lets you drive an actual browser under the control of any of several languages of your choice, including both Python and Java. Check it out!
For a detailed example of use with Python, see here.
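As a rough illustration of the idea, here is a minimal sketch in Python. Note that it uses the newer Selenium WebDriver bindings rather than selenium-rc itself, and it assumes the `selenium` package and Firefox (with its driver) are installed. The function names `save_rendered_html` and `render_with_firefox`, the headless flag, and the file path are my own choices for illustration, not something from the answer above.

```python
def save_rendered_html(driver, url, out_path):
    """Load `url` in the given browser driver and save the rendered DOM.

    `driver` is any object exposing `get(url)` and `page_source`,
    as Selenium WebDriver instances do (hypothetical helper).
    """
    driver.get(url)
    # page_source is the DOM serialized by the browser, i.e. after the
    # parser has corrected the markup and scripts have had a chance to run.
    html = driver.page_source
    with open(out_path, "w", encoding="utf-8") as f:
        f.write(html)
    return html


def render_with_firefox(url, out_path):
    """Assumed setup: Selenium 4 with Firefox, run without a visible window."""
    # Imported here so the helper above stays usable without Selenium.
    from selenium import webdriver
    from selenium.webdriver.firefox.options import Options

    options = Options()
    options.add_argument("-headless")  # no front end, as the question asks
    driver = webdriver.Firefox(options=options)
    try:
        return save_rendered_html(driver, url, out_path)
    finally:
        driver.quit()  # always shut the browser down
```

Calling `render_with_firefox("http://example.com", "page.html")` should then write the rendered page to `page.html`. Pages that build their content asynchronously may need an explicit wait before `page_source` is read, which this sketch does not handle.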