How to download a web site and process its contents (but only after JS has manipulated the DOM)?
Server-side, I would like to download a remote web site using cURL, then use PHP to parse out specific parts of the page. Easy stuff, right? The only hitch is that, before I begin parsing the page, I need to wait until some JavaScript manipulation has occurred within the DOM.
Is there a way to make this happen?
I suppose what I need is some sort of server-side app / browser that can be run solely from the command line and that is capable of executing JavaScript.
I've never done this and am at a loss. Surely it is possible?
You might want to look into the Selenium library. I have only used it in Java, but I believe there is also a PHP version. There is also a separate Firefox plugin (Selenium IDE) that is somewhat less robust than the library, but it may fit your needs. Selenium takes control of a real browser (Firefox, Chrome, IE) and lets you pull out pieces of data using CSS or XPath selectors. It is geared more towards large-scale web app testing, but it can be used for other purposes. I have found it very useful because it lets you access a site from code in the same way a user would, i.e. with the page's JavaScript and CSS actually executed.
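To make the idea concrete, here is a minimal sketch of that workflow using Selenium's Python bindings (Python rather than PHP or Java, purely for illustration). It assumes the `selenium` package plus Chrome and a matching driver are installed on the server. The function names, the URL, and the `#result` selector are hypothetical placeholders, not anything from the question. The browser runs headless, waits until an element created by the page's JavaScript appears, and only then hands back the rendered HTML for parsing.

```python
from html.parser import HTMLParser

# Void elements never get a closing tag, so they must not affect nesting depth.
_VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img",
              "input", "link", "meta", "source", "track", "wbr"}


def fetch_rendered_html(url, wait_css, timeout=10):
    """Load `url` in headless Chrome and return the DOM as HTML once an
    element matching `wait_css` exists (i.e. after the page's JS has run)."""
    # Imported here so the parsing helper below works without Selenium installed.
    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    options = webdriver.ChromeOptions()
    options.add_argument("--headless")  # no visible window; suits a server
    driver = webdriver.Chrome(options=options)
    try:
        driver.get(url)
        # Block until the JS-generated element is present, or raise on timeout.
        WebDriverWait(driver, timeout).until(
            EC.presence_of_element_located((By.CSS_SELECTOR, wait_css))
        )
        return driver.page_source
    finally:
        driver.quit()  # always shut the browser down


class _TextById(HTMLParser):
    """Collects the text inside the first element with a given id."""

    def __init__(self, target_id):
        super().__init__()
        self.target_id = target_id
        self.depth = 0        # > 0 while we are inside the target element
        self.done = False     # True once the target element has closed
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in _VOID_TAGS or self.done:
            return
        if self.depth:
            self.depth += 1
        elif dict(attrs).get("id") == self.target_id:
            self.depth = 1

    def handle_endtag(self, tag):
        if self.depth and tag not in _VOID_TAGS:
            self.depth -= 1
            if self.depth == 0:
                self.done = True

    def handle_data(self, data):
        if self.depth:
            self.chunks.append(data)


def extract_text_by_id(html, element_id):
    """Return the stripped text of the element with `element_id`, or ''."""
    parser = _TextById(element_id)
    parser.feed(html)
    return "".join(parser.chunks).strip()


# Hypothetical usage (placeholder URL and selector):
#   html = fetch_rendered_html("https://example.com/page", "#result")
#   print(extract_text_by_id(html, "result"))
```

Waiting on a specific element rather than sleeping for a fixed time is a design choice: the script continues as soon as the DOM change has happened and fails with a timeout if it never does. Selenium can also read the element directly with `driver.find_element(By.CSS_SELECTOR, ...)`; returning the full page source instead keeps the later parsing step separate, as in the question.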