Dumping a dynamic web page to a file?
I'm a C++ programmer and I'm new to web development. I need to figure out how to log/dump the HTML of a dynamic third-party website to a static HTML file on my computer, once every second. The dynamic web page refreshes every second and updates an HTML table with the latest price info. I would like a static snapshot of this table (or of the whole page) to be saved to disk every second. That way my own program can parse the file and add the updated price info to a database. How do I do this? If I can't do it this way, is there a way to eavesdrop on (and log) the POST/GET requests and replies the dynamic web page sends?
Look into the cURL library. I believe scraping the content from the website, doing your processing/business logic, then inserting or updating your database would be more efficient than saving the file's contents to disk first.
Alternatively, PHP's file_get_contents() works pretty well, assuming you have allow_url_fopen enabled.
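To make the fetch-parse-store idea above concrete, here is a minimal stdlib-only Python sketch. The URL, the table layout, and the database schema are assumptions for illustration, not details of the real site:

```python
# Sketch of the fetch -> parse -> store loop (Python stdlib only).
# The URL, table layout, and schema below are assumptions, not the real site.
import sqlite3
import time
import urllib.request
from html.parser import HTMLParser

class TableParser(HTMLParser):
    """Collects every <tr> in the page as a list of cell strings."""
    def __init__(self):
        super().__init__()
        self.rows = []      # finished rows
        self._row = None    # cells of the <tr> being parsed, if any
        self._cell = None   # text of the <td>/<th> being parsed, if any

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th"):
            self._cell = ""

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._row is not None:
            self._row.append(self._cell.strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None

    def handle_data(self, data):
        if self._cell is not None:
            self._cell += data

def extract_rows(html):
    parser = TableParser()
    parser.feed(html)
    return parser.rows

def store_rows(db, rows):
    # Hypothetical two-column (symbol, price) layout, timestamped on insert.
    db.execute("CREATE TABLE IF NOT EXISTS prices (symbol TEXT, price REAL, ts REAL)")
    db.executemany("INSERT INTO prices VALUES (?, ?, ?)",
                   [(r[0], float(r[1]), time.time()) for r in rows])
    db.commit()

# Polling loop (not executed here; the URL is hypothetical):
# db = sqlite3.connect("prices.db")
# while True:
#     html = urllib.request.urlopen("http://example.com/prices").read().decode()
#     store_rows(db, extract_rows(html))
#     time.sleep(1)
```

Note the data goes straight from the HTTP response into the database, with no intermediate file on disk, which is the point of the answer above.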
It would be easy to do with Selenium WebDriver. Selenium lets you create a browser object with a method, getPageSource, that pulls the entire HTML from the page, but there don't seem to be any C++ bindings for Selenium. If it's convenient to use Ruby, Python, or Java as part of your application, just to open up a browser or headless browser and pull the data, then you should be able to set up a web service or a local file to transfer that data back into your C++ application.
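The local-file handoff could look like the sketch below (Python, assuming the selenium package and a browser driver are installed; the URL is hypothetical). The snapshot is written atomically via a rename so the C++ reader never sees a half-written file:

```python
# Sketch: a small Python helper that dumps the page source to a local file
# once per second, for the C++ application to pick up and parse.
import os
import tempfile
import time

def write_snapshot(html, path):
    # Write to a temp file in the same directory, then rename over the target.
    # os.replace is atomic, so a concurrent reader sees the old or new file,
    # never a partial one.
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w") as f:
        f.write(html)
    os.replace(tmp, path)

# The Selenium part (not run here; requires selenium and a browser driver):
# from selenium import webdriver
# driver = webdriver.Chrome()
# driver.get("http://example.com/prices")        # hypothetical URL
# while True:
#     write_snapshot(driver.page_source, "snapshot.html")
#     time.sleep(1)
```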
Web automation from C++ is one way around the lack of Selenium C++ bindings
Or, alternatively, you could write your own C++ bindings for Selenium (probably more difficult)
However, for simply pulling the HTML, you may not need Selenium at all if one of Dan's answers above works for you.
Hi, it's someone else.
Instead of scraping their page every second to record their data so you can keep an updated view of their prices, why not call their web service directly (the one their AJAX call makes)?
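A sketch of that approach, again stdlib-only Python. The endpoint URL and the JSON shape are guesses; you would find the real ones by watching the page's requests in the browser's network tab:

```python
# Sketch: poll the site's own data endpoint directly instead of scraping HTML.
# The endpoint URL and JSON field names are hypothetical.
import json
import time
import urllib.request

def fetch_json(url):
    """GET the URL and decode the body as JSON."""
    with urllib.request.urlopen(url) as resp:
        return json.loads(resp.read().decode("utf-8"))

# Polling loop (not executed here):
# while True:
#     data = fetch_json("http://example.com/api/prices")  # hypothetical endpoint
#     print(data)          # or insert into your database
#     time.sleep(1)
```

This skips HTML parsing entirely, which is usually both faster and less fragile than scraping the rendered table.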
Good luck!