Downloading the content of a Google Web Toolkit (GWT) web page
The company I work for is switching its front end to a GWT application, and I was wondering whether it is possible to write a script (with bash and wget or cURL, Java, or anything else) that downloads the actual content of the GWT web application. Right now, if I try a command such as wget, I just download a page with some JavaScript functions but none of the actual page content, which is what I am interested in. I am on the QA side, so I am also wondering whether this is possible without direct access to the developers' code. Thanks!
GWT builds the page (the DOM) in place with JavaScript. So you would need something that renders the initial DOM, runs the JavaScript that alters or produces elements, and then outputs the whole DOM. Basically, you need a browser.
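One quick way to confirm that wget or curl returned only the unrendered host page is to look for the `*.nocache.js` bootstrap script that a GWT host page loads. Below is a minimal Java sketch, assuming the downloaded HTML is already in a string. The class name and sample markup are hypothetical, and the regex is only a rough heuristic, not an HTML parser.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GwtShellCheck {
    // Matches a <script> tag whose src ends in ".nocache.js", e.g. src="app/app.nocache.js".
    private static final Pattern NOCACHE_SCRIPT = Pattern.compile(
            "<script[^>]*src\\s*=\\s*[\"']([^\"']*\\.nocache\\.js)[\"']",
            Pattern.CASE_INSENSITIVE);

    /** Returns the bootstrap script path if the HTML looks like a GWT host page, else null. */
    static String findBootstrapScript(String html) {
        Matcher m = NOCACHE_SCRIPT.matcher(html);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        // Hypothetical example of what wget might save for a GWT app: a loader script and an empty body.
        String fetched = "<html><head>"
                + "<script type=\"text/javascript\" src=\"showcase/showcase.nocache.js\"></script>"
                + "</head><body></body></html>";
        System.out.println(findBootstrapScript(fetched));
    }
}
```

If this finds a loader script and the body is empty, the real content only exists after a JavaScript engine has run the application.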
Your best option would be to look for a browser extension that saves whole pages.
Here is some general background on crawlability in AJAX applications.
http://code.google.com/web/ajaxcrawling/docs/getting-started.html
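The scheme in that document maps a `#!` ("hash-bang") URL to a URL with an `_escaped_fragment_` query parameter, and that second URL is what a crawler (or a QA script) requests from the server. Below is a minimal Java sketch of the mapping. It assumes the escaping rules from the spec: bytes 0x00-0x20, `#`, `%`, `&`, `+`, and 0x7F-0xFF in the fragment are percent-escaped. The class, method names, and example URLs are hypothetical.

```java
import java.nio.charset.StandardCharsets;

class EscapedFragment {
    /** Percent-escapes only the bytes the AJAX crawling scheme requires; everything else is kept. */
    static String escapeFragment(String fragment) {
        StringBuilder out = new StringBuilder();
        for (byte b : fragment.getBytes(StandardCharsets.UTF_8)) {
            int c = b & 0xFF; // treat the byte as unsigned
            if (c <= 0x20 || c == '#' || c == '%' || c == '&' || c == '+' || c >= 0x7F) {
                out.append('%').append(String.format("%02X", c));
            } else {
                out.append((char) c);
            }
        }
        return out.toString();
    }

    /** Maps a "#!" URL to its "_escaped_fragment_" form; URLs without "#!" are returned unchanged. */
    static String toEscapedFragmentUrl(String prettyUrl) {
        int bang = prettyUrl.indexOf("#!");
        if (bang < 0) {
            return prettyUrl;
        }
        String base = prettyUrl.substring(0, bang);
        String fragment = prettyUrl.substring(bang + 2);
        // Append to an existing query string if there is one.
        String separator = base.contains("?") ? "&" : "?";
        return base + separator + "_escaped_fragment_=" + escapeFragment(fragment);
    }

    public static void main(String[] args) {
        System.out.println(toEscapedFragmentUrl("http://example.com/showcase.html#!CwCheckBox"));
        System.out.println(toEscapedFragmentUrl("http://example.com/app?lang=en#!key=a&b"));
    }
}
```

The second call yields `http://example.com/app?lang=en&_escaped_fragment_=key=a%26b`. The `&` inside the fragment is escaped so that it cannot be confused with a query-string separator.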
Here is code for a sample servlet that implements that crawlability spec by feeding a page into HtmlUnit, which renders all of the HTML, and then sending the result back to the web crawler.
http://code.google.com/p/google-web-toolkit/source/browse/branches/crawlability/samples/showcase/src/com/google/gwt/sample/showcase/server/CrawlServlet.java?r=6211
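A servlet like this needs, in some form, to turn the incoming `_escaped_fragment_` request back into the original `#!` URL before loading that page in HtmlUnit. Below is a hedged Java sketch of only that reverse step, written with the standard library. The names are hypothetical, this is not the code from the linked `CrawlServlet.java`, and the HtmlUnit rendering itself is left out.

```java
import java.net.URLDecoder;
import java.nio.charset.StandardCharsets;

class PrettyUrl {
    private static final String KEY = "_escaped_fragment_=";

    /** Rebuilds the "#!" URL from the request URL (without query) and its raw query string. */
    static String toPrettyUrl(String urlWithoutQuery, String queryString) {
        StringBuilder otherParams = new StringBuilder();
        String fragment = null;
        // An '&' inside the fragment arrives escaped as %26, so splitting on '&' is safe.
        for (String param : queryString.split("&")) {
            if (param.startsWith(KEY)) {
                fragment = URLDecoder.decode(param.substring(KEY.length()), StandardCharsets.UTF_8);
            } else if (!param.isEmpty()) {
                otherParams.append(otherParams.length() == 0 ? "?" : "&").append(param);
            }
        }
        String url = urlWithoutQuery + otherParams;
        return fragment == null ? url : url + "#!" + fragment;
    }

    public static void main(String[] args) {
        // Hypothetical request: GET /app?lang=en&_escaped_fragment_=key=a%26b
        System.out.println(toPrettyUrl("http://example.com/app", "lang=en&_escaped_fragment_=key=a%26b"));
    }
}
```

This prints `http://example.com/app?lang=en#!key=a&b`, which is the URL a headless browser would load so that the GWT application sees its original history token.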
I found a solution using a tool called Selenium. I can easily click through the GWT application, record my activity within it for future use, and get the actual HTML generated by the application, which I can then parse for the desired content and act on accordingly. The only small drawback is that Selenium does require a real browser, unlike HtmlUnit or HttpUnit.
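Selenium's `WebDriver.getPageSource()` is one way to read that HTML. A GWT application keeps building the page after it loads, so a script generally has to wait until the content it needs is present before reading it, and Selenium's `WebDriverWait` exists for this purpose. The sketch below shows the same polling idea using only the standard library, so it runs without Selenium. The `Supplier<String>` is a hypothetical stand-in for `driver::getPageSource`, and the marker string is an assumption about what the rendered page contains.

```java
import java.time.Duration;
import java.util.function.Supplier;

class RenderWait {
    /** Polls pageSource until it contains marker; returns that HTML, or null on timeout or interrupt. */
    static String waitForContent(Supplier<String> pageSource, String marker,
                                 Duration timeout, Duration interval) {
        long deadline = System.nanoTime() + timeout.toNanos();
        while (true) {
            String html = pageSource.get();
            if (html != null && html.contains(marker)) {
                return html;
            }
            if (System.nanoTime() >= deadline) {
                return null; // content never appeared
            }
            try {
                Thread.sleep(interval.toMillis());
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt for the caller
                return null;
            }
        }
    }

    public static void main(String[] args) {
        // Hypothetical page source: the "results" element only appears on the third poll.
        int[] calls = {0};
        Supplier<String> fakeSource = () -> ++calls[0] < 3
                ? "<body></body>"
                : "<body><div id=\"results\">42 rows</div></body>";
        String html = waitForContent(fakeSource, "id=\"results\"",
                Duration.ofSeconds(2), Duration.ofMillis(50));
        System.out.println(html);
    }
}
```

Returning `null` on timeout keeps the helper free of checked exceptions. A real Selenium script would more likely use `WebDriverWait` with an element locator than a raw substring check.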