开发者

Accessing html generated by Javascript with htmlunit -Java

I am trying to be able to test a website that uses javascript to render most of the HTML. With the HTMLUNIT browser how would you be able to access the html generated by the javascript? I was looking through their documentation but wasn't sure what the best approach might be.

WebClient webClient = new WebClient();
HtmlPage currentPage = webClient.getPage("some url");
String Source = currentPage.asXml();
System.out.prin开发者_运维知识库tln(Source);

This is an easy way to get back the html of the page but would you use the domNode or another way to access the html generated by the javascript?


You gotta give some time for the JavaScript to execute.

Check a sample working code below. The bucket divs aren't in the original source.

import java.io.IOException;
import java.net.MalformedURLException;
import java.util.List;
import com.gargoylesoftware.htmlunit.*;
import com.gargoylesoftware.htmlunit.html.HtmlPage;

public class GetPageSourceAfterJS {
    public static void main(String[] args) throws FailingHttpStatusCodeException, MalformedURLException, IOException {
        java.util.logging.Logger.getLogger("com.gargoylesoftware").setLevel(java.util.logging.Level.OFF); /* comment out to turn off annoying htmlunit warnings */
        WebClient webClient = new WebClient();
        String url = "http://www.futurebazaar.com/categories/Home--Living-Luggage--Travel-Airbags--Duffel-bags/cid-CU00089575.aspx";
        System.out.println("Loading page now: "+url);
        HtmlPage page = webClient.getPage(url);
        webClient.waitForBackgroundJavaScript(30 * 1000); /* will wait JavaScript to execute up to 30s */

        String pageAsXml = page.asXml();
        System.out.println("Contains bucket? --> "+pageAsXml.contains("bucket"));

        //get divs which have a 'class' attribute of 'bucket'
        List<?> buckets = page.getByXPath("//div[@class='bucket']");
        System.out.println("Found "+buckets.size()+" 'bucket' divs.");

        //System.out.println("#FULL source after JavaScript execution:\n "+pageAsXml);
    }
}

Output:

Loading page now: http://www.futurebazaar.com/categories/Mobiles-Mobile-Phones/cid-CU00089697.asp‌​x?Rfs=brandZZFly001PYXQcurtrayZZBrand
Contains bucket? --> true
Found 3 'bucket' divs.

HtmlUnit version used:

<dependency>
    <groupId>net.sourceforge.htmlunit</groupId>
    <artifactId>htmlunit</artifactId>
    <version>2.12</version>
</dependency>


Assuming the issue is HTML generated by JavaScript as a result of AJAX calls, have you tried the 'AJAX does not work' section in the HtmlUnit FAQ?

There's also a section in the howtos about how to use HtmlUnit with JavaScript.

If your question isn't answered here, I think we'll need some more specifics to be able to help.

0

上一篇:

下一篇:

精彩评论

暂无评论...
验证码 换一张
取 消

最新问答

问答排行榜