Impossible site for HtmlUnit?

I cannot, for the life of me, rig HtmlUnit up to grab this site:

http://www.bing.com/travel/flight/flightSearch?form=FORMTRVLGENERIC&q=flights+from+SLC+to+BKK+leave+07%2F30%2F2010+return+08%2F11%2F2010+adults%3A1+class%3ACOACH&stoc=0&vo1=Salt+Lake+City%2C+UT+%28SLC%29+-+Salt+Lake+City+International+Airport&o=SLC&ve1=Bangkok%2C+Thailand+%28BKK%29+-+Suvarnabhumi+International&e=BKK&d1=07%2F30%2F2010&r1=08%2F11%2F2010&p=1&b=COACH&baf=true

I'm fairly sure it has to do with the large number of scripts running in the background. Perhaps these scripts aren't being given enough time to load fully?

I've also tried simply grabbing bing.com/travel, with no success either. It breaks in the getPage call on the WebClient.

The output gives a plethora of runtimeErrors ("data necessary to complete this operation is not yet available"), all for the same sourceName ("http://www.bing.com/travel/jsxc.vjs?a=common&v=5.5.0-1278007084280")

Then a couple of exceptions are thrown for a missing "(" in a couple of scripts on bing.com.

Then it calls JavaScript, then abruptly ends.

I realize this could be any of a handful of problems that others might not be able to see. If there are no suggestions, would someone mind running these two URLs through their own HtmlUnit setup to see whether they can get basic XML or text output of the results? I'm not trying to do anything fancy here.

It'd be handy to know whether someone else's implementation works, so I can keep jury-rigging mine to completion.

CODE:

import com.gargoylesoftware.htmlunit.WebClient;
import com.gargoylesoftware.htmlunit.html.HtmlPage;

public class test {

    public static void main(String[] args) throws Exception {

        WebClient client = new WebClient();
        System.out.println("webclient loaded");

        // This getPage call is where the run breaks.
        HtmlPage currentPage = client.getPage("http://www.bing.com/travel/flight/flightSearch?form=FORMTRVLGENERIC&q=flights+from+SLC+to+BKK+leave+07%2F30%2F2010+return+08%2F11%2F2010+adults%3A1+class%3ACOACH&stoc=0&vo1=Salt+Lake+City%2C+UT+%28SLC%29+-+Salt+Lake+City+International+Airport&o=SLC&ve1=Bangkok%2C+Thailand+%28BKK%29+-+Suvarnabhumi+International&e=BKK&d1=07%2F30%2F2010&r1=08%2F11%2F2010&p=1&b=COACH&baf=true");
        // Give background scripts up to 10 seconds to finish.
        client.waitForBackgroundJavaScript(10000);
        System.out.println("htmlpage init'd");

        //System.out.println(currentPage.getTitleText());
        String textSource = currentPage.asXml();
        System.out.println(textSource);
    }
}

Thanks!


Try adding this:

client.setThrowExceptionOnScriptError(false);

It takes a long time to run, and boy does it spew out logging... but eventually a page came out:

htmlpage init'd
<?xml version="1.0" encoding="utf-8"?>
<html id="">
  <head>
   ...
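
If the logging volume gets in the way, one option is to turn the loggers down before creating the WebClient. This is only a sketch under an assumption: it works when HtmlUnit's Commons Logging output ends up in java.util.logging, which is the usual fallback when no other backend such as log4j is on the classpath. The logger name is taken from the package in the imports above; the class and method names (QuietHtmlUnitLogging, silence) are made up for illustration.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

final class QuietHtmlUnitLogging {

    // Keep a strong reference: java.util.logging holds loggers weakly, so a
    // level set on an otherwise unreferenced logger can be lost after GC.
    private static final Logger HTMLUNIT_ROOT =
            Logger.getLogger("com.gargoylesoftware");

    private QuietHtmlUnitLogging() {
    }

    /**
     * Switches off the com.gargoylesoftware logger. Child loggers that have
     * no level of their own inherit this setting.
     * Assumption: the log output is routed through java.util.logging.
     */
    static void silence() {
        HTMLUNIT_ROOT.setLevel(Level.OFF);
    }
}
```

Calling QuietHtmlUnitLogging.silence() as the first line of main should quiet the output in that setup. The script errors still happen; they just aren't printed.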


I also ran into the "data necessary to complete this operation is not yet available" errors.
Switching the user agent to "Firefox" helped:
http://steveliles.github.com/jquery_htmlunit_runtimeerror_messages_galore.html


Browsers have a high tolerance for what they might detect as errors (in JavaScript, but also HTML, CSS and so on). This is partly because of various conflicting "standards" :) for how JavaScript got implemented. Something that appears OK in one browser causes problems in another. So when all these messages are made visible, it can be a little disconcerting.

To put this in perspective: in Internet Explorer, go into your settings and check "Display a notification about every script error" under "Advanced Settings", then browse the same sites. You might be surprised at how much code IE gets through by just ignoring what it might detect as problems.

Running HtmlUnit with different browser settings just brings some of these conflicts to light.

Telling HtmlUnit to do something like "Ignore...for this browser" is a perfectly valid practice. In my case, I am bringing in data from a site that checks that all the users are using Internet Explorer (no, I have no good idea why they do that), so I can't proceed without ignoring the JavaScript errors. Interestingly, the site works fine even though IE thinks there are lots of JavaScript errors.
