Filtering externally loaded JavaScript in HtmlUnit

While using htmlunit to scrape a webpage, I occasionally notice warnings like these that flood the console output.

Jul 24, 2011 5:12:59 PM com.gargoylesoftware.htmlunit.javascript.StrictErrorReporter warning
WARNING: warning: message=[Calling eval() with anything other than a primitive string value 
will simply return the value. Is this what you intended?] sourceName=[http://ad.doubleclick.net/adj/N5762.morningstar.com/B5553006.25;sz=728x90;click0=http://ads.morningstar.com/RealMedia/ads/click_lx.ads/www.morningstar.com/quicktake/fund/L34/648978540/TopLeft/Morningstar/JPM_FRpt_728x90_Jul_3827448/Fund_Reports_728x90_content.html/656d5477595534723465554144664a2b?;ord=648978540?] line=[356] lineSource=[null] lineOffset=[0]

Is there a way that I can have htmlunit ignore javascript from

  • http://ad.*
  • http://ads.*

or even just

  • http://ad.doubleclick.net
  • http://ads.morningstar.com

Likewise, is there a way to have htmlunit only interpret the javascript on a webpage containing a particular substring or matching a regex?


You might be able to remove the unwanted javascript by implementing your own ScriptPreProcessor. Your ScriptPreProcessor could detect the javascript you do not want to execute and then remove it from the page before it runs.

I have not tried it yet, but it might work.
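As a rough sketch of the detection part only: the class below is a hypothetical helper (ScriptSourceFilter is my own name, not part of htmlunit) that uses plain java.util.regex to decide whether a script's source name, such as the sourceName shown in the warning above, matches a blocked pattern, and returns an empty script if it does. The assumption is that you would call filter(...) from your own ScriptPreProcessor.preProcess(...) implementation and register that with WebClient.setScriptPreProcessor(...); I have not checked this against a particular htmlunit version, so treat the wiring as untested.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical helper, NOT part of htmlunit: decides by source name
// whether a script should be dropped before it is executed.
final class ScriptSourceFilter {
    private final List<Pattern> blocked;

    ScriptSourceFilter(List<Pattern> blockedPatterns) {
        this.blocked = new ArrayList<>(blockedPatterns);
    }

    // True when the source name matches at least one blocked pattern.
    boolean isBlocked(String sourceName) {
        if (sourceName == null) {
            return false;
        }
        for (Pattern p : blocked) {
            if (p.matcher(sourceName).find()) {
                return true;
            }
        }
        return false;
    }

    // Returns an empty script for blocked sources, the original code otherwise.
    // Assumption: this would be called from ScriptPreProcessor.preProcess(...)
    // with the sourceCode and sourceName arguments htmlunit passes in.
    String filter(String sourceCode, String sourceName) {
        return isBlocked(sourceName) ? "" : sourceCode;
    }

    public static void main(String[] args) {
        // Matches hosts starting with "ad." or "ads." over http or https.
        ScriptSourceFilter f = new ScriptSourceFilter(
                Arrays.asList(Pattern.compile("^https?://ads?\\.")));

        System.out.println(f.isBlocked("http://ad.doubleclick.net/adj/x"));     // true
        System.out.println(f.isBlocked("http://ads.morningstar.com/ads/x.js")); // true
        System.out.println(f.isBlocked("http://www.morningstar.com/site.js"));  // false
    }
}
```

For the second part of the question (only running scripts that match a substring or regex), the same idea should work inverted: keep the source code when the pattern matches and return an empty string otherwise.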
