Filtering externally loaded JavaScript in HtmlUnit
While using HtmlUnit to scrape a webpage, I occasionally notice warnings like these that flood the console output.
Jul 24, 2011 5:12:59 PM com.gargoylesoftware.htmlunit.javascript.StrictErrorReporter warning
WARNING: warning: message=[Calling eval() with anything other than a primitive string value
will simply return the value. Is this what you intended?] sourceName=[http://ad.doubleclick.net/adj/N5762.morningstar.com/B5553006.25;sz=728x90;click0=http://ads.morningstar.com/RealMedia/ads/click_lx.ads/www.morningstar.com/quicktake/fund/L34/648978540/TopLeft/Morningstar/JPM_FRpt_728x90_Jul_3827448/Fund_Reports_728x90_content.html/656d5477595534723465554144664a2b?;ord=648978540?] line=[356] lineSource=[null] lineOffset=[0]
Is there a way that I can have htmlunit ignore javascript from
- http://ad.*
- http://ads.*
or even just
- http://ad.doubleclick.net
- http://ads.morningstar.com
Likewise, is there a way to have htmlunit only interpret the javascript on a webpage containing a particular substring or matching a regex?
You might be able to remove the unwanted JavaScript by implementing your own ScriptPreProcessor. Your ScriptPreProcessor could detect the JavaScript you do not want to execute and then strip it out before the page runs it.
I have not tried it yet, but it might work.
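As a rough, untested sketch of the filtering decision such a pre-processor could make: the class below uses only the standard library, and the names ScriptSourceFilter, isBlocked, filter and the pattern itself are hypothetical. It assumes the sourceName passed to the pre-processor is the script's URL for externally loaded scripts, as in the warning above.

```java
import java.util.regex.Pattern;

// Hypothetical helper: decides whether a script should be dropped based on its source name.
final class ScriptSourceFilter {
    // Assumed pattern: any http(s) URL whose host starts with "ad." or "ads.".
    private static final Pattern BLOCKED =
            Pattern.compile("^https?://ads?\\.", Pattern.CASE_INSENSITIVE);

    // True when the source name looks like one of the ad hosts from the question.
    static boolean isBlocked(String sourceName) {
        return sourceName != null && BLOCKED.matcher(sourceName).find();
    }

    // Returns an empty script for blocked sources and the original code otherwise.
    static String filter(String sourceCode, String sourceName) {
        return isBlocked(sourceName) ? "" : sourceCode;
    }
}
```

In HtmlUnit, the preProcess(htmlPage, sourceCode, sourceName, lineNumber, htmlElement) method of a ScriptPreProcessor could delegate to filter(sourceCode, sourceName), and the pre-processor would be registered with webClient.setScriptPreProcessor(...). To run only scripts that match a regex, invert the check so that non-matching sources get the empty string.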