
Help needed on web spider

I am writing a very basic web spider in Java. I am facing one problem: the content loaded for a URL is different from what the browser shows for the same URL. For example, try the URL below.

http://www.google.co.in/search?sourceid=chrome&ie=UTF-8&q=web+spider#sclient=psy&hl=en&source=hp&q=web+spider&aq=f&aqi=&aql=&oq=web+spider&pbx=1&fp=d8e8e41d6d2bda33&biw=1366&bih=643

If you load this URL in a browser and through the Java URL class, the contents are different. This may be because of the following reasons.

  • JavaScript may be sending XMLHttpRequests and concatenating the results to render the final HTML.
  • URL redirects may lead to the page that finally renders the HTML.
  • Any other reasons that I don't know of.
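One way to check the reasons listed above is to look at what a plain Java fetch actually sends and receives. The sketch below is only an illustration under a few assumptions: the class name RawFetch, the User-Agent string and the UTF-8 decoding are made up for the example. It shows that everything after the # in a URL (the fragment) is never sent to the server, so a browser script that reads the fragment can produce a page that a plain fetch never sees.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URI;
import java.net.URL;
import java.nio.charset.StandardCharsets;

public class RawFetch {

    // Returns the part of the URL that goes into the HTTP request line.
    // The fragment (everything after '#') stays on the client side.
    public static String requestTarget(String url) {
        URI uri = URI.create(url);
        String path = uri.getRawPath();
        if (path == null || path.isEmpty()) {
            path = "/";
        }
        String query = uri.getRawQuery();
        return query == null ? path : path + "?" + query;
    }

    // Downloads the raw HTML the server returns, without running any scripts.
    public static String fetch(String url) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) new URL(url).openConnection();
        // Redirects are followed by default, but not across protocols
        // (for example http -> https); the call is spelled out for clarity.
        conn.setInstanceFollowRedirects(true);
        // Hypothetical User-Agent; servers may vary their output based on it.
        conn.setRequestProperty("User-Agent", "Mozilla/5.0 (compatible; ExampleSpider/0.1)");
        StringBuilder body = new StringBuilder();
        // Assumption: the response is decoded as UTF-8.
        try (BufferedReader in = new BufferedReader(
                new InputStreamReader(conn.getInputStream(), StandardCharsets.UTF_8))) {
            String line;
            while ((line = in.readLine()) != null) {
                body.append(line).append('\n');
            }
        } finally {
            conn.disconnect();
        }
        return body.toString();
    }

    public static void main(String[] args) throws IOException {
        String url = args.length > 0
                ? args[0]
                : "http://www.google.co.in/search?q=web+spider#q=something+else";
        System.out.println("Sent to server: " + requestTarget(url));
        System.out.println(fetch(url).length() + " characters of raw HTML");
    }
}
```

If the raw HTML printed this way lacks content that the browser shows, the missing part is most likely added by JavaScript after the page loads.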

So is there a way to simulate a browser in my Java program? Are there any third-party libraries that load the page the way a browser does and return the final content? Any help is appreciated.


Try HtmlUnit. It can emulate browser behaviour and handle JavaScript.
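A minimal sketch of how that might look, assuming HtmlUnit 2.x is on the classpath (package com.gargoylesoftware.htmlunit; the 3.x releases moved to org.htmlunit). The class name RenderedFetch, the helper names and the 5 second wait are hypothetical choices for the example, not something the library prescribes.

```java
import java.io.IOException;

import com.gargoylesoftware.htmlunit.BrowserVersion;
import com.gargoylesoftware.htmlunit.WebClient;
import com.gargoylesoftware.htmlunit.html.HtmlPage;

public class RenderedFetch {

    // Builds a headless client that runs page scripts.
    public static WebClient newClient() {
        WebClient client = new WebClient(BrowserVersion.CHROME);
        client.getOptions().setJavaScriptEnabled(true);
        // CSS is not needed for scraping text, so skip it.
        client.getOptions().setCssEnabled(false);
        // Real-world pages often contain script errors; do not abort on them.
        client.getOptions().setThrowExceptionOnScriptError(false);
        return client;
    }

    // Loads the page, gives background scripts (such as XMLHttpRequests)
    // up to waitMillis to finish, then returns the resulting DOM as markup.
    public static String render(String url, long waitMillis) throws IOException {
        try (WebClient client = newClient()) {
            HtmlPage page = client.getPage(url);
            client.waitForBackgroundJavaScript(waitMillis);
            return page.asXml();
        }
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical default URL; pass a real one as the first argument.
        String url = args.length > 0 ? args[0] : "http://www.google.co.in/search?q=web+spider";
        System.out.println(render(url, 5000));
    }
}
```

The output is the DOM after scripts have run, which is usually closer to what the browser shows than the raw response. Complex pages may still not render identically, since HtmlUnit's JavaScript support differs from that of a real browser.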
