Help needed on web spider
I am writing a very basic web spider in Java. I am facing one problem: the content loaded for a URL is different from what the browser shows for the same URL. For example, try the URL below.
http://www.google.co.in/search?sourceid=chrome&ie=UTF-8&q=web+spider#sclient=psy&hl=en&source=hp&q=web+spider&aq=f&aqi=&aql=&oq=web+spider&pbx=1&fp=d8e8e41d6d2bda33&biw=1366&bih=643
If you load this URL in a browser and through Java's URL class, the contents are different. This may be because of the following reasons:
- JavaScript may be sending XMLHttpRequests and concatenating the results to render the final HTML.
- URL redirects may happen before the final HTML is rendered.
- Any other reasons that I don't know of.
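One of the "other reasons" can be checked from Java itself. In the example URL everything after the `#` is a fragment, and an HTTP client does not send the fragment to the server; only script running in the browser can read it. The sketch below uses `java.net.URI` to split the URL from the question. The class name `FragmentCheck` and the helper `serverVisiblePart` are made up for illustration.

```java
import java.net.URI;

final class FragmentCheck {
    // Returns the part of the URL that is actually requested from the server:
    // everything up to, but not including, the '#fragment'.
    static String serverVisiblePart(URI uri) {
        String raw = uri.toString();
        int hash = raw.indexOf('#');
        return hash < 0 ? raw : raw.substring(0, hash);
    }

    public static void main(String[] args) {
        // The example URL from the question.
        URI uri = URI.create("http://www.google.co.in/search?sourceid=chrome&ie=UTF-8&q=web+spider"
                + "#sclient=psy&hl=en&source=hp&q=web+spider&aq=f&aqi=&aql=&oq=web+spider"
                + "&pbx=1&fp=d8e8e41d6d2bda33&biw=1366&bih=643");

        System.out.println("requested from server: " + serverVisiblePart(uri));
        System.out.println("query:                 " + uri.getRawQuery());
        System.out.println("fragment (client only): " + uri.getRawFragment());
    }
}
```

If the page's scripts use the fragment to decide what to load, a plain `URL` fetch will never see that content, whatever headers it sends.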
So is there a way to simulate a browser in my Java program? Are there any third-party libraries that load the page the way a browser does and return the final content? Any help is appreciated.
Try HtmlUnit. It can emulate browser behaviour and handle JavaScript.
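A minimal sketch of what that could look like, assuming HtmlUnit 3.x is on the classpath (the `org.htmlunit` package; 2.x releases use `com.gargoylesoftware.htmlunit` instead). The class name `RenderedFetch`, the helper `fetchRendered`, and the 5-second wait are illustrative choices, not anything HtmlUnit prescribes.

```java
import java.io.IOException;

import org.htmlunit.BrowserVersion;
import org.htmlunit.WebClient;
import org.htmlunit.html.HtmlPage;

final class RenderedFetch {
    // Loads the page, lets its scripts run, and returns the resulting DOM as markup.
    static String fetchRendered(String url, long jsWaitMillis) throws IOException {
        try (WebClient client = new WebClient(BrowserVersion.CHROME)) {
            client.getOptions().setJavaScriptEnabled(true);
            client.getOptions().setCssEnabled(false);                  // styling is not needed for scraping
            client.getOptions().setThrowExceptionOnScriptError(false); // tolerate scripts HtmlUnit cannot run

            HtmlPage page = client.getPage(url);                // redirects are followed by default
            client.waitForBackgroundJavaScript(jsWaitMillis);   // give pending XMLHttpRequests time to finish
            return page.asXml();
        }
    }

    public static void main(String[] args) throws IOException {
        // Pass the URL to fetch as the first command-line argument.
        System.out.println(fetchRendered(args[0], 5_000));
    }
}
```

HtmlUnit's JavaScript support is an emulation, so script-heavy pages may still render differently than in a real browser.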