
How to retrieve Google pages

Dear all, I am currently using a web tool

http://fiddesktop.cs.northwestern.edu/mmp/scrape?url=

to parse a webpage.

For example, to parse the New York Times World page, we enter:

http://fiddesktop.cs.northwestern.edu/mmp/scrape?url=http://www.nytimes.com/pages/world/index.html

in the browser's address bar, and it parses the page nicely.

However, it fails for Google pages. For example, if I try to parse the Google News front page with:

http://fiddesktop.cs.northwestern.edu/mmp/scrape?url=http://news.google.com/nwshp?hl=en&tab=wn

I always get a 500 Internal Server Error.

I am sure this has something to do with the Google website itself; I think we probably need some Google API. Does anyone have an idea how to sort this out for Google pages? Many thanks.
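One thing worth ruling out before blaming Google: the Google News address contains `&tab=wn`. When it is pasted unencoded after `?url=`, a standard query-string parser treats `tab=wn` as a second parameter of the scrape request, so the service would only see `http://news.google.com/nwshp?hl=en`. The New York Times URL has no `&`, which is at least consistent with it working. Whether this actually explains the 500 error is an assumption, not something the question confirms. The sketch below uses only Python's standard `urllib.parse`; the helper names are hypothetical, and it assumes the service decodes its query string the way `parse_qs` does.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# The scraping service from the question; it takes the page to fetch in a `url` parameter.
SCRAPER = "http://fiddesktop.cs.northwestern.edu/mmp/scrape"


def naive_request_url(target: str) -> str:
    # Plain string concatenation: any '&' inside `target` starts a new outer parameter.
    return SCRAPER + "?url=" + target


def encoded_request_url(target: str) -> str:
    # urlencode percent-encodes the value (':', '/', '?', '=' and '&' included),
    # so the whole target survives as the single `url` parameter.
    return SCRAPER + "?" + urlencode({"url": target})


def url_param_as_parsed(request_url: str) -> dict:
    # Decode the outer query string the way a typical server-side parser would
    # (an assumption about the scraping service, not documented behaviour).
    return parse_qs(urlsplit(request_url).query)


if __name__ == "__main__":
    google_news = "http://news.google.com/nwshp?hl=en&tab=wn"
    # {'url': ['http://news.google.com/nwshp?hl=en'], 'tab': ['wn']}
    print(url_param_as_parsed(naive_request_url(google_news)))
    # {'url': ['http://news.google.com/nwshp?hl=en&tab=wn']}
    print(url_param_as_parsed(encoded_request_url(google_news)))
```

In a browser, the equivalent is pasting the percent-encoded form, `...scrape?url=http%3A%2F%2Fnews.google.com%2Fnwshp%3Fhl%3Den%26tab%3Dwn`. If the error persists with correct encoding, the problem lies elsewhere, for example on Google's side.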


Per Google's robots.txt file, you are explicitly requested not to scrape their content. Google does not provide an API for machine-readable search results; they want to control the presentation of their content through widgets and embedding strategies.
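The robots.txt rules mentioned above can be checked programmatically before fetching a page. This is a minimal sketch using Python's standard `urllib.robotparser`. The `SAMPLE_ROBOTS` rules and the `my-scraper` user agent are hypothetical stand-ins chosen for illustration. They are not the contents of Google's actual robots.txt.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content for illustration only; it is NOT Google's real file.
SAMPLE_ROBOTS = """\
User-agent: *
Disallow: /nwshp
Allow: /
"""


def may_fetch(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the rules in `robots_txt` allow `user_agent` to fetch `url`."""
    parser = RobotFileParser()
    # parse() takes an iterable of lines, e.g. the text of a downloaded robots.txt.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    agent = "my-scraper"  # hypothetical user-agent string
    print(may_fetch(SAMPLE_ROBOTS, agent, "http://news.google.com/nwshp?hl=en&tab=wn"))  # False
    print(may_fetch(SAMPLE_ROBOTS, agent, "http://news.google.com/"))  # True
```

To check a live site, call `set_url()` with the site's robots.txt address, then `read()`, then `can_fetch()`. Python's parser applies the first matching rule in file order. Other crawlers may resolve overlapping Allow/Disallow lines differently.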
