开发者

web scraping search results

I need help solving the following issue:

I need to validate cached URLs by Google search engine for a particular site. In the case the url will 404 or the page will not render some necessary html elements (considered broken) I need to log those URLs and later 301 redirect to correct URLs. I know PHP and a little 开发者_开发问答bit of Python but I'm not sure what approach to use to scrap all URLs from search engine results for given site.


http://simplehtmldom.sourceforge.net/ - a simple html parser. there is an example at this page; not sure if this still works with googles instant search etc.

0

上一篇:

下一篇:

精彩评论

暂无评论...
验证码 换一张
取 消

最新问答

问答排行榜