
How to verify that the generated sitemap indexes return a 200 code?

I have generated the sitemap indexes for Google. The only issue I have is how to verify that all of the generated indexes (URLs) actually work. The guide says something like this:

you write a script to test each URL in the sitemap against your application server and confirm that each link returns an HTTP 200 (OK) code. Broken links may indicate a mismatch between the URL formatting configuration of the Sitemap Generator

I just wanted to see if somebody has experience writing such a script.
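
For reference, something along these lines is roughly what I had in mind (a minimal sketch assuming the standard sitemap XML namespace; the index URL and User-Agent string are placeholders):

    import urllib.request
    import urllib.error
    import xml.etree.ElementTree as ET

    NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}
    SITEMAP_INDEX = "https://example.com/sitemap_index.xml"  # placeholder URL

    def fetch(url):
        """Return (status_code, body) without raising on HTTP error codes."""
        req = urllib.request.Request(url, headers={"User-Agent": "sitemap-checker"})
        try:
            with urllib.request.urlopen(req, timeout=10) as resp:
                return resp.status, resp.read()
        except urllib.error.HTTPError as e:
            return e.code, b""

    def locs(xml_bytes, tag):
        """Extract <loc> values from a sitemap index (tag='sitemap') or urlset (tag='url')."""
        root = ET.fromstring(xml_bytes)
        return [el.text.strip() for el in root.findall(f"sm:{tag}/sm:loc", NS)]

    status, body = fetch(SITEMAP_INDEX)
    if status != 200:
        raise SystemExit(f"sitemap index itself returned HTTP {status}")

    # Treat the file as an index of sitemaps; fall back to a single urlset.
    sitemaps = locs(body, "sitemap") or [SITEMAP_INDEX]

    for sitemap_url in sitemaps:
        code, sm_body = fetch(sitemap_url)
        if code != 200:
            print(f"{code}  {sitemap_url}")
            continue
        for page_url in locs(sm_body, "url"):
            code, _ = fetch(page_url)
            if code != 200:
                print(f"{code}  {page_url}")  # anything that is not 200 OK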


Google Webmaster Tools will report, under "Site configuration -> Sitemaps", any HTTP errors and redirects (pretty much everything that is not an HTTP 200). Additionally, "Diagnostics -> Crawl Errors -> In Sitemaps" gives another view of errors that occurred while crawling URLs listed in the sitemaps.

If that is not what you want, I would just do some log-file grep-ing (grep for "googlebot" and an identifier of the URLs that you listed in your sitemaps).
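
A rough Python equivalent of that grep, assuming a standard combined access-log format; the access.log filename and the /products/ prefix are placeholders for whatever identifies your sitemap URLs:

    import re

    # Pull the request path and status code out of a "combined" log line.
    LOG_LINE = re.compile(r'"[A-Z]+ (?P<path>\S+) HTTP/[^"]*" (?P<status>\d{3})')

    with open("access.log") as log:
        for line in log:
            if "Googlebot" not in line:
                continue  # only look at Googlebot requests
            m = LOG_LINE.search(line)
            if m and m.group("path").startswith("/products/") and m.group("status") != "200":
                print(m.group("status"), m.group("path"))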

You could probably write your own crawler to pre-check that your pages return an HTTP 200, but a URL that returns an HTTP 200 for you now will not necessarily return an HTTP 200 for Googlebot next week / month / year. So I recommend sticking with Google Webmaster Tools and log-file analysis (visualized with e.g. Munin, Cacti, ...).


How did you create the sitemap? I would think most sitemap tools only include URLs that responded with "200 OK".

Do note that some websites mess up and always respond with a 200 instead of e.g. a 404 for invalid URLs. Such websites have trouble ahead :)
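
A quick way to spot that "soft 404" problem is to request an obviously bogus URL and check that it does not come back as 200; a minimal sketch (the URL below is just a placeholder):

    import urllib.request
    import urllib.error

    bogus = "https://example.com/this-page-should-not-exist-xyz123"  # placeholder
    try:
        with urllib.request.urlopen(bogus, timeout=10) as resp:
            print("WARNING: invalid URL returned", resp.status)  # soft-404 suspect
    except urllib.error.HTTPError as e:
        print("OK: invalid URL returned", e.code)  # e.g. 404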

