Google crawler finds robots.txt, but can't download it
Can anyone tell me what's wrong with this robots.txt?
http://bizup.cloudapp.net/robots.txt
The following is the error I get in Google Webmaster Tools:
Sitemap errors and warnings
Line Status Details
Errors -
Network unreachable: robots.txt unreachable
We were unable to crawl your Sitemap because we found a robots.txt file at the root of
your site but were unable to download it. Please ensure that it is accessible or remove
it completely.
Actually the link above is the mapping of a route that goes to a Robots action. That action gets the file from storage and returns its content as text/plain. Google says it can't download the file. Is that because of this setup?
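For reference, a minimal sketch of what such an action could look like (the controller name and the storage helper are assumptions for illustration, not the actual code):

using System.Web.Mvc;

public class RobotsController : Controller
{
    // Serves robots.txt from storage as plain text.
    public ActionResult Robots()
    {
        string robotsTxt = GetRobotsTxtFromStorage();
        return Content(robotsTxt, "text/plain");
    }

    private string GetRobotsTxtFromStorage()
    {
        // Hypothetical stand-in for the real storage lookup.
        return "User-agent: *\nAllow: /\nSitemap: http://bizup.cloudapp.net/sitemap.xml";
    }
}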
It looks like it's reading robots.txt OK, but your robots.txt then claims that http://bizup.cloudapp.net/robots.txt is also the URL of your XML sitemap, when it's really http://bizup.cloudapp.net/sitemap.xml. The error seems to come from Google trying to parse robots.txt as an XML sitemap. You need to change your robots.txt to
User-agent: *
Allow: /
Sitemap: http://bizup.cloudapp.net/sitemap.xml
EDIT
It actually goes a bit deeper than that, and Googlebot can't download any pages at all on your site. Here's the exception being returned when Googlebot requests either robots.txt or the homepage:
Cookieless Forms Authentication is not supported for this application.
Exception Details: System.Web.HttpException: Cookieless Forms Authentication is not supported for this application.
[HttpException (0x80004005): Cookieless Forms Authentication is not supported for this application.]
AzureBright.MvcApplication.FormsAuthentication_OnAuthenticate(Object sender, FormsAuthenticationEventArgs args) in C:\Projectos\AzureBrightWebRole\Global.asax.cs:129
System.Web.Security.FormsAuthenticationModule.OnAuthenticate(FormsAuthenticationEventArgs e) +11336832
System.Web.Security.FormsAuthenticationModule.OnEnter(Object source, EventArgs eventArgs) +88
System.Web.HttpApplication.SyncEventExecutionStep.System.Web.HttpApplication.IExecutionStep.Execute() +80
System.Web.HttpApplication.ExecuteStep(IExecutionStep step, Boolean& completedSynchronously) +266
FormsAuthentication is trying to use cookieless mode because it recognises that Googlebot doesn't support cookies, but something in your FormsAuthentication_OnAuthenticate method is then throwing an exception because it doesn't want to accept cookieless authentication.
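To illustrate the failure mode, here is a hypothetical sketch of a Global.asax handler that would produce exactly this exception when only the cookie-based path is implemented (assumed code, not taken from the question):

using System.Web;
using System.Web.Security;

// Hypothetical Global.asax.cs handler -- an assumption for illustration.
void FormsAuthentication_OnAuthenticate(object sender, FormsAuthenticationEventArgs args)
{
    if (FormsAuthentication.CookiesSupported)
    {
        // Normal path: read and decrypt the forms ticket from the cookie, set args.User, etc.
    }
    else
    {
        // Googlebot is treated as a cookieless client, so this branch runs and the request fails.
        throw new HttpException("Cookieless Forms Authentication is not supported for this application.");
    }
}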
I think that the easiest way around that is to change the following in web.config, which stops FormsAuthentication from ever trying to use cookieless mode...
<authentication mode="Forms">
    <forms cookieless="UseCookies" ...>
    ...
</authentication>
I fixed this problem in a simple way: just by adding a robots.txt file (in the same directory as my index.html file) that allows all access. I had left it out, intending to allow all access that way -- but maybe Google Webmaster Tools then located another robots.txt controlled by my ISP?
So it seems that for some ISPs at least, you should have a robots.txt file even if you don't want to exclude any bots, just to prevent this possible glitch.
There is something wrong with the script that is generating the robots.txt file. When Googlebot accesses the file, it gets a 500 Internal Server Error. Here are the results of the header check:
REQUESTING: http://bizup.cloudapp.net/robots.txt

GET /robots.txt HTTP/1.1
Connection: Keep-Alive
Keep-Alive: 300
Accept: */*
Host: bizup.cloudapp.net
Accept-Language: en-us
Accept-Encoding: gzip, deflate
User-Agent: Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)

SERVER RESPONSE: 500 INTERNAL SERVER ERROR
Cache-Control: private
Content-Type: text/html; charset=utf-8
Server: Microsoft-IIS/7.0
X-AspNet-Version: 4.0.30319
X-Powered-By: ASP.NET
Date: Thu, 19 Aug 2010 16:52:09 GMT
Content-Length: 4228
You can test the headers here: http://www.seoconsultants.com/tools/headers/#Report
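If you want to reproduce the check without the online tool, a rough sketch along these lines (assumed code, sending the Googlebot user agent) will show you the status code the server returns:

using System;
using System.Net;

class RobotsCheck
{
    static void Main()
    {
        // Request robots.txt while identifying as Googlebot, and print the status code.
        var request = (HttpWebRequest)WebRequest.Create("http://bizup.cloudapp.net/robots.txt");
        request.UserAgent = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)";
        try
        {
            using (var response = (HttpWebResponse)request.GetResponse())
            {
                Console.WriteLine("Status: {0} {1}", (int)response.StatusCode, response.StatusDescription);
            }
        }
        catch (WebException ex)
        {
            // A 500 response surfaces here as a WebException with the response attached.
            var error = ex.Response as HttpWebResponse;
            if (error != null)
                Console.WriteLine("Status: {0} {1}", (int)error.StatusCode, error.StatusDescription);
            else
                Console.WriteLine("Request failed: {0}", ex.Message);
        }
    }
}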
I have no problem getting your robots.txt:
User-agent: *
Allow: /
Sitemap: http://bizup.cloudapp.net/robots.txt
However, isn't that Sitemap line pointing recursively back at robots.txt itself?
A Sitemap is supposed to be an XML file; see Wikipedia.