Google crawler finds robots.txt, but can't download it
Can anyone tell me what's wrong with this robots.txt?
http://bizup.cloudapp.net/robots.txt
The following is the error I get in Google Webmaster Tools:
Sitemap errors and warnings
Line Status Details
Errors -
Network unreachable: robots.txt unreachable
We were unable to crawl your Sitemap because we found a robots.txt file at the root of
your site but were unable to download it. Please ensure that it is accessible or remove
it completely.
Actually the link above is the mapping of a route that goes to a Robots action. That action gets the file from storage and returns its content as text/plain. Google says it can't download the file. Is that because of this setup?
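For reference, a minimal sketch of what such an action could look like (the controller name and the storage helper are assumptions for illustration, not the actual code):

using System.Web.Mvc;

public class RobotsController : Controller
{
    // Serves robots.txt from storage as plain text.
    public ActionResult Robots()
    {
        string robotsTxt = GetRobotsTxtFromStorage();
        return Content(robotsTxt, "text/plain");
    }

    private string GetRobotsTxtFromStorage()
    {
        // Hypothetical stand-in for the real storage lookup.
        return "User-agent: *\nAllow: /\nSitemap: http://bizup.cloudapp.net/sitemap.xml";
    }
}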
It looks like it's reading robots.txt OK, but your robots.txt then claims that http://bizup.cloudapp.net/robots.txt is also the URL of your XML sitemap, when it's really http://bizup.cloudapp.net/sitemap.xml. The error seems to come from Google trying to parse robots.txt as an XML sitemap. You need to change your robots.txt to
User-agent: *
Allow: /
Sitemap: http://bizup.cloudapp.net/sitemap.xml
EDIT
It actually goes a bit deeper than that, and Googlebot can't download any pages at all on your site. Here's the exception being returned when Googlebot requests either robots.txt or the homepage:
Cookieless Forms Authentication is not supported for this application.
Exception Details: System.Web.HttpException: Cookieless Forms Authentication is not supported for this application.
[HttpException (0x80004005): Cookieless Forms Authentication is not supported for this application.]
AzureBright.MvcApplication.FormsAuthentication_OnAuthenticate(Object sender, FormsAuthenticationEventArgs args) in C:\Projectos\AzureBrightWebRole\Global.asax.cs:129
System.Web.Security.FormsAuthenticationModule.OnAuthenticate(FormsAuthenticationEventArgs e) +11336832
System.Web.Security.FormsAuthenticationModule.OnEnter(Object source, EventArgs eventArgs) +88
System.Web.HttpApplication.SyncEventExecutionStep.System.Web.HttpApplication.IExecutionStep.Execute() +80
System.Web.HttpApplication.ExecuteStep(IExecutionStep step, Boolean& completedSynchronously) +266
FormsAuthentication is trying to use cookieless mode because it recognises that Googlebot doesn't support cookies, but something in your FormsAuthentication_OnAuthenticate method is then throwing an exception because it doesn't want to accept cookieless authentication.
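To illustrate the failure mode, here is a hypothetical sketch of a Global.asax handler that would produce exactly this exception when only the cookie-based path is implemented (assumed code, not taken from the question):

using System.Web;
using System.Web.Security;

// Hypothetical Global.asax.cs handler -- an assumption for illustration.
void FormsAuthentication_OnAuthenticate(object sender, FormsAuthenticationEventArgs args)
{
    if (FormsAuthentication.CookiesSupported)
    {
        // Normal path: read and decrypt the forms ticket from the cookie, set args.User, etc.
    }
    else
    {
        // Googlebot is treated as a cookieless client, so this branch runs and the request fails.
        throw new HttpException("Cookieless Forms Authentication is not supported for this application.");
    }
}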
I think that the easiest way around that is to change the following in web.config, which stops FormsAuthentication from ever trying to use cookieless mode...
<authentication mode="Forms">
    <forms cookieless="UseCookies" ...>
    ...
</authentication>
I fixed this problem in a simple way: just by adding a robots.txt file (in the same directory as my index.html file) that allows all access. I had left it out, intending to allow all access that way -- but maybe Google Webmaster Tools then located another robots.txt controlled by my ISP?
So it seems that for some ISPs at least, you should have a robots.txt file even if you don't want to exclude any bots, just to prevent this possible glitch.
There is something wrong with the script that is generating the robots.txt file. When Googlebot accesses the file, it gets a 500 Internal Server Error. Here are the results of the header check:
REQUESTING: http://bizup.cloudapp.net/robots.txt

GET /robots.txt HTTP/1.1
Connection: Keep-Alive
Keep-Alive: 300
Accept: */*
Host: bizup.cloudapp.net
Accept-Language: en-us
Accept-Encoding: gzip, deflate
User-Agent: Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)

SERVER RESPONSE: 500 INTERNAL SERVER ERROR
Cache-Control: private
Content-Type: text/html; charset=utf-8
Server: Microsoft-IIS/7.0
X-AspNet-Version: 4.0.30319
X-Powered-By: ASP.NET
Date: Thu, 19 Aug 2010 16:52:09 GMT
Content-Length: 4228
You can test the headers here: http://www.seoconsultants.com/tools/headers/#Report
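If you want to reproduce the check without the online tool, a rough sketch along these lines (assumed code, sending the Googlebot user agent) will show you the status code the server returns:

using System;
using System.Net;

class RobotsCheck
{
    static void Main()
    {
        // Request robots.txt while identifying as Googlebot, and print the status code.
        var request = (HttpWebRequest)WebRequest.Create("http://bizup.cloudapp.net/robots.txt");
        request.UserAgent = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)";
        try
        {
            using (var response = (HttpWebResponse)request.GetResponse())
            {
                Console.WriteLine("Status: {0} {1}", (int)response.StatusCode, response.StatusDescription);
            }
        }
        catch (WebException ex)
        {
            // A 500 response surfaces here as a WebException with the response attached.
            var error = ex.Response as HttpWebResponse;
            if (error != null)
                Console.WriteLine("Status: {0} {1}", (int)error.StatusCode, error.StatusDescription);
            else
                Console.WriteLine("Request failed: {0}", ex.Message);
        }
    }
}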
I have no problem getting your robots.txt:
User-agent: *
Allow: /
Sitemap: http://bizup.cloudapp.net/robots.txt
However, isn't that Sitemap line pointing recursively back at robots.txt itself?
A Sitemap is supposed to be an XML file; see Wikipedia.