Google crawler finds robots.txt, but can't download it

Can anyone tell me what's wrong with this robots.txt?

http://bizup.cloudapp.net/robots.txt

The following is the error I get in Google Webmaster Tools:

Sitemap errors and warnings
Line    Status  Details
Errors  -   
Network unreachable: robots.txt unreachable
We were unable to crawl your Sitemap because we found a robots.txt file at the root of
your site but were unable to download it. Please ensure that it is accessible or remove
it completely.

The link above is actually a route mapped to an action named Robots. That action reads the file from storage and returns its content as text/plain. Google says it can't download the file. Could that be the reason?


It looks like it's reading robots.txt OK, but your robots.txt then claims that http://bizup.cloudapp.net/robots.txt is also the URL of your XML sitemap, when it's really http://bizup.cloudapp.net/sitemap.xml. The error seems to come from Google trying to parse robots.txt as an XML sitemap. You need to change your robots.txt to

User-agent: *
Allow: /
Sitemap: http://bizup.cloudapp.net/sitemap.xml
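If you want to catch this kind of mistake before Google does, a small check along these lines might help. It is only a sketch: the function name `suspicious_sitemaps` is made up for illustration, and it assumes Python 3.8+ so that `RobotFileParser.site_maps()` is available. It flags any Sitemap entry that points back at a robots.txt file.

```python
from urllib.robotparser import RobotFileParser


def suspicious_sitemaps(robots_txt):
    """Return Sitemap URLs that appear to point at a robots.txt file."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    # site_maps() returns None when no Sitemap lines were found
    sitemaps = parser.site_maps() or []
    return [url for url in sitemaps if url.lower().endswith("/robots.txt")]


if __name__ == "__main__":
    broken = (
        "User-agent: *\n"
        "Allow: /\n"
        "Sitemap: http://bizup.cloudapp.net/robots.txt\n"
    )
    for url in suspicious_sitemaps(broken):
        print("Sitemap entry points at robots.txt:", url)
```

The check works on the text of the file, so you can run it against whatever your Robots action returns without involving Google at all.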

EDIT

It actually goes a bit deeper than that, and Googlebot can't download any pages at all on your site. Here's the exception being returned when Googlebot requests either robots.txt or the homepage:

Cookieless Forms Authentication is not supported for this application.

Exception Details: System.Web.HttpException: Cookieless Forms Authentication is not supported for this application.

[HttpException (0x80004005): Cookieless Forms Authentication is not supported for this application.]
AzureBright.MvcApplication.FormsAuthentication_OnAuthenticate(Object sender, FormsAuthenticationEventArgs args) in C:\Projectos\AzureBrightWebRole\Global.asax.cs:129
System.Web.Security.FormsAuthenticationModule.OnAuthenticate(FormsAuthenticationEventArgs e) +11336832
System.Web.Security.FormsAuthenticationModule.OnEnter(Object source, EventArgs eventArgs) +88
System.Web.SyncEventExecutionStep.System.Web.HttpApplication.IExecutionStep.Execute() +80
System.Web.HttpApplication.ExecuteStep(IExecutionStep step, Boolean& completedSynchronously) +266

FormsAuthentication is trying to use cookieless mode because it recognises that Googlebot doesn't support cookies, but something in your FormsAuthentication_OnAuthenticate method is then throwing an exception because it doesn't want to accept cookieless authentication.

I think that the easiest way around that is to change the following in web.config, which stops FormsAuthentication from ever trying to use cookieless mode...

<authentication mode="Forms"> 
    <forms cookieless="UseCookies" ...>
    ...


I fixed this problem in a simple way: just by adding a robots.txt file (in the same directory as my index.html file) that allows all access. I had left it out, intending to allow all access that way, but maybe Google Webmaster Tools then found another robots.txt controlled by my ISP?

So it seems that for some ISPs at least, you should have a robots.txt file even if you don't want to exclude any bots, just to prevent this possible glitch.


There is something wrong with the script that generates the robots.txt file. When Googlebot requests the file, it gets a 500 Internal Server Error. Here are the results of the header check:

REQUESTING: http://bizup.cloudapp.net/robots.txt
GET /robots.txt HTTP/1.1
Connection: Keep-Alive
Keep-Alive: 300
Accept:*/*
Host: bizup.cloudapp.net
Accept-Language: en-us
Accept-Encoding: gzip, deflate
User-Agent: Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)

SERVER RESPONSE: 500 INTERNAL SERVER ERROR
Cache-Control: private
Content-Type: text/html; charset=utf-8
Server: Microsoft-IIS/7.0
X-AspNet-Version: 4.0.30319
X-Powered-By: ASP.NET
Date: Thu, 19 Aug 2010 16:52:09 GMT
Content-Length: 4228
Final Destination Page

You can test the headers here http://www.seoconsultants.com/tools/headers/#Report
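You can also reproduce the check locally. The sketch below sends a request with the same Googlebot User-Agent string shown in the header dump and reports the HTTP status code. The helper names `build_googlebot_request` and `fetch_status` are my own, not part of any library, and the timeout value is an arbitrary assumption.

```python
import urllib.error
import urllib.request

# User-Agent string copied from the header check above
GOOGLEBOT_UA = (
    "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
)


def build_googlebot_request(url):
    """Build a GET request that identifies itself as Googlebot."""
    return urllib.request.Request(url, headers={"User-Agent": GOOGLEBOT_UA})


def fetch_status(url, timeout=10):
    """Return the HTTP status code the server sends for a Googlebot-style request."""
    request = build_googlebot_request(url)
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # 4xx and 5xx responses are raised as HTTPError; the code is still useful
        return err.code


if __name__ == "__main__":
    print(fetch_status("http://bizup.cloudapp.net/robots.txt"))
```

Comparing the result with what a normal browser gets is a quick way to tell whether the server treats the crawler differently, which is what seems to be happening here.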


I have no problem fetching your robots.txt:

User-agent: *
Allow: /
Sitemap: http://bizup.cloudapp.net/robots.txt

However, isn't the Sitemap line pointing back at robots.txt itself, making it a recursive reference?

A sitemap is supposed to be an XML file; see Wikipedia.
