robots.txt disallow: spider
I'm looking at a robots.txt file of a site I would like to do a one off scrape and there is this line:
User-agent: spider
Disallow: /
Does this mean they don't want any spiders? I was under the impression that * was used for all spiders. If true, this would of course stop spiders such as Google.
This just tells agents that identify themselves as spider to be polite enough not to browse the site. It has no special meaning beyond that literal name.
robots.txt rules are honored only by robots that choose to respect them, and each robot obeys the record matching its own user-agent string. The way to exclude all robots is to use *:
User-Agent: *
Disallow: /
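You can check how a given robots.txt record is interpreted with Python's standard-library parser. A minimal sketch, feeding in the rules from the question (the agent names and paths here are just examples):

```python
from urllib.robotparser import RobotFileParser

# Parse the exact record from the question: only the literal
# user-agent "spider" is disallowed from everything.
rp = RobotFileParser()
rp.parse([
    "User-agent: spider",
    "Disallow: /",
])

# An agent calling itself "spider" matches the record and is blocked.
print(rp.can_fetch("spider", "/some/page"))     # False

# Any other agent (e.g. Googlebot) finds no matching record and no
# "*" default, so it is allowed.
print(rp.can_fetch("Googlebot", "/some/page"))  # True
```

This confirms the answer above: without a `User-agent: *` record, agents that don't call themselves spider are unaffected.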