robots.txt and relative path
I want to disallow any files in any /tmp folder on my site. For example, I have: "/anything/tmp/whatever/test.html", "/stuff/tmp/old/test.html", "/people/tmp/images.html", and so on.
Is it enough to put "Disallow: /tmp/" into my robots.txt to block every tmp folder anywhere in the directory tree of my webserver? Or do I need to list every single path, like "Disallow: /anything/tmp/", "Disallow: /stuff/tmp/", "Disallow: /tmp/"?
Or can I write it like this: "Disallow: /*/tmp/"?
Thanks
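Straight answer: no.
Under the original standard, a Disallow value is matched as a prefix of the URL path, so "Disallow: /tmp/" only covers a /tmp/ folder at the root of the site. You'll have to declare each directory you want to exclude from robots: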
User-agent: *
Disallow: /anything/tmp/
Disallow: /stuff/tmp/
You can check the syntax of your robots.txt file at http://www.frobee.com/robots-txt-check
Read more about the Robots Exclusion standard at http://www.robotstxt.org/orig.html
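To see the prefix behaviour for yourself, here is a minimal sketch using Python's standard urllib.robotparser, which follows the original prefix-matching rules and does not expand wildcards. The helper name is_allowed and the two sample rule sets are only illustrations, not part of any library.

```python
from urllib.robotparser import RobotFileParser


def is_allowed(robots_txt: str, path: str, agent: str = "*") -> bool:
    """Hypothetical helper: parse robots.txt text and test one URL path."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, path)


# Only the root-level folder is named.
ROOT_ONLY = """\
User-agent: *
Disallow: /tmp/
"""

# Every tmp folder is listed explicitly.
EXPLICIT = """\
User-agent: *
Disallow: /tmp/
Disallow: /anything/tmp/
Disallow: /stuff/tmp/
"""

if __name__ == "__main__":
    for path in ("/tmp/test.html", "/anything/tmp/whatever/test.html"):
        print(path, "root only:", is_allowed(ROOT_ONLY, path),
              "explicit:", is_allowed(EXPLICIT, path))
```

With the root-only rules, the nested /anything/tmp/ path is still reported as fetchable; it is blocked only once it is listed explicitly.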
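It actually depends on the REP (Robots Exclusion Protocol) parser. More advanced parsers do recognize wildcard syntax, but it's not part of the original spec, so crawlers that implement only the original spec will not interpret "*" in a path as a wildcard.
That said, Google does honor wildcards. According to their parser, the pattern: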
/fish*.php
Does Match:
/fish.php
/fishheads/catfish.php?parameters
Does Not Match:
/Fish.PHP
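The wildcard behaviour in the example above can be approximated with a small matcher. This is only a sketch of the matching rules as Google documents them ("*" matches any run of characters, a trailing "$" anchors the end, matching is case-sensitive and otherwise prefix-based); it is not Google's actual parser, and the function name wildcard_rule_matches is made up for illustration.

```python
import re


def wildcard_rule_matches(pattern: str, path: str) -> bool:
    """Sketch: test a URL path against a wildcard-style robots.txt pattern."""
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape the literal pieces and let each "*" match any run of characters.
    regex = ".*".join(re.escape(part) for part in pattern.split("*"))
    if anchored:
        regex += r"\Z"
    # re.match anchors at the start only, which gives prefix matching.
    return re.match(regex, path) is not None


if __name__ == "__main__":
    for path in ("/fish.php", "/fishheads/catfish.php?parameters", "/Fish.PHP"):
        print(path, wildcard_rule_matches("/fish*.php", path))
    for path in ("/anything/tmp/whatever/test.html", "/tmp/test.html"):
        print(path, wildcard_rule_matches("/*/tmp/", path))
```

Under these rules, "/*/tmp/" matches the nested tmp folders from the question but not a /tmp/ folder at the site root, which would still need its own "Disallow: /tmp/" line.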