
robots.txt and relative path

I want to disallow all files in any /tmp folder on my site. For example, I have "/anything/tmp/whatever/test.html", "/stuff/tmp/old/test.html", "/people/tmp/images.html", and so on.

Is it enough to put "Disallow: /tmp/" into my robots.txt to block every tmp folder anywhere on my web server? Or do I need to list every single path, like:

Disallow: /anything/tmp/
Disallow: /stuff/tmp/
Disallow: /tmp/

Or like this:

Disallow: /*/tmp/

Thanks


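Straight answer: no.

You'll have to declare each directory you want to exclude from robots. Under the original standard, a Disallow value is matched against the start of the URL path, so "Disallow: /tmp/" only blocks the /tmp/ directory at the root of the site.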

User-agent: *
Disallow: /anything/tmp/
Disallow: /stuff/tmp/

You can check the syntax of your robots.txt file at http://www.frobee.com/robots-txt-check
Read more about the Robots Exclusion standard at http://www.robotstxt.org/orig.html
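One way to see the prefix rule in action is Python's standard-library `urllib.robotparser`, which, as far as I know, implements only the original prefix matching. The sketch below parses two candidate robots.txt files (the rules come from this thread; `build_parser`, `ROOT_TMP_ONLY` and `EACH_DIRECTORY` are just illustrative names) and checks the asker's paths against each:

```python
from urllib.robotparser import RobotFileParser


def build_parser(robots_txt):
    # Illustrative helper: feed robots.txt text to the standard-library parser.
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


# Blocks only the root-level /tmp/ directory.
ROOT_TMP_ONLY = """\
User-agent: *
Disallow: /tmp/
"""

# One Disallow line per tmp directory, as in the answer's example.
EACH_DIRECTORY = """\
User-agent: *
Disallow: /anything/tmp/
Disallow: /stuff/tmp/
"""

PATHS = [
    "/tmp/test.html",
    "/anything/tmp/whatever/test.html",
    "/stuff/tmp/old/test.html",
    "/people/tmp/images.html",
]

if __name__ == "__main__":
    root_only = build_parser(ROOT_TMP_ONLY)
    each_dir = build_parser(EACH_DIRECTORY)
    for path in PATHS:
        # can_fetch() returns True when the path is NOT blocked.
        print(path, root_only.can_fetch("*", path), each_dir.can_fetch("*", path))
```

With the root-only file, only /tmp/test.html is blocked. With the per-directory file, /anything/tmp/ and /stuff/tmp/ are blocked, but /people/tmp/images.html (and /tmp/test.html) remain crawlable because they have no Disallow line of their own.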


It actually depends on the REP (Robots Exclusion Protocol) parser. More advanced parsers do recognize wildcard syntax, but wildcards are not part of the original spec.
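To illustrate what "not part of the original spec" means in practice: `urllib.robotparser`, used here as a stand-in for a plain original-spec parser, does not expand `*` (as far as I know, for current CPython versions). A `Disallow: /*/tmp/` rule therefore does not block the nested tmp folders. A minimal check:

```python
from urllib.robotparser import RobotFileParser

WILDCARD_RULE = """\
User-agent: *
Disallow: /*/tmp/
"""

parser = RobotFileParser()
parser.parse(WILDCARD_RULE.splitlines())

# This parser treats '*' as a literal character, so the rule only matches
# paths that literally start with "/*/tmp/"; these real paths stay crawlable.
for path in ["/anything/tmp/whatever/test.html", "/people/tmp/images.html"]:
    print(path, parser.can_fetch("*", path))
```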

That said, Google does honor wildcards. Google's documentation for its parser gives this example:

/fish*.php
Matches:
    /fish.php
    /fishheads/catfish.php?parameters
Does not match:
    /Fish.PHP
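For parsers that do support wildcards, the matching in this example can be sketched as a regular-expression translation: `*` matches any run of characters (including `/`), the rule is anchored at the start of the path, and matching is case-sensitive. This is a simplified illustration, not Google's actual parser (it ignores other extensions such as `$` and any Allow/Disallow precedence rules). `rule_to_regex` and `rule_matches` are hypothetical helper names.

```python
import re


def rule_to_regex(rule):
    # Escape every literal piece, then join the pieces with '.*' so that
    # each '*' in the rule matches any sequence of characters.
    return re.compile(".*".join(re.escape(part) for part in rule.split("*")))


def rule_matches(rule, path):
    # re.match anchors only at the start of the path, so a rule is a prefix
    # match (anything may follow it), and matching is case-sensitive.
    return rule_to_regex(rule).match(path) is not None


if __name__ == "__main__":
    # Google's documented example.
    print(rule_matches("/fish*.php", "/fish.php"))                          # True
    print(rule_matches("/fish*.php", "/fishheads/catfish.php?parameters"))  # True
    print(rule_matches("/fish*.php", "/Fish.PHP"))                          # False
    # The asker's proposed rule.
    print(rule_matches("/*/tmp/", "/anything/tmp/whatever/test.html"))     # True
    print(rule_matches("/*/tmp/", "/tmp/test.html"))                        # False
```

Under these semantics, `Disallow: /*/tmp/` covers the nested tmp folders but not a root-level /tmp/, because the pattern requires a `/` both before and after the wildcard segment. A root-level folder would still need its own `Disallow: /tmp/` line.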