robots.txt and wildcard at the end of disallow [closed]
Closed 10 years ago.
I need to disallow indexing of 2 pages, one of them dynamic:
site.com/news.php
site.com/news.php?id=__
site.com/news-all.php

What should I write in robots.txt:
User-agent: *
Disallow: /news
or
Disallow: /news*
or
Disallow: /news.php*
Disallow: /news-all.php
Should one use a wildcard at the end or not?
User-agent: *
Disallow: /news.php?id=*
Disallow: /news-all.php
More info here
EDIT:
The first rule will disallow news.php with parameters but allow news.php without ?id=__. If you do not want news.php to be crawled at all, you have to use /news.php*
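To see why the first rule behaves that way, here is a minimal sketch of Google-style wildcard matching (rules are anchored at the start of the path, * matches any run of characters, and a trailing $ anchors the end). The function name and the semantics mapping are my own illustration, not part of any library:

```python
import re

def google_style_match(pattern: str, path: str) -> bool:
    """Match a Disallow pattern against a URL path using Google-style
    wildcard semantics: anchored at the start, '*' matches anything,
    a trailing '$' anchors the end."""
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape everything except '*', which becomes '.*'
    regex = ".*".join(re.escape(part) for part in pattern.split("*"))
    regex = "^" + regex + ("$" if anchored else "")
    return re.match(regex, path) is not None

# The rule with ?id=* blocks only the parameterized URL:
print(google_style_match("/news.php?id=*", "/news.php?id=42"))  # True (blocked)
print(google_style_match("/news.php?id=*", "/news.php"))        # False (allowed)
# A plain prefix rule blocks both variants:
print(google_style_match("/news.php", "/news.php?id=42"))       # True (blocked)
```

Note that the plain prefix `/news.php` already covers the parameterized URLs, since matching never requires the rule to cover the whole path.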
The Allow and Disallow lines in robots.txt say, "allow (or disallow) anything that starts with".
So:
Disallow: /news.php
is the same as
Disallow: /news.php*
Provided, of course, that the bot reading robots.txt understands wildcards. If the bot doesn't understand wildcards, then it will treat the asterisk as a part of the actual file name.
An asterisk at the end of the line is superfluous, and potentially hazardous.
For sure
Disallow: /news.php
Disallow: /news-all.php
is correct. No stars are needed when you have the full filename. It would still be interesting to know whether the
Disallow: /news*
approach can work.