robots.txt and wildcard at the end of disallow [closed]

I need to disallow indexing of two pages, one of them dynamic:

site.com/news.php

site.com/news.php?id=__

site.com/news-all.php

What should I write in robots.txt:

User-agent: *  
Disallow: /news 

or

Disallow: /news* 

or

Disallow: /news.php*  
Disallow: /news-all.php

Should one use a wildcard at the end or not?


User-agent: *
Disallow: /news.php?id=*
Disallow: /news-all.php

More info here

EDIT:

The first rule will block news.php with parameters but still allow news.php without ?id=__. If you do not want news.php crawled at all, you have to use /news.php* instead.
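To illustrate the difference, here is a minimal sketch of the matching that wildcard-aware crawlers such as Googlebot apply; wildcard_match is a hypothetical helper, and the '$' end anchor from Google's spec is omitted for brevity:

import re

def wildcard_match(pattern, path):
    # '*' matches any run of characters; everything else is literal.
    # Matching is anchored at the start, so a pattern is a prefix match.
    regex = "".join(".*" if ch == "*" else re.escape(ch) for ch in pattern)
    return re.match(regex, path) is not None

print(wildcard_match("/news.php?id=*", "/news.php?id=7"))  # True  -> blocked
print(wildcard_match("/news.php?id=*", "/news.php"))       # False -> still crawled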


The Allow and Disallow lines in robots.txt say, "allow (or disallow) anything that starts with".

So:

Disallow: /news.php

is the same as

Disallow: /news.php*

Provided, of course, that the bot reading robots.txt understands wildcards. If the bot doesn't understand wildcards, then it will treat the asterisk as a part of the actual file name.

An asterisk at the end of the line is superfluous, and potentially hazardous.
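This is easy to check with Python's standard-library robots.txt parser, which follows the original specification and does not understand wildcards (site.com is the question's placeholder host):

from urllib.robotparser import RobotFileParser

def allowed(rules, url):
    # Parse an in-memory robots.txt and ask whether a generic bot may fetch url.
    rp = RobotFileParser()
    rp.parse(rules.splitlines())
    return rp.can_fetch("*", url)

plain   = "User-agent: *\nDisallow: /news.php\n"
starred = "User-agent: *\nDisallow: /news.php*\n"

print(allowed(plain,   "http://site.com/news.php"))        # False: blocked
print(allowed(plain,   "http://site.com/news.php?id=42"))  # False: prefix match blocks this too
print(allowed(starred, "http://site.com/news.php"))        # True: the literal '*' never matches

For this parser the asterisk becomes part of the file name it looks for, so the starred rule silently stops matching the real page — exactly the hazard described above.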


For sure

Disallow: /news.php
Disallow: /news-all.php

is correct. No stars are needed if you have the full filename. I am still curious, though, whether the

Disallow: /news*

approach can work.
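It can, at least in the prefix sense: since every Disallow rule is a prefix match, Disallow: /news needs no star at all — though it also catches any other path that begins with /news. A quick check with the same standard-library parser (newsletter.html is a hypothetical page):

from urllib.robotparser import RobotFileParser

rp = RobotFileParser()
rp.parse("User-agent: *\nDisallow: /news\n".splitlines())

for url in ("http://site.com/news.php",
            "http://site.com/news.php?id=42",
            "http://site.com/news-all.php",
            "http://site.com/newsletter.html"):   # hypothetical page, blocked as collateral
    print(url, rp.can_fetch("*", url))            # prints False for all four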
