
Block RewriteRule in robots.txt

Here is an example RewriteRule from my .htaccess file:

RewriteRule ^ABC$ index.php?partner_id=123&utm_source=partner&utm_medium=link&utm_campaign=ABC [L]

So http://mywebsite.com/123 would point to index.php?partner_id=123&utm_source=partner&utm_medium=link&utm_campaign=ABC

index.php is a very important page and must be properly indexed by search engines, but I would like to keep http://mywebsite.com/123 from being indexed without affecting the indexing of http://mywebsite.com/ or http://mywebsite.com/index.php.

Any help would be great.


If you want to block http://mywebsite.com/123, but allow http://mywebsite.com/123index.php, then you need an Allow and a Disallow:

User-agent: *
Allow: /123index.php
Disallow: /123

This will disallow anything that starts with /123, but specifically allow /123index.php.

Standard robots.txt syntax doesn't let you disallow a single, exact URL. Rather, a Disallow rule blocks every URL whose path starts with the prefix that you specify.
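One way to sanity-check prefix rules like these is to feed them to a parser that does plain prefix matching. The sketch below uses Python's standard-library urllib.robotparser with the rules and URLs from this answer; the helper name allowed is just for illustration. This particular parser applies the first rule that matches, so listing Allow before Disallow matters here, and other crawlers may resolve conflicts differently.

```python
from urllib.robotparser import RobotFileParser

# The Allow/Disallow pair from the answer above.
rules = """\
User-agent: *
Allow: /123index.php
Disallow: /123
"""

parser = RobotFileParser()
parser.parse(rules.splitlines())


def allowed(path):
    """Return True if a generic crawler ("*") may fetch the given path."""
    return parser.can_fetch("*", "http://mywebsite.com" + path)


# Print the verdict for a few representative paths.
for path in ["/", "/index.php", "/123", "/123abc", "/123/file.html", "/123index.php"]:
    print(path, "allowed" if allowed(path) else "blocked")
```

Only the paths that begin with /123 come back blocked, with /123index.php as the one exception; / and /index.php are untouched.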

Google and Bing (and some others) support extensions to the standard syntax. Using Google's $ support, which anchors a pattern to the end of the URL, you could write:

Disallow: /123$

And that would block just that one URL. Other crawlers might or might not support that syntax.
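To make the difference between a plain prefix rule and a $-terminated rule concrete, here is a minimal Python sketch of the matching logic. The function rule_matches is hypothetical and simplified: it is not Google's implementation, and it only models the * wildcard and the trailing $ anchor.

```python
import re


def rule_matches(pattern, path):
    """Simplified model of robots.txt pattern matching.

    '*' matches any run of characters, a trailing '$' anchors the
    pattern to the end of the path, and everything else is a literal
    prefix match.
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape the literal pieces and join them with '.*' for each '*'.
    regex = ".*".join(re.escape(part) for part in pattern.split("*"))
    if anchored:
        regex += r"\Z"
    # re.match always anchors at the start of the path.
    return re.match(regex, path) is not None


for pattern in ["/123", "/123$"]:
    for path in ["/123", "/123abc", "/123/file.html", "/index.php"]:
        verdict = "matches" if rule_matches(pattern, path) else "does not match"
        print(pattern, verdict, path)
```

Under this model, /123 matches /123, /123abc and /123/file.html, while /123$ matches only /123. A crawler without $ support would presumably treat the $ as a literal character instead.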

Note in response to comment:

If I understand your comment correctly, you want to allow http://mywebsite.com/index.php, but block http://mywebsite.com/123. If you know there are no other resources that start with /123, then you can write:

Disallow: /123

That will block anything that starts with /123, for example /123/file.html and /123abc. If there are other resources that start with /123 and you want to allow them, then you'll need:

Disallow: /123$

But understand that Google and maybe Bing will respect the $ syntax. Many other crawlers won't.
