Excluding testing subdomain from being crawled by search engines (w/ SVN Repository)

I have:

  • domain.com
  • testing.domain.com

I want domain.com to be crawled and indexed by search engines, but not testing.domain.com.

The testing domain and main domain share the same SVN repository, so I'm not sure if separate robots.txt files would work...


1) Create a separate robots.txt file (name it robots_testing.txt, for example).

2) Add this rule to the .htaccess file in your website's root folder:

RewriteCond %{HTTP_HOST} =testing.example.com
RewriteRule ^robots\.txt$ /robots_testing.txt [L]

This will rewrite (internally redirect) any request for robots.txt to robots_testing.txt if the domain name is testing.example.com.

Alternatively, do the opposite: rewrite all requests for robots.txt to robots_disabled.txt for all domains except example.com:

RewriteCond %{HTTP_HOST} !=example.com
RewriteRule ^robots\.txt$ /robots_disabled.txt [L]
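
Putting the first approach together, a complete .htaccess might look like the sketch below (this assumes mod_rewrite is enabled on your server, and testing.example.com stands in for your actual testing subdomain):

RewriteEngine On

# Serve robots_testing.txt in place of robots.txt on the testing subdomain only
RewriteCond %{HTTP_HOST} =testing.example.com
RewriteRule ^robots\.txt$ /robots_testing.txt [L]

Because the rewrite is driven by the Host header at request time, both domains can share the same SVN working copy: the same files are deployed everywhere, and each host serves the robots.txt appropriate to it.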


testing.domain.com should have its own robots.txt file, as follows:

User-agent: *
Disallow: /

User-agent: Googlebot
Noindex: /

located at http://testing.domain.com/robots.txt.
This will disallow all bot user-agents, and since Google also looks at the Noindex directive, we'll throw that in for good measure.
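
As a quick sanity check, the effect of the `User-agent: *` / `Disallow: /` rules above can be verified with Python's built-in robots.txt parser (the URLs below just reuse the example subdomain from the question):

```python
from urllib import robotparser

# Parse the proposed robots.txt rules for the testing subdomain
rp = robotparser.RobotFileParser()
rp.parse([
    "User-agent: *",
    "Disallow: /",
])

# No crawler that respects robots.txt may fetch any page
print(rp.can_fetch("Googlebot", "http://testing.domain.com/"))      # False
print(rp.can_fetch("SomeOtherBot", "http://testing.domain.com/x"))  # False
```

Note that `robotparser` only understands the standard directives, so the Google-specific `Noindex` line is simply ignored by it, as it would be by most crawlers.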

You could also add your subdomain to Webmaster Tools, block it via robots.txt, and submit a site removal request (though this will work for Google only). For some more info, have a look at http://googlewebmastercentral.blogspot.com/2010/03/url-removal-explained-part-i-urls.html
