Excluding testing subdomain from being crawled by search engines (w/ SVN Repository)
I have:
- domain.com
- testing.domain.com
I want domain.com to be crawled and indexed by search engines, but not testing.domain.com
The testing domain and main domain share the same SVN repository, so I'm not sure if separate robots.txt files would work...
1) Create a separate robots.txt file (name it robots_testing.txt, for example).
2) Add this rule to the .htaccess in your website's root folder:
RewriteCond %{HTTP_HOST} =testing.example.com
RewriteRule ^robots\.txt$ /robots_testing.txt [L]
This will rewrite (internally redirect) any request for robots.txt to robots_testing.txt if the domain name is testing.example.com.
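Putting the two steps together, the relevant .htaccess fragment might look like this (a sketch, assuming mod_rewrite is available and the file sits in the web root):

```apache
# Enable the rewrite engine (harmless if it is already on)
RewriteEngine On

# On the testing host only, serve robots_testing.txt in place of robots.txt
RewriteCond %{HTTP_HOST} =testing.example.com
RewriteRule ^robots\.txt$ /robots_testing.txt [L]
```

Because the rewrite is internal, crawlers still request http://testing.example.com/robots.txt and never see the robots_testing.txt name.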
Alternatively, do the opposite -- rewrite all requests for robots.txt to robots_disabled.txt for every domain except example.com:
RewriteCond %{HTTP_HOST} !=example.com
RewriteRule ^robots\.txt$ /robots_disabled.txt [L]
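The host-based selection these rules perform can be expressed in plain Python for clarity (a hypothetical helper for illustration, not part of Apache):

```python
def pick_robots_file(host: str) -> str:
    """Return the robots file served for a given Host header.

    Mirrors: RewriteCond %{HTTP_HOST} !=example.com
             RewriteRule ^robots\\.txt$ /robots_disabled.txt [L]
    """
    if host != "example.com":
        # Any other host (testing, staging, ...) gets the blocking file
        return "/robots_disabled.txt"
    return "/robots.txt"

print(pick_robots_file("example.com"))          # -> /robots.txt
print(pick_robots_file("testing.example.com"))  # -> /robots_disabled.txt
```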
testing.domain.com should have its own robots.txt file, as follows:
User-agent: *
Disallow: /
User-agent: Googlebot
Noindex: /
located at http://testing.domain.com/robots.txt
This will disallow all bot user-agents. Noindex is an unofficial directive that Google at one time honored (it was never part of the robots.txt standard), so we'll just throw it in for good measure.
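You can sanity-check such a robots.txt locally with Python's standard urllib.robotparser, without touching the live server (a quick verification sketch; the URLs are just the ones from the question):

```python
from urllib.robotparser import RobotFileParser

# Feed the testing subdomain's robots.txt rules straight to the parser
rp = RobotFileParser()
rp.parse([
    "User-agent: *",
    "Disallow: /",
])

# Every URL on the testing host should now be off-limits to crawlers
print(rp.can_fetch("Googlebot", "http://testing.domain.com/"))      # False
print(rp.can_fetch("*", "http://testing.domain.com/some/page"))     # False
```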
You could also add your subdomain to Webmaster Tools -- block it by robots.txt and submit a site removal request (though this will work for Google only). For more info, have a look at http://googlewebmastercentral.blogspot.com/2010/03/url-removal-explained-part-i-urls.html