Concise way to Disallow Spidering of all Directories with exceptions [closed]
Is there any way to write a robots.txt file that forbids indexing of all content except for specified directories?
Currently, Disallow is the only valid directive, which means I have to explicitly list the directories I want to keep crawlers out of. However, I'd rather not announce those directories to the world...
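For example, the best I can do with the standard directive looks something like this (the directory names are just placeholders), which is precisely the list I don't want to publish:

    User-agent: *
    Disallow: /admin/
    Disallow: /internal-reports/
    Disallow: /staging/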
Has anyone tackled this problem?
There isn't really a good solution. You can, as you said, disallow everything you want hidden, but that announces those paths to the world.
If you're not tied to the current URL structure, you could consider creating an "allowed" directory and symlinking the content you want indexed into it. Then you only have to disallow your top-level directories.
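As a rough sketch (the names /allowed/, /app/ and /lib/ are placeholders, not anything your site necessarily has): once the public content is symlinked under a single directory, the robots.txt only needs to name a few generic top-level directories instead of every sensitive path:

    # Everything public is reachable under /allowed/, which is simply left unlisted
    User-agent: *
    Disallow: /app/
    Disallow: /lib/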
Alternatively, you could build some kind of server-side filter on bot user agents. Allow the major ones in your robots.txt, then filter their access server-side with an appropriate response code while blocking all others. This is probably a worse solution than my other option, but it retains your canonical URLs.
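If you go that route, a rough Apache sketch could look like this (assuming mod_rewrite is available; the user-agent list and directory names are placeholders, not a vetted list of crawlers):

    # .htaccess sketch: known crawlers get a 404 for the private directories,
    # so those paths never need to appear in robots.txt or an index.
    RewriteEngine On
    RewriteCond %{HTTP_USER_AGENT} (Googlebot|Bingbot|Slurp) [NC]
    RewriteRule ^(admin|internal-reports)/ - [R=404,L]

Blocking every other bot outright is harder, since there's no reliable way to tell an unknown bot from a regular browser by user agent alone.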