
Concise way to Disallow Spidering of all Directories with exceptions [closed]

Closed. This question is off-topic. It is not currently accepting answers.

Closed 10 years ago.

Is there any way to write a robots.txt file that forbids indexing of all content except for specified directories?

Currently Disallow is the only widely supported directive, which means I need to explicitly list the directories I want to keep crawlers out of - however I'd rather not announce those directories to the world...
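For example (the directory names are made up), the only standard option I know of is to list each private directory explicitly:

    User-agent: *
    Disallow: /admin/
    Disallow: /internal-reports/
    Disallow: /staging/

which is exactly the announcement I want to avoid.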

Has anyone tackled this problem?


There isn't really a good solution. You can, as you said, explicitly disallow everything you want hidden, but that announces those paths to the world. A couple of alternatives:

  • If you're not tied to the current url structure, you could consider creating an "allowed" directory and symlinking your desired content into it. Then you only have to disallow your top-level directories (see the first sketch after this list).

  • Alternatively, you could build some kind of server-side filter for bot user agents (see the second sketch after this list). Allow the major crawlers in via robots.txt, then filter their access server-side, returning an appropriate response code for anything you don't want indexed, while blocking all other bots. This is probably a worse solution than my other option, but it retains your canonical urls.
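A rough sketch of the first option (directory names are hypothetical): once the public content is symlinked under a single directory, robots.txt only has to name the handful of top-level directories, none of which reveals anything specific:

    User-agent: *
    Disallow: /app/
    Disallow: /lib/
    Disallow: /includes/
    # /allowed/ is not listed, so well-behaved crawlers may index it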
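And a minimal sketch of the second option as a WSGI wrapper (the bot signatures, allowed prefixes, and the trivial site() app are placeholders, not a drop-in implementation): known crawler user agents get a 404 for anything outside the paths you want indexed, so nothing has to be spelled out in robots.txt:

    # Sketch: refuse known crawlers outside an allow-list of path prefixes.
    # BOT_SIGNATURES and ALLOWED_PREFIXES are illustrative placeholders.
    from wsgiref.simple_server import make_server

    BOT_SIGNATURES = ("googlebot", "bingbot", "slurp")
    ALLOWED_PREFIXES = ("/allowed/",)

    def bot_filter(app):
        """Wrap a WSGI app; crawlers get a 404 outside the allowed prefixes."""
        def wrapper(environ, start_response):
            ua = environ.get("HTTP_USER_AGENT", "").lower()
            path = environ.get("PATH_INFO", "/")
            is_bot = any(sig in ua for sig in BOT_SIGNATURES)
            if is_bot and path != "/robots.txt" and not path.startswith(ALLOWED_PREFIXES):
                start_response("404 Not Found", [("Content-Type", "text/plain")])
                return [b"Not Found"]
            return app(environ, start_response)
        return wrapper

    def site(environ, start_response):
        # Stand-in for the real application being protected.
        start_response("200 OK", [("Content-Type", "text/plain")])
        return [b"Hello"]

    if __name__ == "__main__":
        make_server("", 8000, bot_filter(site)).serve_forever()

The same user-agent check could of course live in the web server configuration instead of the application.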
