
Blocking a site from being indexed

I am wondering: is there any (programming) way to prevent any search engine from indexing the content of a website?


You can specify it in robots.txt

User-agent: *
Disallow: /


As the other answers already say, Robots.txt is the standard that every proper search engine adheres to. This should be enough in most cases.

If you really want to try to programmatically block malicious bots that do not respect robots.txt, check out this question I asked a few months ago on how to tell bots apart from human visitors. You may find some good starting points there.
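To make that concrete, here is a minimal sketch (not taken from the linked question, just an illustration of one common approach): list a trap URL in robots.txt (Disallow: /trap.php) and link to it invisibly from your pages. Humans and well-behaved crawlers never request it, so anything that does can be treated as a bot ignoring the rules. The file names trap.php and blocked_ips.txt are placeholders.

<?php
// trap.php: requested only by clients that ignore robots.txt.
// Record the offending IP and refuse the request.
$blocklist = __DIR__ . '/blocked_ips.txt';   // placeholder path
file_put_contents($blocklist, $_SERVER['REMOTE_ADDR'] . PHP_EOL, FILE_APPEND | LOCK_EX);
http_response_code(403);
exit('Forbidden');

Regular pages can then read blocked_ips.txt and refuse requests from any IP it contains:

<?php
// Near the top of every normal page: reject requests from recorded IPs.
$blocked = @file(__DIR__ . '/blocked_ips.txt', FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES);
if ($blocked && in_array($_SERVER['REMOTE_ADDR'], $blocked, true)) {
    http_response_code(403);
    exit('Forbidden');
}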


Create a robots.txt file for your site. For more info, see this link.


Most search engine bots identify themselves using a unique user agent.

You can block specific user agents using robots.txt

Here is a list of some user agents.
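For example, a robots.txt that turns away only a few named crawlers while allowing everyone else could look like this (Googlebot and bingbot are the user-agent tokens those engines publish; adjust the list to the bots you care about):

User-agent: Googlebot
Disallow: /

User-agent: bingbot
Disallow: /

User-agent: *
Disallow: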


Since you did not mention a programming language, I'll give my input from a PHP perspective: there is a WordPress plugin called Bad Behavior that does exactly what you are looking for. It is configured via a code script listing an array of user-agent strings; when an agent crawls your site, the plugin checks its user-agent string (or IP address) against that array and rejects or accepts the request accordingly.
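For illustration only (this is not the plugin's actual code, just a simplified sketch of the same idea): compare the request's User-Agent header against an array of known crawler strings and reject on a match. The strings in the array below are examples; extend the list to suit your needs.

<?php
// Simplified sketch: reject requests whose User-Agent contains a blocked string.
$blocked_agents = array('Googlebot', 'bingbot', 'Baiduspider', 'YandexBot');
$ua = isset($_SERVER['HTTP_USER_AGENT']) ? $_SERVER['HTTP_USER_AGENT'] : '';
foreach ($blocked_agents as $agent) {
    if (stripos($ua, $agent) !== false) {   // case-insensitive substring match
        http_response_code(403);
        exit;
    }
}
// Otherwise, continue serving the page as normal.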

It might be worth your while to have a peek at the plugin's code to see how it is done from a programmer's perspective.

If you are using a language other than PHP and this does not satisfy what you are looking for, then I apologize for posting this answer.

Hope this helps, Best regards, Tom.
