Do google bots make invalid requests?
I'm building a component that bans spam bots' IPs based on the invalid requests they make all the time, requests that no user could ever make by mistake.
For example, they constantly try to submit empty forms or make GET requests to URLs that should only receive POST requests.
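The two patterns described above could be detected with something like the following minimal sketch. It assumes the application knows which paths are POST-only; `POST_ONLY_PATHS` and `suspicious_reason` are hypothetical names, and the function only reports a reason string (for logging or scoring) rather than banning anything itself.

```python
from typing import Mapping, Optional

# Hypothetical set of URLs that should only ever receive form submissions via POST.
POST_ONLY_PATHS = {"/contact", "/comment", "/signup"}


def suspicious_reason(method: str, path: str, form: Mapping[str, str]) -> Optional[str]:
    """Return a short reason if the request matches a known bot pattern, else None."""
    method = method.upper()
    # Pattern 1: a GET (or anything else) aimed at a URL that only accepts POST.
    if path in POST_ONLY_PATHS and method != "POST":
        return "non-POST request to a POST-only URL"
    # Pattern 2: a POST whose fields are all missing or blank.
    if method == "POST" and not any(value.strip() for value in form.values()):
        return "empty form submission"
    return None
```

Keeping detection separate from the ban decision makes it easy to exempt or double-check particular clients before acting on the reason.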
What I want to know is whether I risk banning Google's bots by doing so.
Are they smart enough not to crawl every URL they encounter? Do they avoid form URLs?
Googlebot follows links. It will only request pages for which it finds a link. Of course, that link doesn't have to reside on your site and so may not be in your direct control.
Googlebot will only make GET requests because, according to the HTTP RFC, GET is a "safe" method: GET requests should not have side effects, so they are not supposed to change state on the server. Hint: never use a link (i.e. a GET) to perform or confirm a change to your site, or any web spider might trigger it.
Every CGI script on your site that changes its state should verify that the incoming request is indeed a POST, just to be safe.
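A minimal sketch of that check, written as a Python WSGI application and assuming the state change itself is a placeholder (the "Saved." response stands in for whatever the script really does). It rejects any method other than POST with `405 Method Not Allowed` and an `Allow: POST` header before touching server state.

```python
from typing import Callable, Iterable


def state_changing_app(environ: dict, start_response: Callable) -> Iterable[bytes]:
    # CGI and WSGI both expose the HTTP method as REQUEST_METHOD.
    if environ.get("REQUEST_METHOD", "GET").upper() != "POST":
        # Refuse before doing any work, so a crawler following a link changes nothing.
        start_response("405 Method Not Allowed",
                       [("Allow", "POST"), ("Content-Type", "text/plain")])
        return [b"This URL only accepts POST requests.\n"]
    # Hypothetical state change (e.g. saving a comment) would go here.
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"Saved.\n"]
```

To run it as a classic CGI script, the standard library's `wsgiref.handlers.CGIHandler().run(state_changing_app)` can serve the WSGI app under a CGI-capable web server.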
Googlebot does make invalid requests. I have seen requests carrying a “From:” header whose mailbox address contains no “@” sign. Other bots sometimes do this too. So watch for invalid data in optional headers.
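Because Googlebot itself can send such a header, a check like this is better treated as a signal to log than as grounds for a ban. A minimal sketch, assuming headers arrive as a plain name-to-value mapping; `from_header_looks_invalid` is a hypothetical helper, and it checks only for the missing "@" described above, not full mailbox syntax.

```python
from typing import Mapping


def from_header_looks_invalid(headers: Mapping[str, str]) -> bool:
    """True if an optional From: header is present but contains no '@'."""
    for name, value in headers.items():
        # HTTP header names are case-insensitive.
        if name.lower() == "from":
            return "@" not in value
    # The header is optional, so its absence is not a signal.
    return False
```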