How exactly does "timestamp analysis" deter spam?
This article on how much CAPTCHA sucks mentions that Animoto used timestamp analysis to cut down on spam.
It includes a link to a jQuery tutorial on timestamp analysis. Basically, you use AJAX to have PHP set a cookie, use JS to add a hidden input to the form, and then (on submission) you compare the hidden input value with the cookie value. From the tutorial:
Checking the Form
test.php is the example PHP code used to verify the token
- Is the token [hidden input value] present?
- Does it match the timestamp when run through the md5() function?
- Has too much time elapsed?
...But it seemed really convoluted to me, for the following reasons:
- Is the token present? The token is only added by JavaScript, so all you're really doing is detecting whether or not JS is enabled. Surely there are easier ways to do this.
- Does it match the timestamp when run through the md5() function? The md5 might make us feel better, but isn't this just making sure that cookies are enabled? Surely there are easier ways to do this.
- Has too much time elapsed? Do spambots really take a long time to submit forms? Surely this is unnecessary. (Wouldn't you actually want to see if the form was submitted too soon?)
My hope is that I actually have no idea how or why bots interact with HTML forms, and that I can now be corrected and educated.
- Is the token present? Yes, you are pretty much seeing if JavaScript is enabled in the client. But the point behind it is that many web automation frameworks do not support JavaScript (or only support some limited subset of it), and the ones that do have proper JavaScript support tend to be fairly heavyweight and thus not suitable for use as a spam-bot. So basically you're filtering out simple spam-bots that rely on posting a form to a URL without actually evaluating anything on the page that contains the form.
The next two points seem to be guarding more against a spam-bot caching and reusing a form submit than against a given form submit taking too long after its enclosing page is loaded from the server. As you say, one would expect a spam-bot to be faster than an actual user at submitting a form, provided that the spam-bot follows the flow of requesting the form from your server and then submitting a response back. But not all spam-bots will follow that flow. Some might cache the page that your server sends (or the response that was generated for that page) for reuse over and over again. If they did that, then the timestamps/cookies give you a way to detect it.
But I really think the timestamps are unnecessary. I'd stick just with the token + JavaScript, using an approach roughly like:
1. Each time the page/form is requested, the server generates a new, random token for that request.
2. The token is associated with the user's current HTTP session.
3. The token (or some lightly encrypted version of it) is sent to the page as well.
4. JavaScript adds the token value as a hidden input to the form (decrypting it first, if necessary).
5. On submission, the server checks to see if a) a token exists in the user's HTTP session, b) a token was submitted with the form, and c) both tokens match.
6. Assuming the submission was valid, the server clears the token from the user's HTTP session so that it cannot be reused.
So all the explicit timestamp nonsense goes away, because that is built into the HTTP session. Very old sessions will expire, taking their tokens with them. You still filter out any spam-bots that aren't sophisticated enough to support JavaScript or cookies, and you defeat the use of cached URLs/form submits because step 6 ensures that no token can ever be used more than once. Basically, the spam-bot is forced to go through the entire cycle of requesting the page from your server, executing the JavaScript, and submitting the form for each submission it wants to make.