Creating a PHP web app to allow users to vote on submissions - How can I minimize abuse?
I've only written a few small PHP web apps, and I'm throwing code together right now to allow users to submit short stories. These stories will be displayed, and others can vote them up. The winner receives something rather valuable, and I'm paranoid that people are going to try to manipulate the vote.
Debian / Apache / PHP 5.2 / jquery
users are not required to login / authenticate.
users can vote multiple stories up but only once for each story
Is it as simple as tagging each story with an IP address and not counting other submissions from that IP? Thanks for any advice.
No, it's not that simple. It's rather easy to vote from a different IP address (through a proxy, for example), and people can just vote from several places. Moreover, it's very likely that different people or computers share the same IP address. It's even quite possible that different people will try to vote using the same computer (and browser).
Even a login system is not watertight: you simply can't verify people's identities. Users can enter different email addresses and fake data such as names and birth dates.
On the other hand, if you want to uniquely identify a given browser on a given computer (as opposed to a unique user), have a look at Panopticlick and read Browser Fingerprints – How Unique Is Your Browser – Panopticlick.
Apart from this, you can still (try to) use a cookie to identify a browser with which a vote has been cast.
You could put together a user profile database. So you draw certain user information, like IP, browser, location, OS version, etc. etc. And then create either a standard matching or weighted matching system that weeds out people if they are "too suspicious." Yeah, you'll still get false positives here and there, but if you're as paranoid as you say you are about legitimacy, you're probably best going a little overboard on validation. For even better results, always return a "successful" message so that they think they're gaming the system and don't try to find ways to beat it.
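A weighted matching check along these lines might look like the following minimal sketch. The function name `suspicion_score`, the attribute names, the weights, and the 0.75 threshold are all illustrative assumptions, not values taken from a real system.

```php
<?php
// Hypothetical helper: add up the weights of every profile attribute that
// is present in both votes and has the same value in each.
function suspicion_score($newVote, $earlierVote, $weights)
{
    $score = 0.0;
    foreach ($weights as $attribute => $weight) {
        if (isset($newVote[$attribute], $earlierVote[$attribute])
            && $newVote[$attribute] === $earlierVote[$attribute]) {
            $score += $weight;
        }
    }
    return $score;
}

// Illustrative weights only: how strongly a match suggests the same person.
$weights = array(
    'ip'         => 0.5,
    'user_agent' => 0.25,
    'language'   => 0.125,
    'timezone'   => 0.125,
);

$earlier = array('ip' => '203.0.113.7', 'user_agent' => 'Mozilla/5.0 (X11)',
                 'language' => 'en-US', 'timezone' => 'UTC-5');
$new     = array('ip' => '203.0.113.7', 'user_agent' => 'Mozilla/5.0 (X11)',
                 'language' => 'en-US', 'timezone' => 'UTC+1');

$score = suspicion_score($new, $earlier, $weights);
$isSuspicious = $score >= 0.75; // assumed threshold; tune it against real data
```

A vote flagged this way could be quietly left out of the tally while the page still reports success, as suggested above.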
If you have decided not to implement a login/password interface, you should use a dual validation system. Nothing is perfect, but with common sense I believe you can set up something good.
Start with cookies. Just save a cookie with data like the story_id, the datetime, and a defined expiration time. That way, if that machine has already voted, it won't be able to vote again for a period of time, let's say 24 hours.
Have a DB table with a list of all IPs that voted for the different stories. When someone votes, just check whether that IP is in the list for that particular story. That table should also have an additional column recording how many tries were made from that IP. If the IP is in the list and the cookie IS NOT SET, then someone has already voted using that IP. The possible scenarios for this are:
- Maybe they are in a house and they are all voting.
- Maybe they are in an office, and they are all voting from different machines.
- Maybe they are in a public network, like a café or an airport.
In any of those normal scenarios, it is very rare for an IP to try to vote more than 20 times for the same story. You should set a threshold for how many times the same IP can vote. If there is no cookie and the number of votes from an IP is above your threshold, then you know something is wrong, and you can give that IP a longer waiting time or maybe even block it completely from the voting process.
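The cookie-plus-IP decision described above could be sketched roughly as follows. The function name `vote_decision`, its return values, and the cookie name in the comments are assumptions made for illustration; the threshold of 20 is the example figure from the text.

```php
<?php
// Decide what to do with a vote, given whether the per-story cookie is
// present and how many attempts this IP has already made on this story.
function vote_decision($hasCookie, $attemptsFromIp, $maxPerIp)
{
    if ($hasCookie) {
        return 'reject';   // this browser voted within the cookie lifetime
    }
    if ($attemptsFromIp >= $maxPerIp) {
        return 'block';    // no cookie, but too many tries from one IP
    }
    return 'accept';       // new browser, IP still under the threshold
}

// In a real request the inputs might come from something like:
//   $hasCookie = isset($_COOKIE['voted_' . $storyId]);
//   $attempts  = the "tries" column for this IP and story in the DB table
// and after an accepted vote:
//   setcookie('voted_' . $storyId, '1', time() + 86400); // 24 hours
```

Keeping the decision in a small function with no side effects makes it easy to adjust the threshold later without touching the database or cookie code.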
Use a captcha. Don't make it too easy to vote, but not too hard either. Even with the validation methods mentioned above, you can also add a captcha to force the user to think or type something before clicking "submit". You can use something as annoying as re-Captcha (http://recaptcha.net/) or a friendlier-looking one like m-captcha (http://code.google.com/p/m-captcha/). This way, an attacker would have a painful time trying to get around your security measures.
Uniquely identifying voters
- Have a strong captcha (like recaptcha) that prevents automated attempts. This blocks spoofed IPs and bots.
- Store an identifier in a cookie. This helps catch people on the same computer and browser with a different IP.
- Store the IP on the server. This helps catch people with multiple computers or browsers on the same home network. Unfortunately, this may also block everyone but Bob from voting at the big GM plant down the street. Tell them to vote from home.
Don't worry about user agent if you're verifying by IP from above. It's redundant.
If you follow all three guidelines, you might as well update the votes in real time because you cannot disqualify any vote that passes the above criteria. You probably shouldn't even show the vote button if they can't vote, unless you're going to let people change their votes, which is also a good idea if there will be late entries.
If I were voting, I'd want to be able to vote for a story immediately after I read it, then if I found a story I liked better, I'd want to be able to change my vote.
Voting system
Now, regarding the voting, you might want to require votes on at least three different short stories from the same "person" before you count any votes. You'll have a better chance of having the most popular story end up as the winner that way.
For example, if I send the link to all my friends to vote for my story, they'll also have to pick some other stories, probably the better ones. There is a chance that the worst entry ends up winning, since people might vote for the worst two plus their own to get their vote counted. In practice, however, I've never seen this happen. This is based on 7 years of film festivals that all use this system.
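One way to apply the "at least three stories" rule at counting time is sketched below. The function name `tally_votes` and the array layout are assumptions for illustration; "voter" stands for whatever identifier (cookie ID, IP, or both) the site uses for one "person".

```php
<?php
// $votes: list of array('voter' => id, 'story' => id) pairs.
// Returns story id => vote count, using only voters who have voted on
// at least $minStories different stories.
function tally_votes($votes, $minStories)
{
    // Collect the distinct stories each voter picked.
    $storiesByVoter = array();
    foreach ($votes as $vote) {
        $storiesByVoter[$vote['voter']][$vote['story']] = true;
    }

    $totals = array();
    foreach ($storiesByVoter as $stories) {
        if (count($stories) < $minStories) {
            continue; // this voter has not picked enough stories yet
        }
        foreach (array_keys($stories) as $story) {
            $totals[$story] = isset($totals[$story]) ? $totals[$story] + 1 : 1;
        }
    }
    return $totals;
}

$votes = array(
    array('voter' => 'a', 'story' => 1),
    array('voter' => 'a', 'story' => 2),
    array('voter' => 'a', 'story' => 3),
    array('voter' => 'b', 'story' => 1), // only one story so far: not counted
);
$totals = tally_votes($votes, 3);
```

Because the filter runs when votes are tallied rather than when they are cast, a voter's earlier picks start counting as soon as they reach the minimum.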
Also, for an Internet contest, it's absolutely crucial that everyone gets equal time at the top of the page, so you should figure out some way to rotate, or at least randomize, the entries.
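Randomizing the order on each page view can be as small as the sketch below. The function name `randomized_entries` and the sample titles are made up for illustration.

```php
<?php
// Return the entries in a fresh random order.
// Arrays are passed by value, so the caller's list is left untouched.
function randomized_entries($entries)
{
    shuffle($entries); // shuffles the local copy in place and reindexes it
    return $entries;
}

$stories = array('Story A', 'Story B', 'Story C', 'Story D');
$display = randomized_entries($stories);
```

Shuffling on every request means the order also changes when a visitor reloads the page. If that is confusing, the shuffled order could be kept in the visitor's session instead.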
Always put something in the contest rules about the judges getting the final word.
No, IP address is not a good identifier. Someone browsing from e.g. a mobile phone may never have the same IP address (and even home users can change their IP address regularly).
On the other hand, multiple people browsing from the same house, office, or proxy server would be blocked by your method.
With PHP you can get the IP address and the user agent of the visitor's browser like so:
IP: $_SERVER['REMOTE_ADDR']
User Agent: $_SERVER['HTTP_USER_AGENT']
Sadly, it's not a perfect method, but it does minimize abuse.
You could strengthen this by using a login system and storing what the person has voted on using cookies, but even then it's still not perfect.
The best thing to do would be try to store the information in 2 places:
- Store the user's IP address along with their vote and the story in a database table, and check every time they try to vote whether they already have
- Put a cookie on the user's browser to try to keep track of what they've voted on. Assuming you don't want to store a ton of data in it, you could simply assign an ID to the cookie, store that ID in the aforementioned table, and check against it as well.
Combining the two methods together should allow you to know who's voted and for what, not letting somebody vote a second time. Of course, if they go to multiple machines, delete the cookie, and/or change their IP, there's nothing you can do to stop that.
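The two checks can be combined along the lines of this sketch. The names `new_voter_id` and `has_already_voted` and the row layout are assumptions for illustration, and the `$rows` array stands in for the database table.

```php
<?php
// Generate an opaque ID to store in the voter's cookie.
// md5/uniqid/mt_rand are used because they exist on old PHP versions;
// they are not cryptographically strong.
function new_voter_id()
{
    return md5(uniqid((string) mt_rand(), true));
}

// $rows stands in for the votes table: each row is
// array('story' => ..., 'ip' => ..., 'voter_id' => ...).
// $voterId is null when the browser sent no cookie.
function has_already_voted($rows, $storyId, $ip, $voterId)
{
    foreach ($rows as $row) {
        if ($row['story'] !== $storyId) {
            continue;
        }
        // Either signal is enough to treat this as a repeat vote.
        if ($row['ip'] === $ip
            || ($voterId !== null && $row['voter_id'] === $voterId)) {
            return true;
        }
    }
    return false;
}

$rows = array(
    array('story' => 7, 'ip' => '198.51.100.4', 'voter_id' => 'abc123'),
);
```

On PHP 7 or later, `bin2hex(random_bytes(16))` would be a stronger way to generate the ID. In a real application the loop would be replaced by a query against the votes table.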
Here's what I've done in the past:
- Check for cookie
- Check IP address within time window
- If OK, set cookie and update IP address and time (increase time window for that IP?)
- Always try to give the impression that the user's vote got counted
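The time-window part of that checklist might be sketched like this. The function names, the use of Unix timestamps, and the doubling rule are assumptions; the list above only hints that the window could grow.

```php
<?php
// True when this IP may vote again: either no earlier vote is on record
// ($lastVoteTime is null) or the window has fully elapsed.
// Times are Unix timestamps in seconds.
function ip_window_open($lastVoteTime, $now, $windowSeconds)
{
    if ($lastVoteTime === null) {
        return true;
    }
    return ($now - $lastVoteTime) >= $windowSeconds;
}

// One possible way to "increase the time window": double it each time,
// up to a cap. When to apply it is a design choice.
function next_window($currentWindow, $maxWindow)
{
    return min($currentWindow * 2, $maxWindow);
}
```

In a request handler, `$now` would typically be `time()`, and the last vote time and current window would be read from the row stored for that IP.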
It's impossible to catch all scammers, even more so if it's anonymous voting.