Generating unique tokens that can't be guessed
I have a system that needs to schedule some stuff and return identifiers to the scheduled tasks to some foreign objects. The user would basically do this:
identifier = MyLib.Schedule(something)
# Nah, let's unschedule it.
MyLib.Unschedule(identifier)
I use this kind of pattern a lot in internal code, and I always use plain integers as the identifier. But if the identifiers are used by untrusted code, a malicious user could break the entire system by doing a single Unschedule(randint())
.
I need the users of the code to be able to only unschedule identifiers they have actually scheduled.
The only solution I can think of is to generate i.e 64-bit random numbers as identifiers, and keep track of which identifiers are currently handed out to avoid the ridiculously unlikely duplicates. Or 128-bit? When can I say "this is random enough, no duplicates could possibly occur", if ever?
Or better yet, is there a more sensible way to do this? Is there a way to generate identifier tokens that the generator c开发者_运维问答an easily keep track of (avoiding duplicates) but is indistinguishable from random numbers to the recipient?
EDIT - Solution based on the accepted answer:
from Crypto.Cipher import AES
import struct, os, itertools
class AES_UniqueIdentifier(object):
def __init__(self):
self.salt = os.urandom(8)
self.count = itertools.count(0)
self.cipher = AES.new(os.urandom(16), AES.MODE_ECB)
def Generate(self):
return self.cipher.encrypt(self.salt +
struct.pack("Q", next(self.count)))
def Verify(self, identifier):
"Return true if identifier was generated by this object."
return self.cipher.decrypt(identifier)[0:8] == self.salt
Depending on how many active IDs you have, 64 bits can be too little. By the birthday paradox, you'd end up with essentially the level of protection you might expect from 32 bit identifiers.
Besides, probably the best way to create these is to use some salted hash function, such as SHA-1 or MD5 or whatever your framework already has, with a randomly chosen salt (kept secret), and those generate at least 128 bits anyway, exactly for the reason mentioned above. If you use something that creates longer hash values, I don't really see any reason to truncate them.
To create identifiers you can check without storing them, take something easy to detect, such as having the same 64 bit patterns twice (giving a total of 128 bits) and encrypt that with some constant secret key, using AES or some other cipher with a block size of 128 bits (or whatever you picked). If and when the user sends some alleged key, decrypt and check for your easy-to-spot pattern.
It sounds to me like you might be over thinking this problem. This sounds 100% like an application for a GUID/UUID. Python even has a built in way to generate them. The whole point of GUID/UUIDs is that the odds of collision are astronomical, and by using a string instead of an encrypted token you can skip the decrypting operation in the verify step. I think this would also eliminate a whole slew of problems you might encounter regarding key management, and increase the speed of the whole process.
EDIT:
With a UUID, your verify method would just be a comparison between the given UUID and the stored one. Since the odds of a collision between two UUIDs is incredibly low, you shouldn't have to worry about false positives. In your example, it appears that the same object is doing both encryption and decryption, without a third party reading the stored data. If this is the case, you aren't gaining anything by passing around encrypted data except that the bits your passing around aren't easy to guess. I think a UUID would give you the same benefits, without the overhead of the encryption operations.
You make your identifier long enough, so it can't be reasonable guessed. In addition, let Unschedule wait for 1 second, if the token is not in use, so a brute force attack is not feasible anymore. Like the other answer said, session IDs in Webapplications are exactly the same problem, and I already saw session IDs which where 64 random characters long.
This is the same problem as dealing with session identifiers in ordinary web applications. Predictable session ids can easily lead to session hijacking.
Have a look at how session ids are generated. Here the content of a typical PHPSESSID cookie:
bf597801be237aa8531058dab94a08a9
If you want to be dead sure no brute-force attack is feasible, do the calculations backward: How many attempts can a cracker do per second? How many different unique id's are used at a random point in time? How many id's are there in total? How long would it take for the cracker to cover, say 1 % of the total space of ids? Adjust number of bits accordingly.
Do you need this pattern in a distributed or local environment?
If you're local, most OO languages should support the notion of object identity, so if you create an opaque handle - just create a new object.
handle = new Object(); // in Java
No other client can fake this.
If you need to use this in distributes environments, you may keep a pool of handles per session, so that a foreign session can never use a stolen handle.
精彩评论