Generating unique tokens that can't be guessed

2023-02-06 20:09

I have a system that needs to schedule tasks and return identifiers for the scheduled tasks to foreign objects. The user would basically do this:

identifier = MyLib.Schedule(something)
# Nah, let's unschedule it.
MyLib.Unschedule(identifier)

I use this kind of pattern a lot in internal code, and I always use plain integers as the identifier. But if the identifiers are used by untrusted code, a malicious user could break the entire system by doing a single Unschedule(randint()).

I need the users of the code to be able to only unschedule identifiers they have actually scheduled.

The only solution I can think of is to generate, e.g., 64-bit random numbers as identifiers, and keep track of which identifiers are currently handed out to avoid the ridiculously unlikely duplicates. Or 128-bit? When can I say "this is random enough, no duplicates could possibly occur", if ever?

Or better yet, is there a more sensible way to do this? Is there a way to generate identifier tokens that the generator can easily keep track of (avoiding duplicates) but that are indistinguishable from random numbers to the recipient?

EDIT - Solution based on the accepted answer:

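# Requires PyCryptodome (or the older PyCrypto) for Crypto.Cipher.
from Crypto.Cipher import AES
import struct, os, itertools

class AES_UniqueIdentifier(object):
    def __init__(self):
        # Secret 8-byte marker: a valid identifier decrypts to salt + counter.
        self.salt = os.urandom(8)
        self.count = itertools.count(0)
        # Random per-instance key. Each identifier is exactly one 16-byte
        # block and every plaintext is distinct, so plain ECB is used.
        self.cipher = AES.new(os.urandom(16), AES.MODE_ECB)
    def Generate(self):
        # 8-byte salt + 8-byte counter = one AES block.
        return self.cipher.encrypt(self.salt +
                                   struct.pack("Q", next(self.count)))
    def Verify(self, identifier):
        "Return true if identifier was generated by this object."
        # decrypt() raises on input that is not a whole block, so reject
        # anything that is not exactly 16 bytes first.
        if len(identifier) != 16:
            return False
        return self.cipher.decrypt(identifier)[0:8] == self.salt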

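Depending on how many active IDs you have, 64 bits can be too few. By the birthday paradox, random 64-bit identifiers start to collide with noticeable probability once about 2^32 of them are in use, so against duplicates you get essentially the level of protection you might expect from 32-bit identifiers.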
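As a rough check of that estimate, here is a small sketch using the usual birthday approximation, p ≈ 1 - exp(-n(n-1) / 2^(b+1)), for n random b-bit identifiers. The function name and the sample counts are made up for illustration.

```python
import math

def collision_probability(num_ids, bits):
    """Approximate chance that at least two of num_ids random
    identifiers of the given bit width coincide."""
    space = 2.0 ** bits
    # -expm1(-x) equals 1 - exp(-x) but stays accurate for tiny x.
    return -math.expm1(-num_ids * (num_ids - 1) / (2.0 * space))

for bits in (64, 128):
    for num_ids in (10**6, 2**32):
        print(bits, num_ids, collision_probability(num_ids, bits))
```

With 2^32 identifiers the estimate is about 0.39 for 64 bits and negligible for 128 bits.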

Besides, probably the best way to create these is to use some salted hash function, such as SHA-1 or MD5 or whatever your framework already has, with a randomly chosen salt that you keep secret. Those generate at least 128 bits anyway, exactly for the reason mentioned above. If you use something that creates longer hash values, I don't really see any reason to truncate them.
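A minimal sketch of the secret-salt idea, with one substitution stated plainly: it uses HMAC-SHA-256 from Python's standard library rather than a hand-rolled salted SHA-1 or MD5. The class and method names are hypothetical, and the issuer still remembers the active tokens in a set.

```python
import hashlib
import hmac
import itertools
import os

class HashedTokenIssuer:
    """Hypothetical helper: each token is HMAC-SHA-256(secret, counter)."""

    def __init__(self):
        self._secret = os.urandom(32)      # the secret "salt", never handed out
        self._counter = itertools.count()  # a distinct input for every token
        self._active = set()               # tokens currently handed out

    def issue(self):
        message = str(next(self._counter)).encode("ascii")
        token = hmac.new(self._secret, message, hashlib.sha256).hexdigest()
        self._active.add(token)
        return token

    def revoke(self, token):
        """Return True if the token was active and has now been removed."""
        if token in self._active:
            self._active.remove(token)
            return True
        return False
```

Because the counter never repeats, the inputs are always distinct, and without the secret the outputs should look like random 256-bit values to the recipient.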

To create identifiers you can check without storing them, take something easy to detect, such as the same 64-bit pattern repeated twice (giving a total of 128 bits), and encrypt that with some constant secret key, using AES or some other cipher with a block size of 128 bits (or whatever size you picked). If and when the user sends some alleged identifier, decrypt it and check for your easy-to-spot pattern.

It sounds to me like you might be overthinking this problem. This sounds 100% like an application for a GUID/UUID. Python even has a built-in way to generate them. The whole point of GUIDs/UUIDs is that the odds of a collision are astronomically low, and by using a plain random identifier instead of an encrypted token you can skip the decryption in the verify step. I think this would also eliminate a whole slew of problems you might encounter regarding key management, and speed up the whole process.

EDIT:

With a UUID, your verify method would just be a comparison between the given UUID and the stored one. Since the odds of a collision between two UUIDs are incredibly low, you shouldn't have to worry about false positives. In your example, it appears that the same object is doing both encryption and decryption, without a third party reading the stored data. If this is the case, you aren't gaining anything by passing around encrypted data except that the bits you're passing around aren't easy to guess. I think a UUID would give you the same benefits, without the overhead of the encryption operations.
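A minimal sketch of that approach, assuming a hypothetical scheduler class standing in for MyLib. It relies on uuid.uuid4(), the random (version 4) variant; the other UUID versions are derived from time, host, or names and would be guessable.

```python
import uuid

class UuidScheduler:
    """Hypothetical stand-in for MyLib: maps UUID tokens to tasks."""

    def __init__(self):
        self._tasks = {}

    def schedule(self, task):
        token = uuid.uuid4()        # random version-4 UUID
        self._tasks[token] = task
        return token

    def unschedule(self, token):
        """Remove the task; return False for tokens never handed out."""
        if token in self._tasks:
            del self._tasks[token]
            return True
        return False

scheduler = UuidScheduler()
identifier = scheduler.schedule("something")
print(scheduler.unschedule(uuid.uuid4()))  # a guessed token: False
print(scheduler.unschedule(identifier))    # the real one: True
```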

Make your identifier long enough that it can't reasonably be guessed. In addition, let Unschedule wait for 1 second if the token is not in use, so a brute-force attack is no longer feasible. As the other answer said, session IDs in web applications are exactly the same problem, and I have already seen session IDs that were 64 random characters long.
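Here is one way this could look, as a sketch only: tokens are 64 hex characters from the standard secrets module, and a failed lookup sleeps before returning. The class name and the failure_delay parameter are assumptions, and the sleep blocks the calling thread, which may not suit every design.

```python
import secrets
import time

class DelayedUnscheduler:
    """Hypothetical sketch: long random tokens plus a penalty for bad guesses."""

    def __init__(self, failure_delay=1.0):
        self._failure_delay = failure_delay  # seconds to wait on a bad token
        self._active = set()

    def schedule(self):
        token = secrets.token_hex(32)  # 32 random bytes -> 64 hex characters
        self._active.add(token)
        return token

    def unschedule(self, token):
        if token in self._active:
            self._active.remove(token)
            return True
        time.sleep(self._failure_delay)  # slow down guessing
        return False
```

With a one-second delay, a single caller gets at most one wrong guess per second, which is the rate limit the answer is relying on.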

This is the same problem as dealing with session identifiers in ordinary web applications. Predictable session ids can easily lead to session hijacking.

Have a look at how session IDs are generated. Here is the content of a typical PHPSESSID cookie:

bf597801be237aa8531058dab94a08a9

If you want to be dead sure no brute-force attack is feasible, do the calculations backward: How many attempts can a cracker make per second? How many unique IDs are in use at any given point in time? How many IDs are there in total? How long would it take the cracker to cover, say, 1% of the total ID space? Adjust the number of bits accordingly.
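The backward calculation can be sketched as below. The attack rate and the number of active IDs are invented inputs, chosen only to show the arithmetic; plug in your own estimates.

```python
def seconds_to_first_hit(bits, active_ids, guesses_per_second):
    """Expected time until a uniformly random guess matches any active ID."""
    hit_probability = active_ids / 2.0 ** bits  # chance that one guess hits
    expected_guesses = 1.0 / hit_probability
    return expected_guesses / guesses_per_second

SECONDS_PER_YEAR = 365 * 24 * 3600

# Assumed figures: one million active IDs, one million guesses per second.
for bits in (32, 64, 128):
    seconds = seconds_to_first_hit(bits, active_ids=10**6,
                                   guesses_per_second=10**6)
    print(bits, "bits:", seconds / SECONDS_PER_YEAR, "years")
```

Under these assumed figures, 32 bits falls in a fraction of a second, 64 bits lasts on the order of months, and 128 bits lasts far longer than any realistic attack.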

Do you need this pattern in a distributed or local environment?

If you're local, most OO languages support the notion of object identity, so to create an opaque handle, just create a new object.

Object handle = new Object(); // in Java

No other client can fake this.
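The same idea in Python, as a sketch: a bare object() serves as the handle and is matched by identity rather than by equality or hash, so a look-alike object with a custom __eq__ cannot stand in for it. The class name is hypothetical, and this only helps while callers live in the same process and cannot reach into the scheduler's internals.

```python
class LocalScheduler:
    """Hypothetical in-process scheduler using object identity as the token."""

    def __init__(self):
        self._tasks = {}  # id(handle) -> (handle, task)

    def schedule(self, task):
        handle = object()  # opaque and identical only to itself
        # Keeping a reference to the handle means its id() cannot be reused.
        self._tasks[id(handle)] = (handle, task)
        return handle

    def unschedule(self, handle):
        entry = self._tasks.get(id(handle))
        if entry is None or entry[0] is not handle:
            return False  # not a handle this scheduler created
        del self._tasks[id(handle)]
        return True
```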

If you need to use this in a distributed environment, you may keep a pool of handles per session, so that a foreign session can never use a stolen handle.

