How long can a hash left out in the open be considered safe?
If I were to leave a SHA2 family hash out on my website - how long would it be considered safe? How long would I have before I could be sure that someone would find a collision for it and know what was hashed?
I know that the amount of time would be based on the computational power of the one seeking to break it. It would also depend on the string length, but I'm curious just how secure hashes are.
Since many of us run web-servers we constantly have to be prepared for the day when someone might make it all the way to the database which stores the user hashes. So, move the server security out of the way and then what do you have?
This is a slightly theoretical area for many of the people I have talk开发者_JS百科ed with, so I would love to actually have some more information about average expectations for cracking.
hash('sha256', 'mytext');
hash('sha256', 'thisismytext');
hash('sha256', 'xx$1sw@the4e');
hash('sha256', 'thisismyslightlylongertext');
db695168e73ae294e9c4ea90ff593e211aa1b5693f49303a426148433400d23f
b62c6ac579abf8a29e71d98aeba1447c66c69002cfd847b148584f886fd297ef
501f1b26abbc75594d06f0935c8bc502d7bcccf5015227bd6ac95041770acb24
3debc12761bbeb5b4460978ac2be5b104163de02ea799f0705399d0e5b706334
First, you are not talking about a collision. A collision is when someone finds two distinct messages which hash to the same value. Here you are not fearing someone finding another input which hash to the value you publish; indeed, you fear someone finding your input. The correct term is preimage attack. Sometimes, we say that the attacker is trying to "invert" the hash function (find an input matching a given output).
There are two ways to try to find a preimage for a given hash value: exploit a weakness of the hash function, or guess the input by trying out candidates.
There is no known weakness of SHA-2 with regards to preimage resistance. Come to that, there is no such known weakness for MD5 or even MD4, although those two functions are considered to be, cryptographically speaking, thoroughly broken. Therefore, barring stupendous advances in scientific research on hash function, chances are that your hash value will not be found through a hash function cryptographic weakness.
Trying candidates may be possible, or not, depending on what the attacker knows of the input. This is quite hard to accurately model. Suppose, for instance, that the input is a single word containing seven letters. There are 267 = 8031810176 such words. Trying out all of them with SHA-256, comparing each time with your hash value, takes a few minutes on a recent PC, with a naive implementation.
On a more general basis, exploring the set of possible inputs is called a dictionary attack because it is often applied on the problem of recovering a user password: users are depressingly unimaginative and often choose passwords from a limited set of, well, "words", and it seems logical to call "dictionary" that set of words. We also call it "brute force" or "exhaustive search".
Assuming that the dictionary is sufficiently small for an attacker to realistically try out all its words, then not only your hash value will be eventually inverted (if there is enough incentive for the attacker), but this also opens the way for cost sharing: the attacker may try to share his computing efforts across several similar attack situations (i.e. several hash values to invert, with the same hash function -- there again, a common password-related attack model). A basic cost sharing method is to make a precomputed table: the attacker computes all the hashes for his dictionary once; then, all subsequent hash values can be attacked by simply looking up the hash value in the table. A lookup is very fast (the attacker sorts his hashes in increasing order). Rainbow tables are a kind of precomputed table, in a smart way which allows for a compact representation: they make it possible for the attacker to "keep" a big precomputed table without needing a truckload of hard disks. Still, rainbow or not, all the values in the table (the one before compression in the case of a rainbow table) must be computed at least once by an attacker somewhere, i.e. that someone is able to do a full dictionary attack. This has two costs: the CPU cost (for computing all the hashes) and the storage cost (for storing the hash values). A rainbow table makes the storage cheaper, but does not improve things with regards to CPU.
Salting defeats precomputed tables (including rainbow tables). It makes small dictionaries more tolerable. That is, if we assume that inverting one hash value is doable, then the salt makes sure that, at least, the attacker will have to pay the full CPU cost of a dictionary attack each time, and he won't be able to share his cost across several attacks or with other attackers. Salting is needed for passwords, because it turned out to be impossible, in all generality, to make generic users choose and remember passwords from a large enough set of possible passwords.
It still is much better if your input is from a dictionary large enough to defeat a single brute forcing effort. The important thing is the size of the set of values which your input string may take; that set must be estimated with regards to what the attacker knows of the attacked data. For instance, if the attacker is trying to find a user password, then he knows that the input string is short (users have little patience) and consists only in characters that can be input (blindly !) on a keyboard; and he also knows that the sequence can be memorized, which makes things like ".%f*(.ds/~\d09j@" quite improbable. There is no limit per se on the input size; we say that rainbow tables are limited to "15 characters or so" because users who accept to type more than 15 characters will also choose passwords from too large a set to allow the single brute force effort needed for table construction. Note that trying all sequences of 15 characters is already way too much (even all sequences of 15 lowercase letters imply more than 270 hash computations, and that's not really feasible with today's technology).
Thomas's answer is already very detailed, but I would add this criteria:
What is the benefit of breaking the hash?
Drop a penny on the street. How long does it take until someone picks it up?
Now drop a $20 bill and do the same experiment.
If the value of what you are trying to secure is low, it is possible that nobody will try to break the hash at all.
If the value and benefit of breaking the hash is high, it will only survive as long as it takes to buy the necessary computing power from the Amazon cloud. They now even sell GPUs.
You're assuming no rainbow table targeting your setup would be available, which is not a given. IMNSHO, consider it broken the moment it's leaked. Even with bcrypt you cannot be sure about how much work was already done before your hash became public.
精彩评论