Can a deterministic hashing function be easily decrypted? [duplicate]
Possible Duplicates:
Is it possible to decrypt md5 hashes? Is it possible to reverse a sha1?
I asked this question: working with HUGE spreadsheet
and got a great answer, and I followed the advice. I used this: http://splinter.com.au/blog/?p=86
and I hashed about 300,000 different elements in a column in an Excel spreadsheet.
Since you can do:
=SHA1HASH("The quick brown fox jumps over the lazy dog")
And you'd get back:
2fd4e1c67a2d28fced849ee1bb76e7391b93eb12
Couldn't you go backwards as well?
I'm saying, if it encrypts the same text the same way every single time, what is the point?
If you do know the hash algorithm, is it possible to go backwards?
Can you please explain to me very simply how hashing works? How can you convert 20 GB to a 40-character hash? Does it take a long time to hash a 20 GB hard drive?
General answer
A cryptographic hash function cannot be easily reversed. This is why it is also sometimes called a one-way function. There is no going back.
You should also be careful about calling this 'decryption'. Hashing is not the same as encryption. The set of possible hash values is typically smaller than the set of possible inputs, so multiple inputs map to the same output.
For any hash function given the output you can't know which of the many inputs was used to generate this particular output.
For cryptographic hashes like SHA1 it is very difficult to even find one input that produces that output.
The simplest way to reverse a cryptographic hash is to guess the input and hash it to see if it gives the right output. If you are wrong, guess again. Another approach is to use rainbow tables.
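As a rough illustration of the guess-and-check approach, here is a minimal Python sketch. The helper names (sha1_hex, guess_input) and the 4-digit PIN input space are made up for the example; it only succeeds because the set of candidate inputs is tiny.

```python
import hashlib


def sha1_hex(text):
    """Return the SHA-1 digest of a string as 40 hex characters."""
    return hashlib.sha1(text.encode("utf-8")).hexdigest()


def guess_input(target_hash, candidates):
    """Hash each candidate in turn and return the first one that matches."""
    for candidate in candidates:
        if sha1_hex(candidate) == target_hash:
            return candidate
    return None  # none of the guesses produced the target hash


# Hypothetical scenario: we only know the hash of some 4-digit PIN.
target = sha1_hex("1234")
pins = (f"{n:04d}" for n in range(10_000))
print(guess_input(target, pins))  # prints 1234
```

With an input space as large as arbitrary text, the same loop would never finish in practice, which is the whole point of a one-way function.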
Regarding using hashing to encrypt SSNs
With your use case of SSNs an attack is feasible due to the relatively small number of possible input values. If you are worried about people getting access to SSNs then it might be best to not store or use the SSN at all in your application, and in particular do not use them as an identifier. Instead you could find or create another identifier, for example an email address, a login name, a GUID or just an incrementing number. It can be tempting to use the SSN as it is already there and at first glance appears to be a unique unchanging identifier, but in practice using it just causes problems. If you absolutely need to store it for some reason then use strong non-deterministic encryption with a secret key and make sure you keep that key safe.
The whole point of a cryptographic hash is that you can't decrypt it and that it does encrypt the same way every time.
A very common use case for cryptographic hashes is password validation. Imagine I have the password "mypass123", and the hash is "aef8976ea17371bbcd". Then a program or website wishing to validate my password can store the hash "aef8976ea17371bbcd" in their database, instead of the password, and every time I want to log in, the site or program re-hashes my password and makes sure that the hashes match. This allows the site or program to avoid storing my actual password, and so protects my password (in case it's a password I use elsewhere) in the case that the data is stolen or otherwise compromised -- a hacker would not be able to go backwards from the hash to the password.
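A minimal sketch of that login check in Python. Note that it swaps the plain hash described above for PBKDF2 (hashlib.pbkdf2_hmac), a salted and deliberately slow derivation that is the more usual choice for passwords; the function names and the iteration count are assumptions for the example, not a recommendation from this answer.

```python
import hashlib
import hmac
import os

ITERATIONS = 100_000  # assumed work factor, chosen only for illustration


def make_record(password):
    """Return (salt, digest) to store in the database instead of the password."""
    salt = os.urandom(16)  # fresh random salt for each stored password
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return salt, digest


def check_password(password, salt, stored_digest):
    """Re-derive the digest from the login attempt and compare it to the stored one."""
    candidate = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return hmac.compare_digest(candidate, stored_digest)


salt, digest = make_record("mypass123")
print(check_password("mypass123", salt, digest))  # prints True
print(check_password("mypass124", salt, digest))  # prints False
```

The site only ever stores the salt and the digest, never the password itself.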
Another common use of cryptographic hashes is integrity checking. Suppose a given file (e.g. an image of a Linux distribution CD) has a known, publicly available cryptographic hash. If you have a file which purports to be the same thing, you can hash it yourself and see if the hashes match. Here, the fact that it hashes the same way every time allows you to independently validate it, and the fact that it is cryptographically secure means that no one can feasibly create a different, fake file (e.g. with a trojan in it) that has the same hash.
Keep in mind the very important distinction between hashing and encryption, though: hashing loses information. This is why you can't go backwards (decrypt) the hash. You can hash a 20 GiB file and end up with a 40-some character hash. Obviously, this has lost a lot of information in the process. How could you possibly "decrypt" 40-some characters into 20GiB? There's no such thing as compression that works that well! But this is also an advantage, because in order to check the integrity of a 20 GiB file, you only have to distribute a 40-some character hash.
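On the question of hashing something as big as 20 GiB: the hash function reads the input piece by piece and only keeps a small fixed-size internal state, so the time taken grows roughly in proportion to the file size. A sketch in Python, where sha1_of_file and the 1 MiB chunk size are assumptions for the example:

```python
import hashlib


def sha1_of_file(path, chunk_size=1024 * 1024):
    """Hash a file of any size without loading it all into memory."""
    hasher = hashlib.sha1()
    with open(path, "rb") as f:
        # Read fixed-size chunks until read() returns an empty bytes object.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            hasher.update(chunk)
    return hasher.hexdigest()  # always 40 hex characters, whatever the file size


# Hypothetical usage: compare against a published checksum.
# print(sha1_of_file("some-linux-distro.iso") == published_sha1)
```

The result is the same as hashing the whole file in one call; chunking only limits memory use.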
Because information is lost, many files will have the same hash, but the key feature of a cryptographic hash (which is what you're talking about) is that despite the fact that information is lost, it is computationally infeasible to start with a file and construct a second, slightly different file that has the same hash. Any other file with the same hash would be radically different, and not easily mistakable for the original file.
I see your point, given that you are trying to hide Social Security numbers. If someone knows you are using SHA1HASH on the SSN to create a unique identifier, they can just generate a list of all SSNs, SHA1HASH them, and compare to immediately recover the SSN of the person in the record. Even worse, they can pregenerate all of these, storing one hash for every SSN. This is called a hash lookup table, and more complex forms are called rainbow tables.
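A small Python sketch of such a lookup table. To keep it quick it only covers the 10,000 SSNs that start with 123-45; the helper names are made up, and a real attacker would simply extend the range to all 10^9 values.

```python
import hashlib


def sha1_hex(text):
    """Return the SHA-1 digest of a string as hex."""
    return hashlib.sha1(text.encode("utf-8")).hexdigest()


def format_ssn(n):
    """Format a 9-digit number as AAA-GG-SSSS."""
    s = f"{n:09d}"
    return f"{s[:3]}-{s[3:5]}-{s[5:]}"


def build_lookup(start, stop):
    """Precompute hash -> SSN for every number in [start, stop)."""
    return {sha1_hex(format_ssn(n)): format_ssn(n) for n in range(start, stop)}


# Illustrative slice of the SSN space: 123-45-0000 through 123-45-9999.
table = build_lookup(123_450_000, 123_460_000)

leaked_id = sha1_hex("123-45-6789")  # what the spreadsheet would contain
print(table.get(leaked_id))  # prints 123-45-6789
```

Once the table exists, recovering each SSN is a single dictionary lookup.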
This is why a second feature of hashing was invented. It is called salting. Salting is basically this: you create a salt, then modify your data using the salt. For instance, say you had the SSN 123-45-6789. You could salt it with the string "MOONBEAM". Your new string for hashing is "123-45-6789MOONBEAM".
Now, even if someone knows that you are hashing the SSN to generate your unique ID, they still don't know the salt you are using, and so are unable to derive the original SSN by pre-hashing a list of all SSNs and comparing to your ID. You, however, can always take the user's SSN, add the salt, and rehash the SSN+SALT to see if the user's SSN matches up with their ID.
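A sketch of the same idea in Python. Note that it swaps plain concatenation (SSN + salt) for HMAC, which is the standard construction for mixing a secret key into a hash. The key value and function names are placeholders, and a single shared secret like this is not the same thing as a unique salt per entry.

```python
import hashlib
import hmac

SECRET_KEY = b"MOONBEAM"  # placeholder; a real key would be long, random and kept secret


def ssn_identifier(ssn, key=SECRET_KEY):
    """Derive an identifier from an SSN using a keyed hash (HMAC-SHA1)."""
    return hmac.new(key, ssn.encode("utf-8"), hashlib.sha1).hexdigest()


def matches(ssn, identifier, key=SECRET_KEY):
    """Check whether an SSN corresponds to a previously derived identifier."""
    return hmac.compare_digest(ssn_identifier(ssn, key), identifier)


ident = ssn_identifier("123-45-6789")
print(matches("123-45-6789", ident))  # prints True
print(matches("123-45-6780", ident))  # prints False
```

Anyone holding the key can still rebuild the lookup table, so everything depends on keeping the key out of the attacker's hands.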
Finally, if you use just one salt for everything and keep it secret, an attacker has to do a lot more work to retrieve an SSN. If they could see the salt, they could simply hash every SSN + salt combination (at most a billion attempts) and pick the match, because the 9-digit SSN space has relatively little entropy (10^9 combinations). With a secret salt, instead of just running
SHA1HASH(111-11-1111) -> check hash match
SHA1HASH(111-11-1112) -> check hash match
SHA1HASH(111-11-1113) -> check hash match
They would have to run
SHA1HASH(111-11-1111a) -> check hash match
SHA1HASH(111-11-1111b) -> check hash match
SHA1HASH(111-11-1111c) -> check hash match
...
SHA1HASH(111-11-1111azdfg) -> check hash match
SHA1HASH(111-11-1111azdfh) -> check hash match
....
SHA1HASH(111-11-1111zzzzzzzzzzzzzzzz) -> check hash match
SHA1HASH(111-11-1112a) -> check hash match
SHA1HASH(111-11-1112b) -> check hash match
.. and so on until they finally get to
SHA1HASH(123-45-6789MOONBEAM) -> check hash match
at which point they finally did manage to crack the SSN + SALT
They don't even know how many characters long your salt is. So that is roughly (alphabet size)^(number of characters in your salt) times more work for them to do just to get one SSN, let alone the whole table.
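To put rough numbers on that extra work, here is a small Python calculation. It assumes the attacker tries every salt of length 1 up to some maximum over a known alphabet; the function name and both parameters are illustrative only.

```python
SSN_COUNT = 10**9  # every possible 9-digit SSN


def salted_search_space(alphabet_size, max_salt_length):
    """Count SSN + salt combinations for all salt lengths from 1 to max_salt_length."""
    salts = sum(alphabet_size**k for k in range(1, max_salt_length + 1))
    return SSN_COUNT * salts


# Unsalted: at most 10^9 hashes. Salted with up to 8 lowercase letters:
print(f"{SSN_COUNT:.1e} hashes without a salt")
print(f"{salted_search_space(26, 8):.1e} hashes with a secret salt of up to 8 letters")
```

Each extra salt character multiplies the search by the alphabet size, which is why a long random secret helps far more than a short word.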
Update: Many years later, I see that my info on salting was incorrect when I answered this question. Please see the correct info in posts and comments below about using unique salts per entry, as this is still the first post in the chain. If you think I should change the OP after reading this, leave a comment below (or upvote one), and if the consensus is there, I will correct it.
No, you cannot go backwards because not enough information is preserved by the hashing function.
You can think of it as the hash function mapping the original text to a single, huge number. This same number may also be mapped to other texts as well, although a good hash function will have few collisions.
If the original message were encrypted then yes, you could go back.
Encrypting and hashing are two different things.
Hashing simply digests the string into a number. Encryption preserves the contents of the string so that it can later be decrypted. There is no method for getting the original string from a hash. The contents are just not there.
No. The point of a hash is that it's one-way encryption (as others have pointed out, it's not really "encryption", but stay with me here). The downside is that, in theory, there is a small possibility of "collisions", when two or more strings return the same hash. But it's usually worth this downside.
A good hash is one way, meaning you shouldn't be able to go backwards. The point is to provide a key of a string without revealing the string. For instance, this is a good way to match passwords without storing a password. Instead, you store a hash and compare the resultant hash of inputs.
No. At least not easily.
SHA1 is still considered hard to reverse, although practical collision attacks on it were demonstrated in 2017, so it is no longer recommended for new designs. A hash algorithm is secure if it is easy to compute one way, but very hard (exhaustive search) to compute the other way. It is true that every time you encrypt a specific phrase, it will result in the same hash, but there are infinitely many phrases that will also hash to that same value. The security comes from not knowing what those other phrases are until you run them all through the SHA1 function.
No, you cannot go back. Count how many different hashes you can have. Now count how many different strings you can have. The first is finite, the second is infinite. There are lots of (infinitely many, to be precise) strings which have the same SHA1 sum. The point is, however, that it's very hard to find two texts that have the same hash.
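The counting argument can be seen in miniature with a deliberately shortened hash. This Python sketch keeps only the first 4 hex characters of SHA-1 (65,536 possible values), so a collision must turn up within 65,537 distinct inputs; tiny_hash and find_collision are made-up names for the example.

```python
import hashlib


def tiny_hash(text, hex_chars=4):
    """A deliberately weak hash: the first few hex characters of SHA-1."""
    return hashlib.sha1(text.encode("utf-8")).hexdigest()[:hex_chars]


def find_collision(hex_chars=4):
    """Hash "0", "1", "2", ... until two inputs share a hash value."""
    seen = {}  # hash value -> first input that produced it
    n = 0
    while True:
        text = str(n)
        h = tiny_hash(text, hex_chars)
        if h in seen:
            return seen[h], text, h
        seen[h] = text
        n += 1


first, second, shared = find_collision()
print(first != second, tiny_hash(first) == tiny_hash(second))  # prints True True
```

With the full 160-bit output the same search is hopeless by brute force, which is what "very hard to find two texts" means in practice.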
You can think of hashing as shortening something. For example take a hashing function which sums all the ASCII codes of the letters in a string. You can't tell what was before hashing, just knowing the sum of ASCII codes of the letters. It is similar with SHA1, but more complicated.
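The toy function described above is easy to write down. A Python sketch, with ascii_sum_hash as a made-up name:

```python
def ascii_sum_hash(text):
    """Toy hash: add up the character codes of the string."""
    return sum(ord(ch) for ch in text)


# Any two strings with the same letters in a different order collide.
print(ascii_sum_hash("listen") == ascii_sum_hash("silent"))  # prints True
print(ascii_sum_hash("ab"), ascii_sum_hash("ba"))  # prints 195 195
```

Knowing only the sum, there is no way to tell which of the many matching strings was the original. Unlike SHA1, though, it is trivial to construct a matching string on purpose, which is why this is not a cryptographic hash.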
The point of hashing is not to encrypt something. The point of hashing is to shorten something, so that checking whether two things are the same takes less time. Now how can you tell that two things are indeed the same if you know that lots of things have the same hash? Well, you can't. You just assume that it's so rare that it won't happen.
But hashing is not just about checking. Checking equality using hashes is usually used just for confirmation/validation, and it is probabilistic rather than conclusive. If you see that the hashes are the same, then based on the parameters of the particular hashing function, you can estimate the probability that the hashed objects are indeed the same.
And that's why the fact that a hashing function always yields the same results for the same objects is the most important feature of a hashing function. It lets you validate and compare objects.
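A quick Python check of that property, using the sentence from the question (sha1_hex is a made-up helper name):

```python
import hashlib


def sha1_hex(text):
    """Return the SHA-1 digest of a string as hex."""
    return hashlib.sha1(text.encode("utf-8")).hexdigest()


sentence = "The quick brown fox jumps over the lazy dog"
first = sha1_hex(sentence)
second = sha1_hex(sentence)

print(first == second)  # prints True
print(first)  # prints 2fd4e1c67a2d28fced849ee1bb76e7391b93eb12
print(sha1_hex(sentence + ".") == first)  # prints False
```

The same input always gives the same digest, on any machine and in any language, while even a one-character change gives a different one. That is what makes comparison by hash useful.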
That it encrypts the same text the same way every time is the whole point of a hash. It's a feature.
If I have a database of hashes of passwords, then I can check that you entered the correct password by hashing it and seeing if the hash matches what I have in the database for you. But if somebody stole my database of hashes, they won't be able to figure out what your password is unless they accidentally stumble upon some plain text that hashes to that value.
In cryptography it is called a digest. A cryptographically strong digest does not allow you to recover the source text from the digest value without some additional knowledge. The digest value is always the same for the same text, so you can calculate the digest of a text and compare it with a published digest. A popular application is password verification, where you save the digest instead of the password. This is of course prone to a dictionary attack, which you have already explored, and that is why it is strongly recommended not to use dictionary words for passwords.