Can per-user randomized salts be replaced with iterative hashing?
In the process of building what I'd like to hope is a properly-architected authentication mechanism, I've come across a lot of materials that specify that:
- user passwords must be salted
- the salt used should be sufficiently random and generated per-user
- ...therefore, the salt must be stored with the user record in order to support verification of the user password
I wholeheartedly agree with the first and second points, but it seems like there's an easy workaround for the latter. Instead of doing the equivalent of (pseudocode here):
salt = random();
hashedPassword = hash(salt . password);
storeUserRecord(username, hashedPassword, salt);
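For comparison, the random-salt pseudocode above could be written in Python roughly as follows. This is a minimal sketch: SHA-512 via the standard `hashlib` module, a 16-byte salt from `secrets`, and a hypothetical in-memory `users` dict standing in for the database are all assumptions, and a single fast hash is used only to mirror the pseudocode.

```python
import hashlib
import hmac
import secrets

# Hypothetical in-memory stand-in for the user table.
users = {}


def store_user_record(username, password):
    # Fresh random salt from the OS CSPRNG, generated per user.
    salt = secrets.token_bytes(16)
    hashed = hashlib.sha512(salt + password.encode("utf-8")).digest()
    # The salt is stored next to the hash so it can be reused at login.
    users[username] = (hashed, salt)


def verify(username, password):
    record = users.get(username)
    if record is None:
        return False
    hashed, salt = record
    candidate = hashlib.sha512(salt + password.encode("utf-8")).digest()
    # Constant-time comparison avoids leaking timing information.
    return hmac.compare_digest(candidate, hashed)
```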
Why not use the hash of the username as the salt? This yields a domain of salts that is well distributed and (roughly) random, and each individual salt is as complex as your salt function provides for. Even better, you don't have to store the salt in the database -- just regenerate it at authentication time. More pseudocode:
salt = hash(username);
hashedPassword = hash(salt . password);
storeUserRecord(username, hashedPassword);
(Of course, hash in the examples above should be something reasonable, like SHA-512, or some other strong hash.)
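A Python sketch of the username-derived variant, under the same assumptions (SHA-512, a hypothetical in-memory `users` dict). Only the hash is stored; the salt is recomputed from the username on every login.

```python
import hashlib
import hmac

# Hypothetical in-memory stand-in for the user table (no salt column).
users = {}


def username_salt(username):
    # Deterministic: the same username always yields the same salt.
    return hashlib.sha512(username.encode("utf-8")).digest()


def store_user_record(username, password):
    salt = username_salt(username)
    users[username] = hashlib.sha512(salt + password.encode("utf-8")).digest()


def verify(username, password):
    stored = users.get(username)
    if stored is None:
        return False
    candidate = hashlib.sha512(username_salt(username) + password.encode("utf-8")).digest()
    return hmac.compare_digest(candidate, stored)
```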
This seems reasonable to me given what (little) I know of crypto, but the fact that it's a simplification over widely-recommended practice makes me wonder whether there's some obvious reason I've gone astray that I'm not aware of.
EDIT Some people appear not to grok what the question is. In no way am I suggesting that no salt be used. Referring to TheRook's edited answer: I'm familiar with the references noted in those CWEs. The core question I have is: why is hash(username) a predictable salt?
EDIT 2 Thanks to all those that provided answers; biffabacon directly addressed my core question in his 2nd paragraph (basically, anything you can do to maximize the domain of the salts being used and therefore the hashed passwords being generated is good), but there's lots of tasty info in various comments on this question.
The reason for salts was to prevent cryptanalysis attacks. Unique salt per user means you can't tell if two users have the same password. Nondeterministic salt per user means you can't tell if the same username:password is used on two systems.
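A small Python illustration of both points, assuming SHA-256 and hypothetical user names and passwords: a hash(username) salt gives the same stored value for bob:hunter2 on two unrelated systems while random salts do not, and unique per-user salts hide two users who share a password.

```python
import hashlib
import secrets


def digest(salt, password):
    return hashlib.sha256(salt + password.encode("utf-8")).hexdigest()


def username_salt(username):
    return hashlib.sha256(username.encode("utf-8")).digest()


# Two separate systems, each storing "bob" with the same password.
system_a = digest(username_salt("bob"), "hunter2")
system_b = digest(username_salt("bob"), "hunter2")
print(system_a == system_b)  # True: the reuse is visible to anyone holding both

# The same setup with a fresh random salt per record.
random_a = digest(secrets.token_bytes(16), "hunter2")
random_b = digest(secrets.token_bytes(16), "hunter2")
print(random_a == random_b)  # False (with overwhelming probability)

# Within one system, different users with the same password differ.
alice = digest(username_salt("alice"), "hunter2")
print(alice == system_a)  # False
```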
Don't try to outclever the salt. If you begrudge the space, then don't use them, and put your effort into protecting your data (and the backups!) directly.
The salt helps protect against an attacker using a precomputation or dictionary attack. When a salt is used, the attacker needs to create a separate dictionary for every salt value. However, if the salt isn't random you give the attacker an advantage, because they can create dictionaries that are more likely than others. For example, they could create a dictionary using a salt of jsmith (or hash of jsmith). For this reason, it is generally a good idea for the salt to be random.
Comparing a precomputation attack against a hash(username) salt and a random salt: say the attacker decides to create dictionaries for the 1000 most common usernames and, say, half a dozen different hash algorithms; that's 6000 salts and so 6000 dictionaries. If a random 32-bit salt is used, that's 2^32, or roughly 4.3 billion, dictionaries. So when the salt space is dramatically reduced (as it is by using hash(username)), precomputed attacks become much more feasible.
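The arithmetic above, spelled out in Python (the 1000 usernames and 6 algorithms are the answer's own illustrative figures):

```python
common_usernames = 1000
hash_algorithms = 6

# One precomputed dictionary per (username-derived salt, algorithm) pair.
username_salt_dictionaries = common_usernames * hash_algorithms
# One precomputed dictionary per possible 32-bit random salt.
random_32bit_dictionaries = 2 ** 32

print(username_salt_dictionaries)  # 6000
print(random_32bit_dictionaries)  # 4294967296
print(random_32bit_dictionaries // username_salt_dictionaries)  # 715827
```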
The point of the salt is to prevent the attacker from performing parallel attacks. That parallelism must be understood both space- and time-wise; roughly speaking, this means sharing the attack cost between two or more attacks.
For instance, consider a non-salted hashed password setting. The attacker can hash all words in a dictionary for a cost proportional to the size of the dictionary, and check those hashed words against several hashed passwords. This can be simultaneous (the attacker has a list of hashed passwords and wants to crack one) or iterative (the attacker precomputes his hashed dictionary, then uses it as a tool against several passwords in distinct systems). Either way, this is cost sharing.
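The cost sharing described above, sketched in Python for the unsalted case. The leaked table, user names, and word list are hypothetical; the point is that the dictionary is hashed once and then reused against every stored hash.

```python
import hashlib


def h(password):
    # Unsalted hash: identical inputs always give identical outputs.
    return hashlib.sha256(password.encode("utf-8")).hexdigest()


# Hypothetical leaked table of unsalted hashes from one system.
leaked = {"alice": h("sunshine"), "bob": h("letmein"), "carol": h("X9#qv!2z")}

# The attacker pays the dictionary cost once...
dictionary = ["123456", "password", "letmein", "sunshine", "qwerty"]
lookup = {h(word): word for word in dictionary}

# ...and then checks every leaked hash against it almost for free.
cracked = {user: lookup[d] for user, d in leaked.items() if d in lookup}
print(cracked)  # {'alice': 'sunshine', 'bob': 'letmein'}
```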
The salt is some data which should be somewhat unique to each hashed password instance. Salting prevents such cost sharing, to the extent of the uniqueness of the salt.
Using the user name (or a hash thereof) as a salt leverages user name uniqueness: usually, on a given system at a given time, user names are unique. This prevents local, space-wise sharing: if the attacker gets a snapshot of all hashed passwords, he cannot attack them in parallel with cost sharing; he will have to incur the hashed dictionary cost for every attacked password. However, this does not prevent time-wise sharing (the attacker precomputes a hashed dictionary with the salt corresponding to user "bob" and then regularly tries to guess Bob's password, assuming that Bob changes his password on a regular basis, e.g. because his system administrator mandates it). Nor does it prevent some global sharing (there are several -- many -- systems out there with a user going under the name of "bob").
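The time-wise and global sharing can be sketched the same way, assuming a hash(username) salt with SHA-256 and a hypothetical seasonal password list: a table built once for "bob" keeps working after each password change and against any other system's "bob", but is useless for other user names.

```python
import hashlib


def username_salt(username):
    return hashlib.sha256(username.encode("utf-8")).digest()


def salted_hash(username, password):
    return hashlib.sha256(username_salt(username) + password.encode("utf-8")).hexdigest()


# Precomputed once, ahead of time, for the user name "bob".
dictionary = ["spring2024", "summer2024", "autumn2024", "winter2024"]
bob_table = {salted_hash("bob", word): word for word in dictionary}

# Bob's stored hash after successive password rotations (hypothetical leaks).
leaks_over_time = [salted_hash("bob", "spring2024"), salted_hash("bob", "autumn2024")]
for leaked in leaks_over_time:
    print(bob_table.get(leaked))  # spring2024, then autumn2024

# A "bob" on some other system is covered by the same table...
print(bob_table.get(salted_hash("bob", "winter2024")))  # winter2024
# ...but the table does nothing against a different user name.
print(bob_table.get(salted_hash("alice", "winter2024")))  # None
```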
So using the user name as salt is not bad; this is better than using no salt at all. But a random salt is still better, because it will change even in situations where the user name is kept unchanged (a user changing his password; two users on distinct systems with the same name).
If you use a random salt, you're pulling from a very large, random pool of possibilities. When you're pulling from usernames, you're pulling from nowhere near as large a pool: usernames are usually all lowercase and dictionary-word-ish, and are subject to constraints that the OS or authentication system places on them (must start with a letter, no special characters, some OSes still limit usernames to 8 characters). Lots of usernames are standardized, or at least popularized: root, administrator, bob, mary... you get the idea. Another problem is that usernames usually aren't protected: you can see them through Apache's user directories, anonymous FTP often allows the public directory to group things by username, etc. An attacker could just start by harvesting the usernames and build a very nice list of salts.
All this stuff adds up to one problem: higher probability of coming up with a list of salts that works.
This gives attackers the ability to precompute, offline, the possible hashes for known usernames, setting up a brute-force attack. You might want to create a challenge-response mechanism to thwart that.
There are two recognized vulnerabilities relating to the use of salts. The first is CWE-759, which states that a salt must be used for passwords. The second vulnerability is much more important: CWE-760: Use of a One-Way Hash with a Predictable Salt. The salting mechanism that you are proposing is a vulnerability according to CWE-760.
A salt should be generated as a large, cryptographically secure pseudo-random number. It should be raw binary (base 256, so that every byte value is possible) rather than drawn from a restricted alphabet. A good size is the same number of bits that the message digest produces; for instance, SHA-256 should have a 256-bit salt. Both should have the same amount of entropy because they are both susceptible to brute force.
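Generating such a salt in Python might look like this. It is a sketch assuming the standard `secrets` and `hashlib` modules; `new_salt_for` is a hypothetical helper that sizes the salt to the chosen digest, as suggested above.

```python
import hashlib
import secrets


def new_salt_for(algorithm="sha256"):
    # Match the salt length to the digest length:
    # 32 bytes (256 bits) for SHA-256, 64 bytes (512 bits) for SHA-512.
    size = hashlib.new(algorithm).digest_size
    # secrets draws from the operating system's CSPRNG and returns raw
    # bytes, so all 256 values are possible in every position.
    return secrets.token_bytes(size)


salt = new_salt_for("sha256")
print(len(salt) * 8)  # 256
```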
In order to break a salt of this size you'll need a rainbow table so large we don't even have a word for it.
Why not use the hash of the username as the salt?
A password hash cannot be broken until the salt is retrieved. Salts make precomputed attacks more resource-intensive, but never impossible. The problem with using a hash of the username is twofold. First of all, you are computing two message digests, which is a waste of resources. Second, from a security perspective this salting mechanism is unsuitable because the username for web applications is often public knowledge. A salt must be a secret; ideally this secret is stored separately from the password hashes. If the salt is stored in the database alongside the password hash, then SQL injection can be used to obtain both values, and then a simple dictionary attack can be used to break the hash.
To improve the security provided by a salt, it should be stored separately from the password hashes, such that both must be compromised before any hash can be broken. This can be accomplished by storing the salts in a separate database or in a local flat file. Keep in mind that MySQL's file_priv could be used to read this flat file, so make sure it is disabled.
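The separate-storage idea could be sketched like this, with an in-memory SQLite database standing in for the main database and a JSON file in a temporary directory standing in for the local flat file. The table layout, file name, and helper names are all hypothetical, and a single SHA-256 is used only to keep the example short.

```python
import hashlib
import hmac
import json
import os
import secrets
import sqlite3
import tempfile

# Hypothetical layout: hashes in a database, salts in a separate flat file.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (username TEXT PRIMARY KEY, pw_hash TEXT NOT NULL)")
SALT_PATH = os.path.join(tempfile.mkdtemp(), "salts.json")


def load_salts():
    if not os.path.exists(SALT_PATH):
        return {}
    with open(SALT_PATH) as f:
        return json.load(f)


def digest(salt_hex, password):
    return hashlib.sha256(bytes.fromhex(salt_hex) + password.encode("utf-8")).hexdigest()


def register(username, password):
    salts = load_salts()
    salts[username] = secrets.token_hex(32)  # 256-bit random salt, hex-encoded
    with open(SALT_PATH, "w") as f:
        json.dump(salts, f)
    conn.execute(
        "INSERT OR REPLACE INTO users (username, pw_hash) VALUES (?, ?)",
        (username, digest(salts[username], password)),
    )
    conn.commit()


def verify(username, password):
    # Verification needs the hash from the database AND the salt from the file.
    row = conn.execute("SELECT pw_hash FROM users WHERE username = ?", (username,)).fetchone()
    salt_hex = load_salts().get(username)
    if row is None or salt_hex is None:
        return False
    return hmac.compare_digest(digest(salt_hex, password), row[0])
```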
I'm going to take a slightly contrary view. A pure hash isn't the best idea for the reasons given (but better than nothing), but I think it's an entirely different story if you do a hash of an application-wide salt + username. It's still per-user but not something an attacker can easily guess. Obviously you'll want to make sure that the app-wide salt isn't visible to an attacker, e.g., by reading it from a file that's outside of the app server directory tree.
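A minimal Python sketch of the application-wide salt + username idea, assuming SHA-256 and a hypothetical secret file. In a real deployment the file would sit outside the app server directory tree; a temporary directory is used here only for the demo.

```python
import hashlib
import os
import tempfile


def load_app_salt(path):
    # Read the application-wide secret from a file outside the web root.
    with open(path, "rb") as f:
        return f.read()


def per_user_salt(app_salt, username):
    # Still per-user, but not computable by an attacker who lacks app_salt.
    return hashlib.sha256(app_salt + username.encode("utf-8")).digest()


# Demo only: create a throwaway secret file in a temp directory.
secret_path = os.path.join(tempfile.mkdtemp(), "app_salt.bin")
with open(secret_path, "wb") as f:
    f.write(os.urandom(32))

app_salt = load_app_salt(secret_path)
salt_bob = per_user_salt(app_salt, "bob")
```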
You can extend this a bit further with multiple hashes. That is, don't just use
H(password.username.appsalt)
use something like
h1 = H(password.appsalt1.username.password)
h2 = H(password.appsalt2.username.password)
h3 = H(password.appsalt3.username.password)
H(h1.H(h2.H(h3)))
(where H() is the hash and '.' is simple concatenation.) The extra time won't have a noticeable impact on your application but makes the cost to an attacker MUCH higher.
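The nested construction translated into Python, as a sketch. The answer does not name H, so SHA-256 is an assumption, and appsalt1 to appsalt3 are hypothetical application-wide secrets passed in as bytes.

```python
import hashlib


def H(data):
    # The answer leaves H unspecified; SHA-256 is an assumption here.
    return hashlib.sha256(data).digest()


def nested_hash(password, username, appsalt1, appsalt2, appsalt3):
    p = password.encode("utf-8")
    u = username.encode("utf-8")
    # '.' in the pseudocode is plain byte concatenation.
    h1 = H(p + appsalt1 + u + p)
    h2 = H(p + appsalt2 + u + p)
    h3 = H(p + appsalt3 + u + p)
    # H(h1.H(h2.H(h3)))
    return H(h1 + H(h2 + H(h3)))
```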
I think multiple hashes are a good idea even if you store random salts in the user authentication table. Again, it won't have a noticeable impact on your application but will make it much harder for an attacker.
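If the goal is many iterations on top of a random per-user salt, Python's standard library already offers PBKDF2, a standardized iterated construction, through `hashlib.pbkdf2_hmac`. This is a swapped-in technique rather than the nested scheme above; the iteration count and 16-byte salt are illustrative assumptions.

```python
import hashlib
import hmac
import secrets

# Illustrative value; tune so one derivation takes a noticeable fraction of a second.
ITERATIONS = 600_000


def hash_password(password):
    salt = secrets.token_bytes(16)  # random per-user salt, stored with the hash
    dk = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return salt, dk


def verify_password(password, salt, expected):
    dk = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return hmac.compare_digest(dk, expected)
```

Raising ITERATIONS makes each guess proportionally more expensive for an attacker while costing the server only one derivation per login.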