stereochrome

Salted hashes

A salted hash is hash of a piece of information that has been seasoned with a piece of known random information, the salt. The salt is stored with the salted hash so that when you want to check if the hash matches a piece of information you have, you can concatenate the salt and data, hash it, and compare it to you existing hash.

Storing passwords as salted hashes

Most applications need to store passwords of some form or another. However, actually keeping a hold of those passwords in plain-text form isn’t a good idea because if your system is compromised and a cracker manages to get their hands on your accounts table, given that people tend to use a small number of passwords and will tend to use the same password for multiple different services, your users’ accounts on all those other services have been compromised.

You could try obscuring the passwords by using a hash function such as MD5 or SHA1, but you’re still left with a problem: password distributions tend to obey a power law distribution (which generally looks like an L-shaped curve), meaning there’s a small number of passwords or families of passwords that the vast majority of your users will choose. Given this, it’s relatively trivial to mount a dictionary attack to figure out what your users’ passwords are: once one person’s password has been worked out from the hash, any other users that have the same hashes have the same password.

To guard against this, you can include a cryptographic salt in your hash. By salting the password hash, you manage to obscure identical passwords, which greatly mitigates against dictionary attacks and attacks using rainbow tables.

To demonstrate its use, say you’re using the following schema for your accounts table:

CREATE TABLE accounts (
    uname  CHAR(24) NOT NULL PRIMARY KEY,
    pwd    CHAR(40) NOT NULL
);

The pwd field consists of a MD5 hash of the actual password and the salt, which makes up the first 32 characters, and the clear-text salt, which is the last eight. I’m not going to cover how you should generate the salt, but I will say that you should make every and all effort to ensure that each salt is unique and cryptographically random.

Here’s a MySQL stored function that will generate the salted password field given a password and a salt to use:

CREATE FUNCTION salted_password (pwd CHAR, salt CHAR)
RETURNS CHAR(40) DETERMINISTIC
RETURN CONCAT(MD5(CONCAT(pwd, salt)), salt);

Here’s a MySQL stored function that, given a salted password, will check if it matches the given actual password:

CREATE FUNCTION is_valid_password (salted CHAR, pwd CHAR)
RETURNS TINYINT DETERMINISTIC
RETURN SUBSTR(salted, 1, 32) = MD5(CONCAT(pwd, SUBSTR(salted, 33)));

As you can see, it partitions the salted hash in two parts, the first 32 characters being the actual salted hash, and everything from the 33rd character onwards being the salt. It then attempts to reconstruct what the salted hash would be given the salt we got from the salted hash and the password the user provided. If the two match, we’re good.

You’d use this to check against your accounts table as follows:

SELECT COUNT(*)
FROM   accounts
WHERE  uname = 'joebloggs'
  AND  is_valid_password(pwd, 'password');

If you can’t or don’t want to use stored functions, here’s what you’d do:

SELECT COUNT(*)
FROM   accounts
WHERE  uname = 'joebloggs'
  AND  SUBSTR(pwd, 1, 32) = MD5(CONCAT('password', SUBSTR(pwd, 33)));

Generating salts

A simple, though far from perfect, method for creating salts is:

function generate_salt() {
    return md5(uniqid(mt_rand(), true));
}

This yields a 64-character long password value when the salt and salted hash are combined. In reality, while this is plenty of data for a good salt, there are likely superior methods for salt generation. The use of MD5 helps obscure the method used to generate the salt, and the combination of uniqid() and mt_rand() uses two PRNGs1 along with a counter (the current time in microseconds) ought to produce pretty good salts, but a cryptographer would be able to come up with something far superior.

I generally keep my authentication details, i.e., the username and password, separate from the account details to make sure each row is of a constant size, which keeps things fast.


  1. A linear conguential generator and the mersenne twister respectively, though neither is a cryptographically secure pseudo-random numbers, which must be borne in mind. 

Created at 11:10 UTC on July 31st, 2007