What Your Auth Library Isn’t Telling You About Passwords: Hashing and Salting Explained

Before I started adding authentication to my own projects, I didn’t think too deeply about what was going on with passwords behind the scenes.

Like most developers, I installed a library, called a hash function, stored the result, and moved on. I would see a random string like $2a$11$yMMbLgN9uY6J3LhorfU9iu.... in my database and assume my users’ passwords were uncrackable. I knew it was a hashed password. But what was $2a? What was 11? And if I couldn’t reverse it, how was my app authenticating the login?

If you’ve ever used bcrypt, Devise, Django’s auth system, or indeed any authentication library, you’ve been shielded from these details. This is good engineering. But understanding what’s actually going on makes you a better developer, and it explains a lot of things that seem vague or arbitrary until, suddenly, they don’t.

By the end of this article, you will be able to look at this string and know what each part means.

Prerequisites

This article is written for developers who have used an auth library before but never looked closely at what it does. You don’t need a cryptography background. If you’ve ever hashed a password and moved on, this is for you.

Table of Contents

Hashing vs Encryption
Why a simple hash is not enough
Enter salting
Why bcrypt is slow (and why this is the point)
What is actually in your database?
Wrap up

Hashing vs Encryption

Many developers use the terms hashing and encryption interchangeably. They are not the same, and the difference matters more than you might think.

Encryption is a two-way process. You take data, encrypt it with a key, and you can later decrypt it using the same key (or a related one). This is useful when you need to retrieve the original value: for example, storing a credit card number you’ll need to charge later, or sending a message that the recipient must read.

Hashing is different. It is a one-way process. You put in data, you get a fixed-length string, and there’s no key that lets you reverse it. The original value is gone.

This may seem like a limitation. For passwords, this is actually what you want.

Think about it: when a user logs in, you don’t need to know their password. You just need to verify that what they typed matches what they set when they signed up. You can do that entirely with a hash. Hash what they typed, compare it to the stored hash, done. You never need the original.

This is why “forgot password” flows always ask you to set a new password instead of sending you your old one. Yes, sending an old password over email would be dangerous, but the main reason is that the service can’t actually recover it. If a service can email you your original password, that’s a red flag. It means they stored it in a way that’s reversible, which means it’s not properly protected.

Why a simple hash is not enough

So if hashing is one-way and irreversible, isn’t that enough? Hash each password before storing it and you’re done?

Absolutely not.

The first problem is rainbow tables. A rainbow table is a precomputed database of hashes for common passwords. An attacker who gets hold of your database does not need to reverse the hashes. They only need to look them up. If your user’s password is “password123”, its SHA-256 hash is always the same string, and that string almost certainly already exists in a rainbow table somewhere.

The second problem is related. If two users have the same password, they will have the same hash. So if an attacker cracks one, they have cracked every account that shares it. In a database with thousands of users, this is a significant security risk.

Here’s what it looks like in practice:

import hashlib

# Two users, same password
password = "password123"

hash_one = hashlib.sha256(password.encode()).hexdigest()
hash_two = hashlib.sha256(password.encode()).hexdigest()

print(hash_one == hash_two)  # True, every single time

The hash is deterministic. The same input always produces the same output. This is useful for many things, but for passwords it poses a real risk.
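To make the lookup attack concrete, here is a minimal sketch of it. It uses a plain precomputed dictionary rather than a true rainbow table (real rainbow tables use hash chains to save space, but the effect on unsalted hashes is the same). The wordlist and the `crack` helper are hypothetical and exist only for illustration.

```python
import hashlib
from typing import Optional


def sha256_hex(password: str) -> str:
    """Unsalted SHA-256, as in the example above."""
    return hashlib.sha256(password.encode()).hexdigest()


# Hypothetical wordlist; real attackers use lists with millions of entries.
COMMON_PASSWORDS = ["123456", "password", "password123", "qwerty", "letmein"]

# Computed once, then reusable against every leaked database of unsalted hashes.
LOOKUP_TABLE = {sha256_hex(p): p for p in COMMON_PASSWORDS}


def crack(leaked_hash: str) -> Optional[str]:
    """Return the password if its hash is in the table, else None."""
    return LOOKUP_TABLE.get(leaked_hash)


print(crack(sha256_hex("password123")))            # password123
print(crack(sha256_hex("not-in-the-wordlist-42")))  # None
```

No hash was reversed here. The attacker simply recognized one they had already computed.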

A simple hash gets you part of the way there. But that alone is not enough.

Enter salting

The solution to both problems has a name: salt. And no, it’s not your regular table salt.

A salt is a random string that is uniquely generated for each password. Before hashing, you concatenate the salt with the password, then hash the result.

import hashlib
import os

password = "password123"

# Generate a random salt
salt = os.urandom(16).hex()

# Combine salt and password, then hash
salted_password = salt + password
hashed = hashlib.sha256(salted_password.encode()).hexdigest()

print(f"Salt: {salt}")
print(f"Hash: {hashed}")

Now two users with the same password generate completely different hashes, because their salts are different. And because each salt is random and unique, the hashes cannot be precomputed in a rainbow table.

Here’s the surprising part: the salt doesn’t have to be secret. It gets stored in your database in plain text alongside the hash. That may feel wrong at first. If an attacker has your database, they also have the salt.

But that’s okay. The salt’s job is not to remain secret. Its job is to make each hash unique so that precomputed tables are useless. An attacker who wants to crack a salted hash has to brute force each password individually, from scratch, using that particular salt. They cannot reuse the work across users.

This is a meaningful increase in attack cost, even when the salt is visible.
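The storage and login flow described here can be sketched as follows. It reuses the salt-plus-SHA-256 scheme from the example above purely for illustration; the record layout and the helper names (`create_record`, `verify`) are assumptions, not any particular library’s API.

```python
import hashlib
import hmac
import os


def hash_with_salt(password: str, salt: str) -> str:
    """Same scheme as the example above: salt + password, then SHA-256."""
    return hashlib.sha256((salt + password).encode()).hexdigest()


def create_record(password: str) -> dict:
    """What gets stored at signup: the salt in plain text next to the hash."""
    salt = os.urandom(16).hex()
    return {"salt": salt, "hash": hash_with_salt(password, salt)}


def verify(password: str, record: dict) -> bool:
    """Re-hash the login attempt with the stored salt and compare."""
    candidate = hash_with_salt(password, record["salt"])
    # compare_digest avoids leaking information through comparison timing
    return hmac.compare_digest(candidate, record["hash"])


alice = create_record("password123")
bob = create_record("password123")

print(alice["hash"] == bob["hash"])  # False: same password, different salts
print(verify("password123", alice))  # True
print(verify("password124", alice))  # False
```

Note that this still relies on SHA-256, which is fast to compute. The next section covers why that remains a problem.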

Why bcrypt is slow (and why this is the point)

Salting solves the rainbow table problem. But there is still a gap. If an attacker has your database and decides to brute force the passwords, all they have to do is guess: hash a candidate password with the stored salt, compare it to the stored hash, repeat. With fast hashing algorithms like SHA-256, a modern GPU can perform billions of these comparisons per second.

This is the problem with using a general-purpose hash function for passwords. Algorithms like SHA-256 and MD5 were designed to be fast. This is great for things like verifying file integrity or generating checksums. For passwords, this is a liability.
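You can get a rough feel for this speed on your own machine. The sketch below times SHA-256 in a plain Python loop on a single CPU core, so the number it prints will be far below what a GPU running optimized cracking software achieves. The `sha256_rate` helper is hypothetical, and the result varies by hardware.

```python
import hashlib
import time


def sha256_rate(n: int = 200_000) -> float:
    """Return SHA-256 hashes per second for n short inputs on this machine."""
    start = time.perf_counter()
    for i in range(n):
        hashlib.sha256(f"salt{i}".encode()).hexdigest()
    elapsed = time.perf_counter() - start
    return n / elapsed


rate = sha256_rate()
print(f"{rate:,.0f} SHA-256 hashes per second on one CPU core")
```

Even this unoptimized, single-threaded measurement typically reports a very large number of guesses per second, which is the point: nothing in the algorithm itself slows an attacker down.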

This is where bcrypt comes in. bcrypt is a password hashing algorithm specifically designed to be slow. Not broken or accidentally inefficient, but intentionally slow, by design. It has a cost factor (sometimes called the work factor) that controls how computationally expensive the hashing operation is.

import bcrypt

password = b"password123"

# The cost factor is set here (12 is a common production value)
hashed = bcrypt.hashpw(password, bcrypt.gensalt(rounds=12))

print(hashed)

Each time you increase the cost factor by 1, the hashing operation takes about twice as long. At a cost factor of 12, a single hash can take about 300 milliseconds on your server, depending on its hardware. That delay is barely noticeable to a user logging in. But for an attacker trying to brute force millions of passwords, it turns a viable attack into an infeasible one.

Another advantage of a configurable cost factor is that you can increase it over time as hardware gets faster. What was slow enough in 2015 may not be slow enough today. bcrypt lets you adapt without changing the algorithm.
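The doubling is easy to work through with arithmetic. The sketch below takes the rough figure above (about 300 ms at cost 12) as its baseline and assumes, unrealistically, that the attacker runs on the same hardware with no parallelism. The numbers are illustrative estimates, not benchmarks.

```python
BASELINE_COST = 12
BASELINE_SECONDS = 0.3  # the rough figure quoted above; varies with hardware


def seconds_per_hash(cost: int) -> float:
    """Each +1 in cost doubles the time; each -1 halves it."""
    return BASELINE_SECONDS * 2 ** (cost - BASELINE_COST)


def seconds_to_try(guesses: int, cost: int) -> float:
    """Total time to hash `guesses` candidates against one salted hash."""
    return guesses * seconds_per_hash(cost)


for cost in (10, 11, 12, 13, 14):
    print(f"cost {cost}: {seconds_per_hash(cost) * 1000:.0f} ms per hash")

guesses = 1_000_000
days = seconds_to_try(guesses, 12) / 86_400
print(f"{guesses:,} guesses at cost 12: about {days:.1f} days")
```

Under these assumptions, a million guesses against a single user’s hash takes days rather than a fraction of a second, and because every user has a different salt, that work has to be repeated per user.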

What is actually in your database?

So far, we have discussed salts and cost factors as separate concepts. Here’s the satisfying part: in bcrypt, they’re all bundled into one string. This string in your database contains everything you need to verify a password, and it’s not mysterious at all once you know how to read it.

Here is a typical bcrypt hash:

$2a$12$yMMbLgN9uY6J3LhorfU9iuLAUwKxyy8w42ubeL4MWy7Fh8B.CH/yO

Let’s break it down:

$2a – The algorithm version. This tells your auth library which version of bcrypt was used to generate the hash.
$12 – The cost factor. This is the number we talked about in the previous section. A cost factor of 12 means bcrypt’s internal key-setup loop runs 2¹² (4,096) times.
$yMMbLgN9uY6J3LhorfU9iu – The salt. The first 22 characters after the final $ are the salt, stored in plain text alongside the hash. Your auth library reads it back when authenticating a login.
LAUwKxyy8w42ubeL4MWy7Fh8B.CH/yO – The hash. The remaining 31 characters are the actual output of the hashing operation.
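The breakdown above can be sketched as a small parser. This is an illustration of the string layout only, not how any particular library implements it; the `BcryptParts` class and `parse_bcrypt` function are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class BcryptParts:
    version: str
    cost: int
    salt: str
    digest: str


def parse_bcrypt(stored: str) -> BcryptParts:
    """Split a string of the form $<version>$<cost>$<22-char salt><31-char hash>."""
    _, version, cost, rest = stored.split("$")
    if len(rest) != 53:
        raise ValueError("expected 22 salt characters followed by 31 hash characters")
    return BcryptParts(version, int(cost), rest[:22], rest[22:])


parts = parse_bcrypt("$2a$12$yMMbLgN9uY6J3LhorfU9iuLAUwKxyy8w42ubeL4MWy7Fh8B.CH/yO")
print(parts.version)    # 2a
print(parts.cost)       # 12
print(2 ** parts.cost)  # 4096
print(parts.salt)       # yMMbLgN9uY6J3LhorfU9iu
print(parts.digest)     # LAUwKxyy8w42ubeL4MWy7Fh8B.CH/yO
```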

When a user logs in, your authentication library does not require any additional information. It reads the version, cost factor, and salt directly from the stored string, hashes the login attempt using the same parameters, and compares the result. If they match, the password is correct.

This is why bcrypt authentication works even though the salt is never stored separately. It was never separate to begin with.
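The same self-describing format is what makes it practical to raise the cost factor over time. Because each stored string records the cost it was created with, an application can spot hashes made under an older, cheaper setting. A common pattern is to re-hash at the next successful login, since that is the only moment the plain-text password is available. The sketch below covers only the detection step; `TARGET_COST`, `stored_cost`, and `needs_rehash` are hypothetical names, and the cost-10 value is a placeholder rather than real bcrypt output.

```python
TARGET_COST = 12  # hypothetical policy value, not a recommendation


def stored_cost(stored: str) -> int:
    """Read the cost factor from a bcrypt string like $2a$12$..."""
    return int(stored.split("$")[2])


def needs_rehash(stored: str, target_cost: int = TARGET_COST) -> bool:
    """True if the hash was created with a lower cost than the current target."""
    return stored_cost(stored) < target_cost


current = "$2a$12$yMMbLgN9uY6J3LhorfU9iuLAUwKxyy8w42ubeL4MWy7Fh8B.CH/yO"
# Placeholder standing in for an older cost-10 hash; not a real bcrypt output
older = "$2a$10$" + "." * 53

print(needs_rehash(current))  # False
print(needs_rehash(older))    # True
```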

Wrap up

The next time you see a bcrypt string in your database, you’ll know exactly what you’re looking at. The algorithm version, cost factor, salt, and hash are all encoded into a single string that your auth library knows how to read.

But the biggest benefit is this: the libraries we rely on every day aren’t magic. They are carefully designed systems built on concepts that are understandable.

Knowing why bcrypt is slow, why salting works even when the salt is visible, and why fast hash functions like SHA-256 are the wrong tool for passwords makes you a more informed developer. You’ll make better decisions about cost factors, you’ll recognize a poorly implemented password storage scheme when you see one, and you’ll understand why data breaches where passwords were hashed with MD5 are much worse than breaches where bcrypt was used.
