What Is a Hash? MD5, SHA-256 and How Hashing Works
A hash is a fixed-length string of characters produced by running data through a mathematical function called a hash function. Feed in any amount of text — a single letter, an entire database — and you always get back the same-size output. That output, the hash, looks like random gibberish, but it is completely deterministic: the same input always produces the same hash, and even a tiny change to the input produces a completely different one.
The core idea: fingerprinting data
Think of a hash as a fingerprint for a piece of data. Just as two people can have the same name but different fingerprints, two different files almost certainly have different hashes. And just as you cannot reconstruct a person from a fingerprint, you cannot reconstruct the original data from its hash. That one-way property is what makes hashes useful for security.
Here is a concrete example using SHA-256, the most widely used hash algorithm today:
Input: hello
SHA-256: 2cf24dba5fb0a30e26e83b2ac5b9e29e1b161e5c1fa7425e73043362938b9824
Input: Hello (capital H)
SHA-256: 185f8db32921bd46d35cc2e671f4b50f5993aa56b0b4136f26cdc26f6d71f4e5
The two inputs differ by one character — an uppercase versus lowercase H — but the two hashes share nothing recognizable in common. This dramatic sensitivity to small changes is called the avalanche effect, and it is intentional. It means you cannot guess or "work backward" to the original input by studying the hash.
What makes a hash function cryptographic
Not all hash functions are cryptographic. Simple checksums like CRC32 are designed to detect accidental data corruption — they are fast and lightweight, but an attacker can deliberately craft two different inputs that produce the same CRC. A cryptographic hash function is specifically engineered to prevent this:
- Pre-image resistance: Given a hash, it should be computationally infeasible to find any input that produces it. You cannot work backward.
- Second pre-image resistance: Given an input and its hash, you should not be able to find a different input that produces the same hash.
- Collision resistance: It should be practically impossible to find any two different inputs that produce the same hash output.
When researchers find practical attacks that violate any of these properties, the algorithm is considered "broken" for security purposes — which is exactly what happened with MD5 and SHA-1.
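Collision resistance depends heavily on output length. As a rough illustration, the sketch below deliberately weakens SHA-256 by keeping only its first 3 bytes (24 bits) and then finds two different messages with the same truncated value. The truncation, the message format, and the function names are assumptions made for this demo. Nothing here attacks real SHA-256.

```python
import hashlib
from itertools import count


def truncated_sha256(data: bytes, n_bytes: int) -> bytes:
    """Return only the first n_bytes of the SHA-256 digest (a weakened toy hash)."""
    return hashlib.sha256(data).digest()[:n_bytes]


def find_collision(n_bytes: int = 3):
    """Hash distinct messages until two of them share a truncated digest."""
    seen = {}  # truncated digest -> first message that produced it
    for i in count():
        message = f"message-{i}".encode()
        tag = truncated_sha256(message, n_bytes)
        if tag in seen:
            return seen[tag], message, tag
        seen[tag] = message


first, second, tag = find_collision(3)
print(first, second, tag.hex())
```

With 24 bits of output a collision typically shows up after a few thousand attempts. Each extra bit of output makes the search harder, which is why a full 256-bit digest is far out of reach for this kind of search.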
MD5, SHA-1, SHA-256: what changed across generations
Cryptographic hash functions have gone through several generations. Here is how the major ones compare:
| Algorithm | Output size | Year introduced | Security status |
|---|---|---|---|
| MD5 | 128 bits (32 hex chars) | 1992 | Broken (collisions practical) |
| SHA-1 | 160 bits (40 hex chars) | 1995 | Broken (collisions demonstrated) |
| SHA-256 | 256 bits (64 hex chars) | 2001 | Secure (recommended) |
| SHA-384 | 384 bits (96 hex chars) | 2001 | Secure (high-assurance use) |
| SHA-512 | 512 bits (128 hex chars) | 2001 | Secure (high-assurance use) |
Practical MD5 collisions were published in 2004, and the first full SHA-1 collision (the SHAttered attack) was demonstrated in 2017 by researchers at CWI Amsterdam and Google. Neither should be used for new security-sensitive code. SHA-256 is the practical default for most applications today. SHA-384 and SHA-512 offer higher collision resistance for contexts where that extra margin matters, such as certificate authorities and long-lived digital signatures.
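The output sizes are easy to check. This short sketch asks Python's hashlib for each algorithm and records the digest length in bits and in hex characters. The sizes dictionary is just a name chosen for the example, and on some locked-down (FIPS-mode) Python builds the md5 entry may be unavailable.

```python
import hashlib

sizes = {}
for name in ("md5", "sha1", "sha256", "sha384", "sha512"):
    digest = hashlib.new(name, b"hello")
    # digest_size is in bytes; each byte is 8 bits and 2 hex characters.
    sizes[name] = (digest.digest_size * 8, len(digest.hexdigest()))

for name, (bits, hex_chars) in sizes.items():
    print(f"{name}: {bits} bits, {hex_chars} hex chars")
```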
Despite being broken for security, MD5 and SHA-1 are still legitimate for non-security uses: checksums to detect accidental corruption, legacy interoperability, and internal deduplication where an adversary is not in the picture.
How a hash is actually computed
You do not need to understand the internals to use a hash function, but a high-level picture helps. SHA-256 is built on the Merkle-Damgård construction. The algorithm:
- Pads the input to a multiple of 512 bits.
- Breaks it into 512-bit blocks.
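- Feeds each block through 64 rounds of bitwise operations (AND, XOR, NOT, rotations, shifts) and 32-bit modular additions, mixed with 64 round constants derived from the cube roots of the first 64 prime numbers. The starting state comes from the square roots of the first 8 primes.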
- The output of each block feeds into the next as the "state," so every bit of input influences every bit of output.
- The final 256-bit state is the hash.
The constants and mixing operations are designed so that no known shortcut beats brute force: finding an input that produces a given SHA-256 hash is expected to take on the order of 2^256 attempts. That is roughly 10^77, within a few orders of magnitude of common estimates for the number of atoms in the observable universe, which is why the function is considered secure.
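For readers who want to see those steps as code, here is an educational sketch of SHA-256 in pure Python, written from the published specification (FIPS 180-4). All function names are made up for this illustration. It derives the initial state from square roots and the round constants from cube roots of the first primes, and it is far slower than hashlib, so treat it as a learning aid and not something to use in production.

```python
import hashlib
import math
import struct

MASK = 0xFFFFFFFF  # keep every word within 32 bits


def first_primes(count):
    """Return the first `count` primes by trial division."""
    primes, candidate = [], 2
    while len(primes) < count:
        if all(candidate % p for p in primes):
            primes.append(candidate)
        candidate += 1
    return primes


def integer_cube_root(n):
    """Largest integer r with r**3 <= n, found by binary search."""
    low, high = 0, 1 << ((n.bit_length() + 2) // 3 + 1)
    while low < high:
        mid = (low + high + 1) // 2
        if mid ** 3 <= n:
            low = mid
        else:
            high = mid - 1
    return low


PRIMES = first_primes(64)
# Initial state: first 32 fractional bits of the square roots of the first 8 primes.
INITIAL_STATE = [math.isqrt(p << 64) & MASK for p in PRIMES[:8]]
# Round constants: first 32 fractional bits of the cube roots of the first 64 primes.
ROUND_CONSTANTS = [integer_cube_root(p << 96) & MASK for p in PRIMES]


def rotr(x, n):
    """Rotate a 32-bit word right by n bits."""
    return ((x >> n) | (x << (32 - n))) & MASK


def pad(message):
    """Append 0x80, zero bytes, and the 64-bit bit length (multiple of 64 bytes)."""
    bit_length = len(message) * 8
    message += b"\x80"
    message += b"\x00" * ((56 - len(message) % 64) % 64)
    return message + bit_length.to_bytes(8, "big")


def compress(state, block):
    """Mix one 64-byte block into the 8-word state using 64 rounds."""
    w = list(struct.unpack(">16I", block))
    for t in range(16, 64):  # expand 16 message words to 64
        s0 = rotr(w[t - 15], 7) ^ rotr(w[t - 15], 18) ^ (w[t - 15] >> 3)
        s1 = rotr(w[t - 2], 17) ^ rotr(w[t - 2], 19) ^ (w[t - 2] >> 10)
        w.append((w[t - 16] + s0 + w[t - 7] + s1) & MASK)
    a, b, c, d, e, f, g, h = state
    for t in range(64):
        big_s1 = rotr(e, 6) ^ rotr(e, 11) ^ rotr(e, 25)
        choose = (e & f) ^ (~e & g)
        t1 = (h + big_s1 + choose + ROUND_CONSTANTS[t] + w[t]) & MASK
        big_s0 = rotr(a, 2) ^ rotr(a, 13) ^ rotr(a, 22)
        majority = (a & b) ^ (a & c) ^ (b & c)
        t2 = (big_s0 + majority) & MASK
        h, g, f, e, d, c, b, a = g, f, e, (d + t1) & MASK, c, b, a, (t1 + t2) & MASK
    # The block's result is added to the previous state and carried forward.
    return [(s + v) & MASK for s, v in zip(state, (a, b, c, d, e, f, g, h))]


def sha256_sketch(message):
    """Hash bytes and return the digest as 64 hex characters."""
    state = list(INITIAL_STATE)
    padded = pad(message)
    for offset in range(0, len(padded), 64):
        state = compress(state, padded[offset:offset + 64])
    return "".join(f"{word:08x}" for word in state)


print(sha256_sketch(b"hello") == hashlib.sha256(b"hello").hexdigest())
```

Comparing the result against hashlib, as the last line does, is a quick way to confirm that the sketch follows the same steps as a real implementation.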
Where hashing is used in real software
Hashes show up in more places than most developers realize:
- File integrity verification. When you download software, the vendor often publishes a SHA-256 checksum alongside it. After downloading, you hash the file yourself and compare. If the hashes match, the file arrived intact and unmodified. If they differ, something went wrong — either corruption in transit or, in a worst case, tampering.
- Password storage. Websites do not (or should not) store your actual password. They store a hash of it. When you log in, the server hashes what you typed and compares it to the stored hash. This means a database breach does not immediately expose everyone's passwords — an attacker still has to crack each hash.
- Git and version control. Every commit in Git is identified by a SHA-1 hash of its contents. The hash of a commit depends on the hash of its parent commit, which means the history is tamper-evident: you cannot quietly alter an old commit without changing all the hashes that follow it.
- Digital signatures. Rather than signing an entire document, signing software hashes the document and signs only the hash. This is faster, and it stays just as secure as long as the hash function is collision resistant, because nobody can produce a second document with the same hash.
- Hash tables in programming. The dict in Python, HashMap in Java, and Object in JavaScript all rely on hash functions internally to map keys to memory locations. These are non-cryptographic hashes optimized for speed, not security.
- Content-addressable storage. Systems like IPFS and package managers (npm, pip) identify packages by their hash. If you know the hash, you know exactly what data you will get, no matter which server you download it from.
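The file integrity check from the first item in that list can be sketched in a few lines of Python. The function names, the chunk size, and the temporary demo file are assumptions made for this example. In real use the published checksum would come from the vendor's download page.

```python
import hashlib
import hmac
import os
import tempfile


def sha256_of_file(path, chunk_size=65536):
    """Hash a file in chunks so large downloads never have to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published(path, published_hex):
    """Compare the file's digest with a published checksum (case-insensitive)."""
    return hmac.compare_digest(sha256_of_file(path), published_hex.strip().lower())


# Demo: a throwaway file containing "hello" stands in for a download.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"hello")
published = "2cf24dba5fb0a30e26e83b2ac5b9e29e1b161e5c1fa7425e73043362938b9824"
print(matches_published(tmp.name, published))
os.remove(tmp.name)
```

Reading in chunks and calling update repeatedly gives the same digest as hashing the whole file at once, which is what makes this approach practical for multi-gigabyte downloads.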
Why you cannot reverse a hash
This surprises many people: hashing is a fundamentally lossy operation. A SHA-256 hash is always exactly 256 bits, regardless of whether the input was one byte or one gigabyte. Information is discarded. Because there are far more possible inputs than 256-bit outputs, many different inputs must map to the same hash (a collision). For a well-designed algorithm, though, nobody can actually find two such inputs in practice.
What people sometimes call "decrypting" a hash is actually a dictionary attack or rainbow table lookup: precomputing the hashes of millions of common passwords or phrases, then checking whether a stolen hash appears in the table. This is why password hashing in production uses additional techniques — salting (appending a random value to the input before hashing) and slow hash functions like bcrypt or Argon2 — to make precomputed tables useless and brute-force attacks expensive.
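The difference is easier to see side by side. The sketch below first runs a tiny dictionary attack against an unsalted SHA-256 hash, then stores a password with a random salt and scrypt from Python's hashlib. The word list, the function names, and the cost parameters (n=2**14, r=8, p=1) are illustrative assumptions and not tuned recommendations. hashlib.scrypt also requires a Python build linked against a recent OpenSSL.

```python
import hashlib
import hmac
import os

# A stand-in for the huge word lists real attackers use.
COMMON_PASSWORDS = ["123456", "password", "hello", "qwerty"]


def dictionary_attack(stolen_hex):
    """Look up an unsalted SHA-256 hash in a precomputed table of guesses."""
    table = {hashlib.sha256(p.encode()).hexdigest(): p for p in COMMON_PASSWORDS}
    return table.get(stolen_hex)


def hash_password(password, salt=None):
    """Derive a key with scrypt; a fresh random salt is generated if none is given."""
    salt = os.urandom(16) if salt is None else salt
    key = hashlib.scrypt(password.encode(), salt=salt, n=2**14, r=8, p=1, dklen=32)
    return salt, key


def verify_password(password, salt, expected_key):
    """Re-derive the key with the stored salt and compare in constant time."""
    _, key = hash_password(password, salt)
    return hmac.compare_digest(key, expected_key)


stolen = hashlib.sha256(b"hello").hexdigest()
print(dictionary_attack(stolen))  # the unsalted hash is found in the table

salt, key = hash_password("hello")
print(verify_password("hello", salt, key))
```

Because every user gets a different salt, the same password produces a different stored value each time, so one precomputed table no longer works against the whole database.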
A quick command-line example
On Linux or macOS you can hash a string directly in the terminal:
echo -n "hello" | sha256sum
Output:
2cf24dba5fb0a30e26e83b2ac5b9e29e1b161e5c1fa7425e73043362938b9824
The -n flag tells echo not to append a newline — without it you would be hashing hello\n instead of hello, which gives a different result. This trips up many developers when comparing hashes between tools.
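The same pitfall is easy to reproduce in Python. This small sketch (variable names are arbitrary) hashes the five bytes of hello with and without a trailing newline and shows that the digests differ.

```python
import hashlib

without_newline = hashlib.sha256(b"hello").hexdigest()
with_newline = hashlib.sha256(b"hello\n").hexdigest()

print(without_newline)
print(with_newline)
print(without_newline == with_newline)
```

If two tools disagree on a hash, checking for a stray newline or a different text encoding is usually the first thing to try.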
Choosing the right algorithm
A practical decision guide for new work:
- General-purpose integrity checks and digital signatures: SHA-256.
- TLS certificates, certificate transparency logs: SHA-256 or SHA-384.
- Password hashing: do not use any of the above — use bcrypt, scrypt, or Argon2, which are purpose-built to be slow.
- Legacy systems requiring MD5 or SHA-1: acceptable only if there is no security requirement and no attacker model — for example, a checksum on a file transferred over a trusted internal network.
- High-volume, non-security hashing (hash tables, deduplication): non-cryptographic functions like xxHash or MurmurHash are faster and fine for this purpose.
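To make the last item concrete, here is a toy sketch of how a hash table might use a fast non-cryptographic hash to pick a bucket. It uses zlib.crc32 from the standard library as a stand-in, since xxHash and MurmurHash need third-party packages. The function names and the bucket count are made up for the example.

```python
import zlib


def bucket_index(key: str, n_buckets: int) -> int:
    """Map a key to a bucket number using a fast, non-cryptographic hash."""
    return zlib.crc32(key.encode("utf-8")) % n_buckets


def build_buckets(keys, n_buckets=8):
    """Distribute keys across buckets the way a simple hash table would."""
    buckets = [[] for _ in range(n_buckets)]
    for key in keys:
        buckets[bucket_index(key, n_buckets)].append(key)
    return buckets


buckets = build_buckets(["apple", "banana", "cherry", "date", "elderberry"])
for number, contents in enumerate(buckets):
    print(number, contents)
```

Speed and an even spread across buckets are what matter here. Nothing in this sketch resists a deliberate attacker, which is fine for its purpose and the reason it should never be used for security.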
Generate a hash right now — free and private
Figro's Hash Generator runs entirely in your browser. Type or paste any text and get MD5, SHA-1, SHA-256, SHA-384, and SHA-512 hashes instantly. Nothing is ever sent to a server.