SHA1 and MD4 Length Extension Attacks Explained
Continuing my series on the cryptopals challenges…
In section four, two of the challenges require you to get past a checksum test by spoofing a hash associated with a forged message. The idea is that if you can manage to pass a query string to an application (say a web application) that has been toyed with and provide a valid message hash with that manipulated query string, you can get other systems to do what you want. As an example, flickr was exposed as vulnerable to this attack in 2009.
Because of the algorithms that undergird them, SHA1, MD4, and (as I understand it), MD5 are susceptible to this type of attack. Even if the message is prepended by a secret key of unknown length before being hashed, it is possible to leverage common implementations of these algorithms to generate new hashes that will validate against manipulated strings. For this reason (and others), simple message authentication codes (MACs) produced by SHA1, MD4 and MD5 shouldn’t be trusted.
But how? To understand, you need to understand what happens in the underlying algorithm. In describing this, I’ll note that there are some meaningful differences between SHA1 and MD4, such as endianness, digest length, and the underlying processing method, but in the ways that matter for this attack, they’re the same. I’ll call the differences out if they’re important, but otherwise, just assume I’m speaking in terms of SHA1.
When a message is hashed, that message is converted mathematically into a code. With SHA1 and MD4 (the two algorithms in the challenge), that code is a hexadecimal character string 40 characters long, which is derived from a byte array 20 bytes long (or 16 bytes for MD4). In other words, the algorithm takes the message, uses it to manipulate a byte array, and at the end of all the math, that resulting byte array is the code. Importantly, this implies the algorithm is stateful.
If you look at the bouncycastle implementation of SHA1 (the one I used), you’ll notice five integer registers: H1, H2, H3, H4, H5
. Each integer occupies 32 bits or 4 bytes, and since there are five registers, that makes for 20 bytes stored in the digest’s state. This state variable is where the ultimate hash comes from. As the digest mathematically processes the message, the end result of each processing step is to manipulate these registers. But the registers are only manipulated once for every 512-bit/64-byte block, which begs the question: What if your message doesn’t have a length of a multiple of 64 bytes?
The algorithm will pad the final block and fill it out so that there are 64 bytes in the block to process. It does this in two meaningful ways:
- The first byte after the end of the message is always
10000000
. I’ll call this the “bitflag.” - The last eight bytes contain a count of the bits, or bitcount, (not bytes) in the message (and if there isn’t a full eight bytes available to hold it, then a whole new, empty, 64-byte block is appended). For example:
- if the message length is 20 bits, then the last byte will be
00010100
- if the message contains 1000 bits, then the last two bytes will be
00000011 11101000
- if the message contains 100,000 bits, then the last three bytes will be
00000001 10000110 10100000
- you get it
- if the message length is 20 bits, then the last byte will be
After the bitflag and the bitcount are placed in the final 64-bit block of the message, the final block is processed just like any other block. The register state then becomes the digest that is output as the final result of the hash, at which point, the registers are reset.
I’ll say that again: the final output of the algorithm is the state of the registers after the final block is processed.
Ergo, if you can manipulate the registers, you can pretend that a bitflag and a bitcount were part of your message to begin with, and then feed the algorithm more message bytes (which, of course, you control) which will ultimately result in a new MAC that can authenticate against a manipulated message.
To do this, step one is to manually un-reset the registers back to their final state before the message hash was output. To be clear, you’re not just setting the registers back to what they were after the final byte of the message is processed. What you’re doing is resetting the registers to their final state after the bitflag and bitcount were put in their places. (Cryptopals calls the message contents from the bitflag to the bitcount “glue padding,” so I will too.)
Once the registers are reset back to their final state, you feed the digest more message and get a new hash. What’s really going to happen is the digest will take your new message, append new glue padding to round out a block, and will process that new block to produce your new hash.
So now you have a new hash for a manipulated message. But, if you can’t replicate the message the hash is tied to, then the hash isn’t worth much.
To do this, you have to recreate what has now become the original message: prefix + message + glue padding + appendix. You know the message and the appendix (and their lengths), obviously. Whether or not you know the prefix doesn’t matter as long as you know, or can at least guess, its length. If you know those lengths, you can rebuild the glue padding and fill out a block with that padding before tacking on the appendix.
Here’s how I did it. The end result is a byte array that starts with a single 1 and has the bitcount packed into the final eight bytes. The length of the result is enough to round out a 64-byte block (or to finish off one block and append a new 64-byte block if there isn’t enough space in the block to pack the bitcount into the final eight bytes).
/**
* given a message, build the MD padding as close to the same
* way as possible
*
* the algorithm, best as i can tell, leads with a single bit and then over-writes
* the last bytes of the 512-bit block with the number of BITS in the message
*
* @param message the message
* @return the md padding
*/
public byte[] buildGluePadding(final byte[] message) {
//build a 512-bit block
byte[] block = new byte[BLOCK_SIZE];
//get the last 64 characters of the message
//or if it IS 64, then an empty byte array
final byte[] subMsg;
if (message.length == BLOCK_SIZE) {
subMsg = new byte[0];
} else if (message.length > BLOCK_SIZE) {
final int startPos = (message.length / 64) * 64;
subMsg = new byte[message.length - startPos];
System.arraycopy(message, startPos, subMsg, 0, message.length - startPos);
} else {
subMsg = message;
}
//copy the message bytes into the block
System.arraycopy(subMsg, 0, block, 0, subMsg.length);
int index = subMsg.length;
//make the next byte the flag byte
block[index++] = Byte.MIN_VALUE;
//have to allow for the bit count to take up to 8 bytes
if (index >= BLOCK_SIZE - BIT_COUNT_SPACE) {
block = ByteArrayUtil.concatenate(block, new byte[BLOCK_SIZE]);
}
//get the number of bits in the message
final long messageBitCount = (long) message.length << 3;
packTheBitCount(block, messageBitCount);
final int padLength = block.length - subMsg.length;
final byte[] returnValue = new byte[padLength];
System.arraycopy(block, subMsg.length, returnValue, 0, padLength);
return returnValue;
}
Note: This algorithm works for either SHA1 or MD4, but the method by which the bitcount is packed into the final block, packTheBitCount()
, is different between the two. Here’s the implementation for SHA1, and here’s MD4. The biggest difference (which was frustrating), is that MD4 is little-endian, meaning least significant bit first. SHA1 is big-endian, meaning least significant bit last.
Once you’ve got the glue padding, you can try to authenticate. The prefix is going to be prepended before being hashed, so the message to be submitted is message + glue padding + appendix
. Submit that with the hash. As your manipulated message is processed, block by block, it will eventually arrive at the block that contains your manipulated glue padding. If you do it right, the register state after that block is processed should be identical to the state you forced when you obtained your new hash.
From there, the algorithm behaves deterministically. It fills out the final block that contains ;admin=true
with new glue padding, then it processes the final 64-byte block, producing a hash. If the register state going into the processing of that final block was the same as it was when you overrode the register state, then the final result will be the same, and your message will authenticate.