Abusing an ECB cipher to extract unknown encrypted text

by Landon | Jul 11, 2024 | Leave a comment

This is a write up on challenges 12 and 14 of the cryptopals cryptography challenges.

I tried to solve this once a couple years ago, but I don’t think I understood the problem very well on the first go ’round. But, I get it now.

The premise of the challenges

In both challenge 12 and 14, they ask you set up an “encryption oracle.” I don’t know about you, but when I see the word “oracle,” I start thinking of mystical wizardry, fortune-telling, and sorcery … the kind you might find in the early 1990’s video game classic, Kings Quest VI.

But really, all they’re asking you to do is set up an encrypting function where:

  • a string (aka the MYSTERY-SUFFIX) is appended to whatever input is submitted.
  • the new combined inputs are encrypted using an ECB cipher.
  • the key used to encrypt them never changes, but it is unknown.
  • the MYSTERY-SUFFIX never changes.

Or, as they put it (with my own slight modification):

AES-128-ECB(your-string || MYSTERY-SUFFIX, random-key)

Challenge 14 is the same thing, but with one additional wrinkle:

  • An unknown-but-constant string is also prepended to any input.

Or, as they put it (with my own slight modification):

AES-128-ECB(random-prefix || attacker-controlled || MYSTERY-SUFFIX, random-key)

Given this function, the challenge here is to figure out what MYSTERY-SUFFIX is without knowing the key.

How to solve this problem

Let’s take a quick inventory of what we know:

  • The key doesn’t change.
  • The MYSTERY-SUFFIX doesn’t change.
  • There is no limit as to how many times we submit input to the encryption oracle. (That’s important. Not having any rate limiting makes breaking this much easier.)

There are also a few things that, although we may not know them, we can prove them with so much ease that it’s probably ok to take them for granted:

  • This is ECB encryption.
  • The block size is 16 bytes.
  • There are 256 possible values for each individual byte in MYSTERY-SUFFIX.
    • If you don’t understand why ^^^ is true, remember that a byte is a series of eight binary digits. The minimum value eight binary digits can hold is 0, and the maximum is 255.

Remember: ECB ciphers are weak because they are deterministic; if you don’t change the key, the same input will produce the same output. Patterns are also really easy to detect in ECB, which makes it easier to make educated guesses about what’s underneath. (Read this for more on ECB’s weaknesses)

Given the preceding, these are the steps for extracting the first byte of MYSTERY-SUFFIX from the oracle

  • Craft a SPECIAL INPUT one byte shorter than a full block.
  • Because you know that MYSTERY-SUFFIX will be appended to your SPECIAL INPUT, and because your SPECIAL INPUT is one byte short of a full block, and because you know MYSTERY-SUFFIX is appended to SPECIAL INPUT before encryption, you know that the last byte in the first block of the message is going to be the first byte of MYSTERY-SUFFIX when the oracle encrypts it.
  • Send SPECIAL INPUT off to the oracle and save the result. (You can save it in a dictionary where the 15-byte SPECIAL INPUT is the key if you want).
  • Now, take SPECIAL INPUT and add one more byte to it (which rounds out the block). Send it to the oracle and see if the first block of the encrypted result matches the result you got in the previous step.
    • If it does, you know what the first byte of MYSTERY-SUFFIX is.
    • If it doesn’t, try again with a different byte.
    • Repeat until you find a match.
      • Remember, there are only 256 possible bytes this first byte of MYSTERY-SUFFIX could be.
      • Therefore, the maximum number of times you’ll look for a match for an individual byte is 256 times.

Visualizing this process can also help in understanding.

Let’s say SPECIAL INPUT is the string 123456789012345. It doesn’t really matter exactly what it is as long as we know what it is. That means that the message will look something like this before it undergoes encryption. (The question marks represent MYSTERY-SUFFIX, and the combined message has been broken into blocks to make it easier to visualize.)

123456789012345? ??????????????? ??????????????? ???????????????

Notice that the last character in that first block is the first block in MYSTERY-SUFFIX.

So, then this message is encrypted. For the sake of argument, let’s say that just the first block of the encrypted result (we’ll focus on other blocks blocks later) comes out to look like this: abcdefghijklmnop

We know that our SPECIAL INPUT plus one byte of MYSTERY-SUFFIX made this first block come out like this. We don’t know what that byte is, but we know it’s something.

Because the encryption key is constant, and because the key operates on each individual block in isolation of the other blocks, we can iterate through all possible values of that final byte until we find a response from the oracle where the first block reads as abcdefghijklmnop.

Let’s say we discover that the character that does this is the letter R. Great. We have found the first character in the MYSTERY-SUFFIX.

What about the others? To find the others, first we shorten our SPECIAL INPUT input by one byte (now it’s 14 bytes instead of 15) and send it off to the oracle so that before encryption, the message looks like this.

12345678901234R? ??????????????? ??????????????? ??????????????

Notice: R is the first character of MYSTERY-SUFFIX. Rather than continue to represent it with a ?, I’m choosing to include the known character in its place.

Let’s say that the oracle spits this back as the first block for that input: zyxwvutsrqponmlk

Ok. We know the first byte is R, so how can we change our input such that we can discover the 2nd byte in the MYSTERY-SUFFIX? Well, if we take our 14-character SPECIAL INPUT and append R plus something else to it, then the problem is the same … we can rotate that final byte of the first block until the oracle spits out zyxwvutsrqponmlk in that first block. The fact that the actual MYSTERY-SUFFIX actually starts in block two is irrelevant because we’re only looking at block one.

Let’s say that when we append Ro to the 14-character SPECIAL INPUT, the first block of the encryption result reads zyxwvutsrqponmlk. Awesome. We now know the second character in MYSTERY-SUFFIX is o.

To find the rest of the first block of the mystery string, we just repeat this process, making SPECIAL INPUT shorter and shorter and shorter. Every time we do so, one more byte of MYSTERY-SUFFIX appears in the final position of the first block. This allows us to append what is known onto the end of SPECIAL INPUT and rotate the final byte through all 256 possibilities until we find a match on the first block of the encryption.

Once we’ve discovered what the entire first block of MYSTERY-SUFFIX is, we do the whole thing again, but instead of focusing on the first block, you look at the second block, appending what’s known to your SPECIAL INPUT and rotating that last byte through all 256 possibilities until you find just the right character.

For example, let’s say we figure out that Rollin'_down_the is the first block of MYSTERY-SUFFIX. If we go back to our 15-byte wide SPECIAL INPUT without appending anything to it, this is what we now know the message looks like.

123456789012345R ollin'_down_the? ???????????????? ???????????????

If we craft our SPECIAL INPUT now to include what we know to be the first block of the MYSTERY-SUFFIX, but focus on block two instead of block one, then the problem is essentially the same. We just rotate the last byte of block two through all 256 possibilities until we find a match. Let’s say that _ produces a match.

Then we go back to our 14-byte wide SPECIAL INPUT plus the known first block (Rollin'_down_the) plus what is known from block two of the MYSTERY-SUFFIX (_ so far) such that the new SPECIAL INPUT looks like this….

1234567890123Ro llin'_down_the_? ???????????????? ???????????????

… and we repeat, rotating the last byte of the second block through all 256 possible bytes until we find a match on the second block. As we shorten SPECIAL INPUT byte by byte, one more byte of the second block of MYSTERY-SUFFIX will show up in block two.

Once we uncover block two, we start over, repeating this process for block three, and block four, etc. until we extract the entire MYSTERY-SUFFIX out. We know we are done when we can no longer find any match.

Making it harder

In challenge 14, they add that wrinkle of prepending something constant but unknown to the hacker input before the encryption is performed.

Although it’s mind-bending at first glance, it really doesn’t matter what this prefix is. The only thing that must be done to get around it is figure out how much of a buffer you have to put before the SPECIAL INPUT such that SPECIAL INPUT starts in position 0 of a block. If you can do that, and if you know how many blocks are occupied by that prefix plus the buffer, the problem is the same, except instead of starting in block 0, you start in a block further down the line. There’s just an offset involved.

My solution

Here’s the function I wrote up in Java to pull out the MYSTERY-SUFFIX. Notice the inputs into this function. Commentary on what they are and how to find them can be found below the code.

/**
     * @param oracle           the oracle function of concern. takes in a byte array, returns a byte array
     * @param numPrefixBlocks  number of prefix blocks. used to determine an offset
     * @param prefixBuffer     a buffer that will push the beginning of the hacker input to the start of a block
     * @param numMysteryBlocks the number of mystery blocks you're trying to extract
     * @param blockSize        the block size
     * @return the extracted message
     */
    static byte[] performExtraction(Function<byte[], byte[]> oracle, int numPrefixBlocks, byte[] prefixBuffer, int numMysteryBlocks, int blockSize) {
        byte[] extracted = new byte[0];

        Map<Integer, byte[]> targets = new HashMap<>();
        for (int k = 0; k < numMysteryBlocks; k++) {
            int o = k + numPrefixBlocks; // o is our offset to the block we are interrogating

            byte[] block = new byte[blockSize];
            for (int i = 0; i < blockSize; i++) {

                final int len = blockSize - i - 1;
                final byte[] filler = ByteArrayUtil.concatenate(prefixBuffer, new byte[len]);

                //we can save these targets because
                // recomputing them on subsequent executions results in the same outcome
                // because ECB is deterministic. waste not cpu cycles
                var fullTarget = targets.computeIfAbsent(len, l -> oracle.apply(filler));

                var targetBlock = ByteArrayUtil.sliceByteArray(fullTarget, o * blockSize, blockSize);

                byte[] hackerInput = ByteArrayUtil.concatenate(
                        filler, //rounds out the prefix block, then gives us ( blockSize - i ) bytes in the next one
                        ByteArrayUtil.sliceByteArray(extracted, 0, extracted.length), // anything we got from previous rounds
                        ByteArrayUtil.sliceByteArray(block, 0, i), // anything we got so far on this round
                        new byte[1] //one more empty byte to round out the block
                );

                //validate that the hacker input less the prefix buffer is a multiple of the block size
                if ((hackerInput.length - prefixBuffer.length) % blockSize != 0) {
                    throw new CryptopalsException("the hackerInput didn't fill out a full block");
                }

                boolean found = false;
                for (int j = 0; j < 256; j++) {
                    hackerInput[hackerInput.length - 1] = (byte) j;
                    var result = oracle.apply(hackerInput);
                    var subjectBlock = ByteArrayUtil.sliceByteArray(result, o * blockSize, blockSize);
                    if (Arrays.equals(targetBlock, subjectBlock)) {
                        block[i] = (byte) j;
                        found = true;
                        break;
                    }
                }
                if (!found) {
                    if (k == numMysteryBlocks - 1) { //we're done
                        block = ByteArrayUtil.sliceByteArray(block, 0, i - 1);
                        break;
                    } else { // we're in trouble
                        throw new RuntimeException(String.format("could not find the match. k=%d, numMysteryBlocks=%d, offset=%d", k, numMysteryBlocks, o));
                    }
                }
            }
            extracted = ByteArrayUtil.concatenate(extracted, block);
        }

        return extracted;
    }

For Challenge 12, the number of prefix blocks is 0, the prefix buffer is an empty byte array, the number of mystery blocks is the length of the message when SPECIAL INPUT is an empty string, and the block size is 16.

For Challenge 14, the number of prefix blocks changes each time, so you have to programmatically figure out how many blocks the prefix occupies. This can be determined by submitting an empty string as SPECIAL INPUT and then submitting a single character in SPECIAL INPUT and finding which block is the first one among the two resulting encryption messages to be different. That block and all blocks before it are prefix blocks.

        //find the "break point" ... the starting point of the first block that is different
        // this block is where the prefix _ends_
        final var empty = Challenge14Oracle.speakProphecy(new byte[0]);
        final var polluted = Challenge14Oracle.speakProphecy(new byte[1]);
        Integer breakPointIndex = null;
        for (int i = 0; i < empty.length; i++) {
            if (empty[i] != polluted[i]) {
                breakPointIndex = i;
                break;
            }
        }
        if (breakPointIndex == null) {
            throw new RuntimeException("could not find break point");
        }
        int numPrefixBlocks = (breakPointIndex / blockSize) + 1;

The contents of the prefix buffer don’t really matter as long as it fills up a full block and doesn’t change after you determine its length. The appropriate length for the prefix buffer can be found by incrementally increasing the length of a buffer until the block where the prefix ends doesn’t change anymore. Once you find the length where adding another byte doesn’t result in change in the block where the prefix ends, you know how long the buffer length is.

        //figure out how many more bytes I need to add to this block where the prefix ends
        // in order to fill it. i do this by adding input until the block doesn't change anymore.
        int bufferLength = 0;
        var prev = ByteArrayUtil.sliceByteArray(Challenge14Oracle.speakProphecy(new byte[bufferLength]), breakPointIndex, blockSize);
        var next = ByteArrayUtil.sliceByteArray(Challenge14Oracle.speakProphecy(new byte[bufferLength + 1]), breakPointIndex, blockSize);
        while (bufferLength <= (blockSize * 2) && !Arrays.equals(prev, next)) {
            bufferLength++;
            prev = next;
            next = ByteArrayUtil.sliceByteArray(Challenge14Oracle.speakProphecy(new byte[bufferLength + 1]), breakPointIndex, blockSize);
        }

        if (bufferLength == (blockSize * 2)) {
            throw new CryptopalsException("could not determine buffer length. filled two full blocks without observing change");
        }

The number of mystery blocks is simply the length of the encrypted message minus the number of prefix+buffer blocks. Including the prefix buffer starts the MYSTERY-SUFFIX at the beginning of a block. If you know how many blocks are prefix+buffer blocks, you can figure out how many blocks are occupied by MYSTERY-SUFFIX by taking the total number of blocks in an encrypted message and subtracting the number of prefix+buffer blocks.

        var padded = Challenge14Oracle.speakProphecy(prefixBuffer);
        int numTotalBlocks = padded.length / blockSize;
        int numMysteryBlocks = numTotalBlocks - numPrefixBlocks;

Thanks for reading! Cheers!