Landon Hemsley

Abusing an ECB cipher to extract unknown encrypted text

Landon — Thu, 11 Jul 2024 18:05:19 +0000

This is a write up on challenges 12 and 14 of the cryptopals cryptography challenges.

I tried to solve this once a couple years ago, but I don’t think I understood the problem very well on the first go ’round. But, I get it now.

The premise of the challenges

In both challenge 12 and 14, they ask you to set up an “encryption oracle.” I don’t know about you, but when I see the word “oracle,” I start thinking of mystical wizardry, fortune-telling, and sorcery … the kind you might find in the early 1990s video game classic, King’s Quest VI.

But really, all they’re asking you to do is set up an encrypting function where:

a string (aka the MYSTERY-SUFFIX) is appended to whatever input is submitted.
the new combined inputs are encrypted using an ECB cipher.
the key used to encrypt them never changes, but it is unknown.
the MYSTERY-SUFFIX never changes.

Or, as they put it (with my own slight modification):

AES-128-ECB(your-string || MYSTERY-SUFFIX, random-key)

Challenge 14 is the same thing, but with one additional wrinkle:

An unknown-but-constant string is also prepended to any input.

Or, as they put it (with my own slight modification):

AES-128-ECB(random-prefix || attacker-controlled || MYSTERY-SUFFIX, random-key)
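
Both oracles fit in one small class. This is only a minimal sketch, not the challenge's reference code: the class name EcbOracle, its constructor arguments, and the choice of PKCS5Padding are assumptions made for illustration. An empty prefix gives the challenge 12 oracle; a non-empty one gives challenge 14.

```java
import javax.crypto.Cipher;
import javax.crypto.spec.SecretKeySpec;
import java.security.GeneralSecurityException;
import java.security.SecureRandom;

// Hypothetical helper for illustration; none of these names come from the challenge text.
class EcbOracle {
    private final SecretKeySpec key;     // random, fixed for the oracle's lifetime, never revealed
    private final byte[] prefix;         // empty for challenge 12, unknown-but-constant for challenge 14
    private final byte[] mysterySuffix;  // the text the attacker wants to recover

    EcbOracle(byte[] prefix, byte[] mysterySuffix) {
        byte[] keyBytes = new byte[16];  // 16 bytes = AES-128
        new SecureRandom().nextBytes(keyBytes);
        this.key = new SecretKeySpec(keyBytes, "AES");
        this.prefix = prefix.clone();
        this.mysterySuffix = mysterySuffix.clone();
    }

    // AES-128-ECB(prefix || attackerControlled || MYSTERY-SUFFIX, key)
    byte[] encrypt(byte[] attackerControlled) {
        byte[] plain = new byte[prefix.length + attackerControlled.length + mysterySuffix.length];
        System.arraycopy(prefix, 0, plain, 0, prefix.length);
        System.arraycopy(attackerControlled, 0, plain, prefix.length, attackerControlled.length);
        System.arraycopy(mysterySuffix, 0, plain,
                prefix.length + attackerControlled.length, mysterySuffix.length);
        try {
            // Assumption: the oracle pads with PKCS#5/PKCS#7, as javax.crypto does here.
            Cipher cipher = Cipher.getInstance("AES/ECB/PKCS5Padding");
            cipher.init(Cipher.ENCRYPT_MODE, key);
            return cipher.doFinal(plain);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

The attacker only ever calls encrypt; the key, prefix, and suffix stay private to the object.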

Given this function, the challenge here is to figure out what MYSTERY-SUFFIX is without knowing the key.

How to solve this problem

Let’s take a quick inventory of what we know:

The key doesn’t change.
The MYSTERY-SUFFIX doesn’t change.
There is no limit as to how many times we submit input to the encryption oracle. (That’s important. Not having any rate limiting makes breaking this much easier.)

There are also a few things that, although we may not know them, we can prove them with so much ease that it’s probably ok to take them for granted:

This is ECB encryption.
The block size is 16 bytes.
There are 256 possible values for each individual byte in MYSTERY-SUFFIX.
- If you don’t understand why ^^^ is true, remember that a byte is a series of eight binary digits. The minimum value eight binary digits can hold is 0, and the maximum is 255.

Remember: ECB ciphers are weak because they are deterministic; if you don’t change the key, the same input will produce the same output. Patterns are also really easy to detect in ECB, which makes it easier to make educated guesses about what’s underneath. (Read this for more on ECB’s weaknesses)
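
The block size and the use of ECB can both be checked against the oracle itself. Here is a minimal sketch, assuming the challenge 12 setup (nothing prepended to the input) and PKCS#5/7 padding. The names EcbProbe, findBlockSize, looksLikeEcb, and demoOracle are hypothetical, and demoOracle is just a stand-in so the example runs on its own.

```java
import javax.crypto.Cipher;
import javax.crypto.spec.SecretKeySpec;
import java.security.GeneralSecurityException;
import java.security.SecureRandom;
import java.util.Arrays;
import java.util.function.Function;

class EcbProbe {
    // Feed the oracle ever-longer inputs; the first jump in ciphertext length is one block.
    static int findBlockSize(Function<byte[], byte[]> oracle) {
        int baseline = oracle.apply(new byte[0]).length;
        for (int n = 1; n <= 64; n++) {
            int length = oracle.apply(new byte[n]).length;
            if (length > baseline) {
                return length - baseline;
            }
        }
        throw new IllegalStateException("ciphertext never grew; is this a block cipher?");
    }

    // Two identical, block-aligned plaintext blocks give two identical ciphertext blocks under ECB.
    static boolean looksLikeEcb(Function<byte[], byte[]> oracle, int blockSize) {
        byte[] ciphertext = oracle.apply(new byte[blockSize * 2]);
        byte[] first = Arrays.copyOfRange(ciphertext, 0, blockSize);
        byte[] second = Arrays.copyOfRange(ciphertext, blockSize, blockSize * 2);
        return Arrays.equals(first, second);
    }

    // Stand-in oracle (assumption): AES-128-ECB(input || suffix) under a random, fixed key.
    static Function<byte[], byte[]> demoOracle(byte[] suffix) {
        byte[] keyBytes = new byte[16];
        new SecureRandom().nextBytes(keyBytes);
        SecretKeySpec key = new SecretKeySpec(keyBytes, "AES");
        return input -> {
            byte[] plain = Arrays.copyOf(input, input.length + suffix.length);
            System.arraycopy(suffix, 0, plain, input.length, suffix.length);
            try {
                Cipher cipher = Cipher.getInstance("AES/ECB/PKCS5Padding");
                cipher.init(Cipher.ENCRYPT_MODE, key);
                return cipher.doFinal(plain);
            } catch (GeneralSecurityException e) {
                throw new IllegalStateException(e);
            }
        };
    }
}
```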

Given the preceding, these are the steps for extracting the first byte of MYSTERY-SUFFIX from the oracle

Craft a SPECIAL INPUT one byte shorter than a full block.
Because MYSTERY-SUFFIX is appended to your SPECIAL INPUT before encryption, and because your SPECIAL INPUT is one byte short of a full block, you know that the last byte in the first block of the message is going to be the first byte of MYSTERY-SUFFIX when the oracle encrypts it.
Send SPECIAL INPUT off to the oracle and save the result. (You can save it in a dictionary where the 15-byte SPECIAL INPUT is the key if you want).
Now, take SPECIAL INPUT and add one more byte to it (which rounds out the block). Send it to the oracle and see if the first block of the encrypted result matches the first block of the result you got in the previous step.
- If it does, you know what the first byte of MYSTERY-SUFFIX is.
- If it doesn’t, try again with a different byte.
- Repeat until you find a match.
  - Remember, there are only 256 possible bytes this first byte of MYSTERY-SUFFIX could be.
  - Therefore, the maximum number of times you’ll look for a match for an individual byte is 256 times.
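
The steps above can be sketched in a few lines. This is an illustration under assumptions rather than a definitive implementation: FirstByteDemo, recoverFirstByte, and demoOracle are hypothetical names, the SPECIAL INPUT is 15 zero bytes, and the stand-in oracle has no prefix.

```java
import javax.crypto.Cipher;
import javax.crypto.spec.SecretKeySpec;
import java.security.GeneralSecurityException;
import java.security.SecureRandom;
import java.util.Arrays;
import java.util.function.Function;

class FirstByteDemo {
    // Returns the first byte of the suffix as 0..255, or -1 if no candidate matched.
    static int recoverFirstByte(Function<byte[], byte[]> oracle, int blockSize) {
        byte[] specialInput = new byte[blockSize - 1];             // one byte short of a block
        byte[] target = Arrays.copyOf(oracle.apply(specialInput), blockSize);

        byte[] guess = Arrays.copyOf(specialInput, blockSize);     // same bytes plus one trial byte
        for (int b = 0; b < 256; b++) {
            guess[blockSize - 1] = (byte) b;
            byte[] firstBlock = Arrays.copyOf(oracle.apply(guess), blockSize);
            if (Arrays.equals(firstBlock, target)) {
                return b;                                          // this trial byte reproduces the target block
            }
        }
        return -1;
    }

    // Stand-in oracle (assumption): AES-128-ECB(input || suffix) under a random, fixed key.
    static Function<byte[], byte[]> demoOracle(byte[] suffix) {
        byte[] keyBytes = new byte[16];
        new SecureRandom().nextBytes(keyBytes);
        SecretKeySpec key = new SecretKeySpec(keyBytes, "AES");
        return input -> {
            byte[] plain = Arrays.copyOf(input, input.length + suffix.length);
            System.arraycopy(suffix, 0, plain, input.length, suffix.length);
            try {
                Cipher cipher = Cipher.getInstance("AES/ECB/PKCS5Padding");
                cipher.init(Cipher.ENCRYPT_MODE, key);
                return cipher.doFinal(plain);
            } catch (GeneralSecurityException e) {
                throw new IllegalStateException(e);
            }
        };
    }
}
```

The one "target" query plus at most 256 guesses is the whole cost of recovering a byte.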

Visualizing this process can also help in understanding.

Let’s say SPECIAL INPUT is the string 123456789012345. It doesn’t really matter exactly what it is as long as we know what it is. That means that the message will look something like this before it undergoes encryption. (The question marks represent MYSTERY-SUFFIX, and the combined message has been broken into blocks to make it easier to visualize.)

123456789012345? ???????????????? ???????????????? ???????????????

Notice that the last character in that first block is the first byte of MYSTERY-SUFFIX.

So, then this message is encrypted. For the sake of argument, let’s say that just the first block of the encrypted result (we’ll focus on other blocks later) comes out to look like this: abcdefghijklmnop

We know that our SPECIAL INPUT plus one byte of MYSTERY-SUFFIX made this first block come out like this. We don’t know what that byte is, but we know it’s something.

Because the encryption key is constant, and because the key operates on each individual block in isolation of the other blocks, we can iterate through all possible values of that final byte until we find a response from the oracle where the first block reads as abcdefghijklmnop.

Let’s say we discover that the character that does this is the letter R. Great. We have found the first character in the MYSTERY-SUFFIX.

What about the others? To find the others, first we shorten our SPECIAL INPUT by one byte (now it’s 14 bytes instead of 15) and send it off to the oracle so that before encryption, the message looks like this.

12345678901234R? ???????????????? ???????????????? ??????????????

Notice: R is the first character of MYSTERY-SUFFIX. Rather than continue to represent it with a ?, I’m choosing to include the known character in its place.

Let’s say that the oracle spits this back as the first block for that input: zyxwvutsrqponmlk

Ok. We know the first byte is R, so how can we change our input such that we can discover the 2nd byte in the MYSTERY-SUFFIX? Well, if we take our 14-character SPECIAL INPUT and append R plus something else to it, then the problem is the same … we can rotate that final byte of the first block until the oracle spits out zyxwvutsrqponmlk in that first block. The fact that the real MYSTERY-SUFFIX doesn’t start until block two of these guess messages is irrelevant because we’re only looking at block one.

Let’s say that when we append Ro to the 14-character SPECIAL INPUT, the first block of the encryption result reads zyxwvutsrqponmlk. Awesome. We now know the second character in MYSTERY-SUFFIX is o.

To find the rest of the first block of the mystery string, we just repeat this process, making SPECIAL INPUT shorter and shorter and shorter. Every time we do so, one more byte of MYSTERY-SUFFIX appears in the final position of the first block. This allows us to append what is known onto the end of SPECIAL INPUT and rotate the final byte through all 256 possibilities until we find a match on the first block of the encryption.

Once we’ve discovered what the entire first block of MYSTERY-SUFFIX is, we do the whole thing again, but instead of focusing on the first block, we look at the second block, appending what’s known to our SPECIAL INPUT and rotating that last byte through all 256 possibilities until we find just the right character.

For example, let’s say we figure out that Rollin'_down_the is the first block of MYSTERY-SUFFIX. If we go back to our 15-byte wide SPECIAL INPUT without appending anything to it, this is what we now know the message looks like.

123456789012345R ollin'_down_the? ???????????????? ???????????????

If we craft our SPECIAL INPUT now to include what we know to be the first block of the MYSTERY-SUFFIX, but focus on block two instead of block one, then the problem is essentially the same. We just rotate the last byte of block two through all 256 possibilities until we find a match. Let’s say that _ produces a match.

Then we go back to our 14-byte wide SPECIAL INPUT so that, given the known first block (Rollin'_down_the) and what is known from block two of the MYSTERY-SUFFIX (_ so far), the message looks like this before encryption….

12345678901234Ro llin'_down_the_? ???????????????? ???????????????

… and we repeat, rotating the last byte of the second block through all 256 possible bytes until we find a match on the second block. As we shorten SPECIAL INPUT byte by byte, one more byte of the second block of MYSTERY-SUFFIX will show up in block two.

Once we uncover block two, we start over, repeating this process for block three, and block four, etc. until we extract the entire MYSTERY-SUFFIX out. We know we are done when we can no longer find any match.
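
The whole walkthrough, for the challenge 12 case with no prefix, fits in one loop. What follows is a compact, self-contained sketch and not a definitive implementation: ByteAtATime, recoverSuffix, and demoOracle are hypothetical names, the filler is all zero bytes, and the stopping rule assumes the oracle uses PKCS#5/7 padding (the last byte that matches is the 0x01 pad byte, so it gets dropped).

```java
import javax.crypto.Cipher;
import javax.crypto.spec.SecretKeySpec;
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.SecureRandom;
import java.util.Arrays;
import java.util.function.Function;

class ByteAtATime {
    static byte[] recoverSuffix(Function<byte[], byte[]> oracle, int blockSize) {
        int paddedLength = oracle.apply(new byte[0]).length;   // upper bound on the suffix length
        ByteArrayOutputStream known = new ByteArrayOutputStream();
        for (int pos = 0; pos < paddedLength; pos++) {
            int fillerLength = blockSize - 1 - (pos % blockSize);   // shrinks by one each round
            int blockStart = (pos / blockSize) * blockSize;         // block that ends with suffix[pos]
            byte[] target = Arrays.copyOfRange(
                    oracle.apply(new byte[fillerLength]), blockStart, blockStart + blockSize);

            // guess = filler || everything recovered so far || one trial byte
            byte[] soFar = known.toByteArray();
            byte[] guess = new byte[fillerLength + soFar.length + 1];
            System.arraycopy(soFar, 0, guess, fillerLength, soFar.length);

            boolean found = false;
            for (int b = 0; b < 256 && !found; b++) {
                guess[guess.length - 1] = (byte) b;
                byte[] candidate = Arrays.copyOfRange(
                        oracle.apply(guess), blockStart, blockStart + blockSize);
                if (Arrays.equals(candidate, target)) {
                    known.write(b);
                    found = true;
                }
            }
            if (!found) break;   // padding changed under us, so the suffix is exhausted
        }
        byte[] recovered = known.toByteArray();
        // Drop the trailing 0x01 padding byte that matched just before the search ran dry.
        return recovered.length == 0 ? recovered : Arrays.copyOf(recovered, recovered.length - 1);
    }

    // Stand-in oracle (assumption): AES-128-ECB(input || suffix) under a random, fixed key.
    static Function<byte[], byte[]> demoOracle(byte[] suffix) {
        byte[] keyBytes = new byte[16];
        new SecureRandom().nextBytes(keyBytes);
        SecretKeySpec key = new SecretKeySpec(keyBytes, "AES");
        return input -> {
            byte[] plain = Arrays.copyOf(input, input.length + suffix.length);
            System.arraycopy(suffix, 0, plain, input.length, suffix.length);
            try {
                Cipher cipher = Cipher.getInstance("AES/ECB/PKCS5Padding");
                cipher.init(Cipher.ENCRYPT_MODE, key);
                return cipher.doFinal(plain);
            } catch (GeneralSecurityException e) {
                throw new IllegalStateException(e);
            }
        };
    }

    public static void main(String[] args) {
        byte[] secret = "Rollin'_down_the_road".getBytes(StandardCharsets.UTF_8); // made-up suffix
        byte[] recovered = recoverSuffix(demoOracle(secret), 16);
        System.out.println(new String(recovered, StandardCharsets.UTF_8));
    }
}
```

This version re-queries the target for every byte; caching the 16 possible targets per filler length saves oracle calls because ECB is deterministic.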

Making it harder

In challenge 14, they add that wrinkle of prepending something constant but unknown to the hacker input before the encryption is performed.

Although it’s mind-bending at first glance, it really doesn’t matter what this prefix is. The only thing that must be done to get around it is figure out how much of a buffer you have to put before the SPECIAL INPUT such that SPECIAL INPUT starts in position 0 of a block. If you can do that, and if you know how many blocks are occupied by that prefix plus the buffer, the problem is the same, except instead of starting in block 0, you start in a block further down the line. There’s just an offset involved.

My solution

Here’s the function I wrote up in Java to pull out the MYSTERY-SUFFIX. Notice the inputs into this function. Commentary on what they are and how to find them can be found below the code.

/**
     * @param oracle           the oracle function of concern. takes in a byte array, returns a byte array
     * @param numPrefixBlocks  number of prefix blocks. used to determine an offset
     * @param prefixBuffer     a buffer that will push the beginning of the hacker input to the start of a block
     * @param numMysteryBlocks the number of mystery blocks you're trying to extract
     * @param blockSize        the block size
     * @return the extracted message
     */
    static byte[] performExtraction(Function<byte[], byte[]> oracle, int numPrefixBlocks, byte[] prefixBuffer, int numMysteryBlocks, int blockSize) {
        byte[] extracted = new byte[0];

        Map<Integer, byte[]> targets = new HashMap<>();
        for (int k = 0; k < numMysteryBlocks; k++) {
            int o = k + numPrefixBlocks; // o is our offset to the block we are interrogating

            byte[] block = new byte[blockSize];
            for (int i = 0; i < blockSize; i++) {

                final int len = blockSize - i - 1;
                final byte[] filler = ByteArrayUtil.concatenate(prefixBuffer, new byte[len]);

                //we can save these targets because
                // recomputing them on subsequent executions results in the same outcome
                // because ECB is deterministic. waste not cpu cycles
                var fullTarget = targets.computeIfAbsent(len, l -> oracle.apply(filler));

                var targetBlock = ByteArrayUtil.sliceByteArray(fullTarget, o * blockSize, blockSize);

                byte[] hackerInput = ByteArrayUtil.concatenate(
                        filler, //rounds out the prefix block, then gives us ( blockSize - i ) bytes in the next one
                        ByteArrayUtil.sliceByteArray(extracted, 0, extracted.length), // anything we got from previous rounds
                        ByteArrayUtil.sliceByteArray(block, 0, i), // anything we got so far on this round
                        new byte[1] //one more empty byte to round out the block
                );

                //validate that the hacker input less the prefix buffer is a multiple of the block size
                if ((hackerInput.length - prefixBuffer.length) % blockSize != 0) {
                    throw new CryptopalsException("the hackerInput didn't fill out a full block");
                }

                boolean found = false;
                for (int j = 0; j < 256; j++) {
                    hackerInput[hackerInput.length - 1] = (byte) j;
                    var result = oracle.apply(hackerInput);
                    var subjectBlock = ByteArrayUtil.sliceByteArray(result, o * blockSize, blockSize);
                    if (Arrays.equals(targetBlock, subjectBlock)) {
                        block[i] = (byte) j;
                        found = true;
                        break;
                    }
                }
                if (!found) {
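                    if (k == numMysteryBlocks - 1) { //we're done
                        // the last byte we matched was presumably padding rather than MYSTERY-SUFFIX, so drop it
                        block = ByteArrayUtil.sliceByteArray(block, 0, Math.max(i - 1, 0));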
                        break;
                    } else { // we're in trouble
                        throw new RuntimeException(String.format("could not find the match. k=%d, numMysteryBlocks=%d, offset=%d", k, numMysteryBlocks, o));
                    }
                }
            }
            extracted = ByteArrayUtil.concatenate(extracted, block);
        }

        return extracted;
    }

For Challenge 12, the number of prefix blocks is 0, the prefix buffer is an empty byte array, the number of mystery blocks is the number of blocks in the encrypted message when SPECIAL INPUT is an empty string (its length divided by the block size), and the block size is 16.

For Challenge 14, the number of prefix blocks changes each time, so you have to programmatically figure out how many blocks the prefix occupies. This can be determined by submitting an empty string as SPECIAL INPUT and then submitting a single character in SPECIAL INPUT and finding which block is the first one among the two resulting encryption messages to be different. That block and all blocks before it are prefix blocks.

        //find the "break point" ... the starting point of the first block that is different
        // this block is where the prefix _ends_
        final var empty = Challenge14Oracle.speakProphecy(new byte[0]);
        final var polluted = Challenge14Oracle.speakProphecy(new byte[1]);
        Integer breakPointIndex = null;
        for (int i = 0; i < empty.length; i++) {
            if (empty[i] != polluted[i]) {
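                // round down to the block boundary: the leading bytes of two different blocks can coincide by chance
                breakPointIndex = (i / blockSize) * blockSize;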
                break;
            }
        }
        if (breakPointIndex == null) {
            throw new RuntimeException("could not find break point");
        }
        int numPrefixBlocks = (breakPointIndex / blockSize) + 1;

The contents of the prefix buffer don’t really matter as long as it fills out the rest of the block where the prefix ends and doesn’t change after you determine its length. The appropriate length for the prefix buffer can be found by incrementally increasing the length of a buffer until the block where the prefix ends doesn’t change anymore. Once you find the length where adding another byte doesn’t result in a change in the block where the prefix ends, you know how long the buffer needs to be.

        //figure out how many more bytes I need to add to this block where the prefix ends
        // in order to fill it. i do this by adding input until the block doesn't change anymore.
        int bufferLength = 0;
        var prev = ByteArrayUtil.sliceByteArray(Challenge14Oracle.speakProphecy(new byte[bufferLength]), breakPointIndex, blockSize);
        var next = ByteArrayUtil.sliceByteArray(Challenge14Oracle.speakProphecy(new byte[bufferLength + 1]), breakPointIndex, blockSize);
        while (bufferLength < (blockSize * 2) && !Arrays.equals(prev, next)) {
            bufferLength++;
            prev = next;
            next = ByteArrayUtil.sliceByteArray(Challenge14Oracle.speakProphecy(new byte[bufferLength + 1]), breakPointIndex, blockSize);
        }

        if (bufferLength == (blockSize * 2)) {
            throw new CryptopalsException("could not determine buffer length. filled two full blocks without observing change");
        }

The number of mystery blocks is simply the number of blocks in the encrypted message minus the number of prefix+buffer blocks. Including the prefix buffer (a byte array of length bufferLength) starts the MYSTERY-SUFFIX at the beginning of a block, so every block after the prefix+buffer blocks is occupied by MYSTERY-SUFFIX and its padding.

        var padded = Challenge14Oracle.speakProphecy(prefixBuffer);
        int numTotalBlocks = padded.length / blockSize;
        int numMysteryBlocks = numTotalBlocks - numPrefixBlocks;

Thanks for reading! Cheers!

Trying to see ECB-encrypted image shadows

Landon — Fri, 21 Jun 2024 15:14:20 +0000

It’s been a couple years since I started working on the cryptopals project. But, two years later, I am returning to this project hopefully to finish it all the way through.

Given the time that has elapsed since I started cryptopals in earnest, I thought it would be a good idea to go back through earlier challenges and solidify my understanding of what I wrote and how it works. In that sense, many of the blog posts that I have written previously have been very helpful.

But, if you notice, there isn’t very much that I wrote on the earlier challenges in the series. This is because I didn’t even have the idea to write about my journeys through cryptopals until I was finished with set no. 4.

Anyway, today I want to discuss challenge 7 and challenge 10. These two challenges require you to implement ECB and CBC symmetric encryption. In challenge 7, they tell you libraries are fine to use, but in challenge 10, they tell you to basically implement the thing yourself, leaning on libraries that provide ECB (I used javax.crypto).

If you research ECB much (Wikipedia is probably sufficient for our needs here), you can see examples of images encrypted with ECB vs. CBC or other cipher methods. In some of these images, you can see shadows of the original. This obviously is not ideal in a world where you’re trying to conceal a message (which in this case is a visual message). If someone can discern your image after encryption, is the encryption worth much of anything? Probably not.

In any case, two years later, I decided that I wanted to see if my implementation of ECB would betray itself in the form of showing image shadows post-encryption. I’m happy(?) to say that yes, it does.

ECB Encryption on an image with uniform colors: Before and after

So, why? Why is it that an ECB-encrypted image shows shadows of the original, especially in circumstances where there are a lot of uniform colors?

To understand this, you need to understand a common term in computer science: deterministic (or determinism). Something in computers is deterministic when, given identical inputs, you get identical outputs.

You can think of this in terms of a manufacturing plant. If I go to a Frito-Lay plant, I can see a bunch of inputs that go into the process of creating Cool Ranch Doritos: spices, corn flour, MSG, the plastic materials for the bags, glue to seal the bags, water, etc., etc. All of these inputs go into a system, much of which is mechanized. At the end of that system, out pops a tightly sealed bag of the greatest snack food known to man. In a system like that, there may be a slight variation in the number of chips, but the bag is generally always full to about the same degree, and the bag sizes are always the same.

With computers, it’s much more rigid. Imagine if there was a plant that always produced the exact same bag of chips with no variation at all. That’s what computers give you. You give computer systems inputs, and if they are programmed deterministically, they will always give you the same outputs.

What makes ECB so easy to break is that it has this deterministic property. You give it the same input, it will give you the same output every time. For example, if I give an ECB encryption algorithm a series of three white pixels from an image, it might transform them into a blue, a green, and a yellow pixel, and it will do that for every series of three white pixels. So, if you have an image with several white pixels in a row, a pattern quickly and easily emerges (as you can see above).

This property of ECB actually is sort of implied in what the acronym ECB stands for: Electronic Code Book. This hearkens back to the days before computers when secret messages were encrypted using code books that were clandestinely distributed to parties between whom messages needed to be sent. One party would change words, numbers, letters, or phrases according to the code book. The other would receive the message, and then would undo that encryption by replacing known codes with their original meanings.

The ECB algorithm is very similar to this. Here’s essentially what it does:

Split your message into blocks of a certain length (which is why it’s called block cryptography). In this case, the block length is 16 bytes (or 128 bits).
Take a user-supplied key, and transform a block using a reversible encryption algorithm (cryptopals suggests AES, the implementation of which is beyond the scope of this post) with the key as an input to that transformation.
Repeat for every block in the message.

Then, when you want to decrypt your message, you do the same thing, but instead of applying the encryption, you reverse it. As long as the key is the same, you end up with the same original message.
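
Those steps translate almost line for line into code. Here is a minimal sketch, assuming the message is already padded to a multiple of 16 bytes and leaning on javax.crypto's AES/ECB/NoPadding purely as the single-block AES transformation. The class and method names are hypothetical.

```java
import javax.crypto.Cipher;
import javax.crypto.spec.SecretKeySpec;
import java.security.GeneralSecurityException;

class HandRolledEcb {
    static final int BLOCK = 16; // bytes, i.e. 128 bits

    static byte[] encrypt(byte[] key, byte[] message) {
        return process(Cipher.ENCRYPT_MODE, key, message);
    }

    static byte[] decrypt(byte[] key, byte[] ciphertext) {
        return process(Cipher.DECRYPT_MODE, key, ciphertext);
    }

    private static byte[] process(int mode, byte[] key, byte[] data) {
        if (data.length % BLOCK != 0) {
            throw new IllegalArgumentException("pad the message to a multiple of 16 bytes first");
        }
        try {
            Cipher aes = Cipher.getInstance("AES/ECB/NoPadding");
            aes.init(mode, new SecretKeySpec(key, "AES"));
            byte[] out = new byte[data.length];
            // Each block is transformed on its own; nothing carries over between blocks.
            for (int offset = 0; offset < data.length; offset += BLOCK) {
                byte[] transformed = aes.doFinal(data, offset, BLOCK);
                System.arraycopy(transformed, 0, out, offset, BLOCK);
            }
            return out;
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

Because no state carries from one block to the next, three identical plaintext blocks (say, a run of white pixels) come out as three identical ciphertext blocks, which is exactly the shadow effect in the image.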

There are other cipher modes that do this also, but in a way that injects pseudo-randomness into the encryption such that, given an image, at least the message isn’t nearly so discernible post-encryption. (CBC is one such approach, and it’s the subject of challenge 10).
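
For comparison, here is a rough sketch of CBC encryption built on top of a single-block AES call, in the spirit of challenge 10. It is an illustration under assumptions: the message is already padded to a multiple of 16 bytes, CbcSketch and its encrypt method are hypothetical names, and the "pseudo-randomness" comes from XORing each plaintext block with the previous ciphertext block (or the IV for the first block) before encrypting it.

```java
import javax.crypto.Cipher;
import javax.crypto.spec.SecretKeySpec;
import java.security.GeneralSecurityException;

class CbcSketch {
    static final int BLOCK = 16;

    static byte[] encrypt(byte[] key, byte[] iv, byte[] message) {
        if (iv.length != BLOCK || message.length % BLOCK != 0) {
            throw new IllegalArgumentException("IV must be 16 bytes and the message block-aligned");
        }
        try {
            // AES/ECB/NoPadding on exactly one block is just the raw AES block transformation.
            Cipher aes = Cipher.getInstance("AES/ECB/NoPadding");
            aes.init(Cipher.ENCRYPT_MODE, new SecretKeySpec(key, "AES"));
            byte[] out = new byte[message.length];
            byte[] previous = iv.clone();
            for (int offset = 0; offset < message.length; offset += BLOCK) {
                byte[] mixed = new byte[BLOCK];
                for (int i = 0; i < BLOCK; i++) {
                    mixed[i] = (byte) (message[offset + i] ^ previous[i]); // chain in the prior block
                }
                previous = aes.doFinal(mixed);
                System.arraycopy(previous, 0, out, offset, BLOCK);
            }
            return out;
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

With the chaining in place, a run of identical plaintext blocks no longer produces a run of identical ciphertext blocks, so the shadows disappear.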

The same image subjected to CBC encryption

Hopefully you learned something interesting! Thanks for stopping by.

How I set up my own private, home-based VPN

Landon — Fri, 10 May 2024 16:43:15 +0000

First off, if you’ve ever visited my site before, I just want to take a moment to thank you for visiting, and for your readership. There was a time a couple years ago where I would post to this blog monthly. I obviously haven’t written on this blog for a couple years. There is a good reason for this, but I won’t bore you with the details for now. I will say something about this at the end of this post, which will probably lead into a future post where I can elaborate in greater detail how I’ve been spending my time over the last couple years.

My previous efforts, especially the studying and work that I did about cryptography, have given me a new appreciation for the virtues of data privacy. If the product is free, you’re the product. The exploits of Google, Apple, and other large Silicon Valley companies with respect to their customers’ data are well known. I don’t trust these companies, yet have had some of my most sensitive, personal documents saved in their cloud drive offerings.

So, some months ago, I started taking steps to move my personal data away from big tech and into self-hosted technologies, or at least into firms that are more focused on privacy and respectful of their customers. In the extreme, it’s possible to stand up a server rack in your own home and serve everything yourself. But this requires a lot of time, energy, effort and money, all of which are relatively scarce commodities for a full-time professional. But, there are some things you can do that aren’t quite so severe.

Among the first things I looked at was getting as much sensitive data as possible off of cloud-based office software services, namely Google Drive and Dropbox. For this, I bought a Synology network-attached storage (NAS) drive and migrated all my personal documents off the cloud and into the NAS.

When I bought my NAS, I learned that it’s possible to use Tailscale to set up a personal VPN among my devices. This service is useful because it allows me to connect to any of my devices anywhere in the world in a secure, encrypted manner as if they were all on the same local network. This appealed to me. If I was going to host all my documents on my NAS yet retain the convenience of being able to access them anywhere in the world, the ability to set up an encrypted, personal VPN seemed like a good choice for that.

With that long-winded introduction, I now approach the meat and potatoes of this post (thanks for sticking with me). Tailscale is a great service, but they require you to use a big-tech (Apple, Google, Microsoft, Facebook, oh my!) identity provider (IdP) to sign up for their service. OR, if you have your own identity provider, you can integrate with them using OpenID Connect (OIDC).

Setting up my own personal identity provider

It’s worth taking a moment to explain what an identity provider actually does since even to software professionals it’s not always obvious or easy to understand.

In online applications, login security is one of the more complicated problems out there. Some even call it a nightmare. Login security forces you to prove you are who you say you are through one of three methods:

You know something (usually a password)
You have something (like a temporary one-time password [think Google Authenticator], a smartphone [think one-time text codes] or a YubiKey … search for it if you don’t know what a YubiKey is)
You are something (biometric authentication … fingerprint, eye scan, face scan)

Obviously, the most common of these methods is through password authentication. Because of security concerns, many firms are now asking you to set up multiple methods of authenticating that you are who you say you are. Increasingly, it’s no longer enough to just know a password. You have to have that second factor to actually log in and gain access to whatever you’re trying to access.

Like I said, it’s a problem, and one that Tailscale explicitly decided they didn’t want to deal with. So they require you to have a system that will essentially vouch for you.

For 95-plus-percent of the population, using a Facebook or a Google login is going to be enough. But like I said, I’m trying to reduce my dependence on big tech. So, what to do? Well, use the open option of course!

But then what? How does one go about setting up an identity provider? And how does said identity provider end up vouching for you as someone who wants access to this system? Well, it’s a fun little dance that is illustrated pretty well by this diagram sourced from Zitadel.com.

In this diagram, you see the User, the Application, the Authorization Server, and the API. In my use case, I am the user (as in me, flesh and blood, not some system). The Application is Tailscale (their web-based service, not the VPN itself). The Authorization Server is the same thing as the identity provider, and the API (or Resource Server) is the VPN itself.

Walk through this diagram. In order to access my VPN, I would need to open up Tailscale and follow prompts to log in. At that point, Tailscale points me to my IdP, which goes through the login process. If I fail to prove I am who I say I am (via password or whatever other method of authentication I have set up with that IdP), then access is denied. If I am able to authenticate with the IdP, though, then the IdP will serve up an authorization code and client authentication token to Tailscale, which will confer a bit more with the IdP to learn more about the scope of access, etc., and obtain an access token. It’s this access token that ultimately will grant me access to the VPN service administered by Tailscale.

That’s all great… but where am I supposed to get an IdP that isn’t big tech?

Zitadel

I have good experience with Auth0 as an IdP for a number of projects, both personal and professional. It’s very useful, and it follows the same authentication pattern illustrated above, so I was initially inclined to use them, except that they are owned and managed by Okta… big tech. So, for this use case, Auth0 was out.

I looked at potentially buying hosting to run my own Keycloak service, but a friend of mine who has had to solve this problem a number of times spoke in glowing terms about Zitadel. They have their own self-hosted option (i.e., you can run it out of a docker container on your own server if you want), but you can also use the software on their servers. I didn’t want to spend money on hosting, and Zitadel isn’t big tech, so I chose to use Zitadel’s free tier.

After getting in, I set up a “project” that would function as the IdP for my Tailscale network. All that means is that I set something up in Zitadel that would function as an abstract representation of the VPN that I wanted to set up. If I wanted access to the VPN, I would need to log in against that project with my username, password, and one-time password.

Although initially disorienting, I ultimately was able to work out how to configure Zitadel such that a third-party system (Tailscale) would be able to use the Zitadel project as an IdP. Part of that included obtaining a key and a secret from Zitadel which would be later put into Tailscale during signup. With all this in place, I was ready, except for one thing…

Webfinger

In order to set up Tailscale with OIDC, the first thing they require is a webfinger endpoint.

Webfinger is a standardized web protocol that provides systems with information about known individuals/users under that domain and where these users are authenticated. I won’t illustrate in its entirety how I managed to set up a Webfinger endpoint, but to sum up, I used open source software on a domain that I control to meet Tailscale’s requirements. When I look up my user account on Webfinger.net, my Webfinger endpoint will report the location of my OIDC issuer, which is my Zitadel project.
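
To make the lookup a little more concrete, here is a rough sketch of what a WebFinger query URL (per RFC 7033) and a minimal response pointing at an OIDC issuer look like. Everything specific in it is an assumption for illustration: the user alice, the domain example.com, the issuer URL, and the helper names are made up, and the exact request Tailscale sends is not shown in this post. The link relation string is the one defined by OpenID Connect Discovery.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

class WebFingerSketch {
    // Link relation defined by OpenID Connect Discovery for "where is this user's issuer?"
    static final String OIDC_ISSUER_REL = "http://openid.net/specs/connect/1.0/issuer";

    // RFC 7033: the endpoint lives at /.well-known/webfinger and takes a percent-encoded resource URI.
    static String lookupUrl(String user, String domain) {
        String resource = "acct:" + user + "@" + domain;
        return "https://" + domain + "/.well-known/webfinger?resource="
                + URLEncoder.encode(resource, StandardCharsets.UTF_8);
    }

    // A minimal JSON Resource Descriptor naming the OIDC issuer for that account.
    static String response(String user, String domain, String issuerUrl) {
        return "{\"subject\":\"acct:" + user + "@" + domain + "\","
                + "\"links\":[{\"rel\":\"" + OIDC_ISSUER_REL + "\",\"href\":\"" + issuerUrl + "\"}]}";
    }

    public static void main(String[] args) {
        System.out.println(lookupUrl("alice", "example.com"));
        System.out.println(response("alice", "example.com", "https://issuer.example.com"));
    }
}
```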

Hooking it all up

With the Webfinger endpoint up and an IdP in place, I was ready to spin up my Tailscale VPN. I input all the required keys and secrets from my IdP into Tailscale. Tailscale reached out to Zitadel and asked about me. I was redirected to Zitadel to log in. I logged in and authenticated with Zitadel. At that point, I was sent back to Tailscale and granted access to my brand new VPN.

Success!

Where I have been

In January of 2022, I was approached by some family members and persuaded to work on a CRM project to help my mother with her business. She was struggling to manage her clientele and needed simple software tools to help her out. This project was presented to me as a business opportunity because my mother is among the most connected people I have ever known, and she works with hundreds of people who needed some quality CRM tools.

For the next two years, I spent almost all my free time, with some exceptions, on this CRM project. In April of this year, I decided to stop. Various aspects of the project were not what I had hoped or expected, and the best choice for me was to stop.

This was hard for me, first because family was and is involved. I didn’t want feelings to get hurt and didn’t want to alienate my family. Also, it was especially hard because of all the hours I had spent on the project. I spent roughly 1,200 hours working on this project, and it was no longer possible for me to continue unless things changed.

Since I stepped away, I am pretty happy to say that others who are still involved in the project seem to be continuing. I’m happy that they have continued forward. If, at some point, circumstances change again, then I might return. In the meantime, from my perspective, the project is just in maintenance mode. I’m not working on it, even if I have to provide limited support to those who are.

Anyway, because I’m not doing that, I am now redirecting my efforts to a great many other things that I had put off in the interim. This post is, in part, a result of that decision. I never would have had the chance to set up my Tailscale network or write this post if I hadn’t done so. I would have been preoccupied with that project instead.

Again, thanks for stopping by and reading. I appreciate it.

Bleichenbacher ’06 RSA Signature Forgery: What they assume you know

Landon — Fri, 17 Dec 2021 14:45:21 +0000

In 2006, Daniel Bleichenbacher shared a discovery in an evening session at a cryptography conference: Several implementations of RSA-based PKCS #1 v1.5 cryptographic signature verification were fatally flawed and susceptible to signature forgery.

It is as bad as it sounds. The sad part: The flaw in the signature verification algorithm is that the signature submitted for validation is trusted too much. Any engineer worth his salt knows you should never trust user input.

This blog post is directly tied to cryptopals challenge 42 in which you are asked to exploit a still-existing weakness (or at least still-existing up to 2016 … I would hope it’s fixed now) in several RSA cryptographic signature verification implementations. But they don’t give you a lot of background on how to construct a signature in the first place. Some parts are obvious; many are not.

The algorithm

If you google this subject long enough, you’ll run into something that looks like this in lots of different places.

00 01 FF FF ... FF 00 ASN HASH GARBAGE

This format (sans GARBAGE) is outlined in RFC 2313. (Spend some time there… it will help). The signature algorithm, as outlined there, is pretty straightforward. Using the private RSA key (not the public key), follow these steps:

Use a hashing algorithm to hash a message. SHA1 is ok, but you could use MD4, MD5, SHA256, SHA512, or whatever you want
Encode that hash in ASN.1 format following a specific encoding scheme (as outlined below)
Pad that octet string out to the width of the RSA public modulus (aka n), starting with a byte of 00, then a byte of 01, then k bytes of FF, followed by another 00 byte, then the ASN.1 encoded octet string.
RSA-encrypt that octet string using the private key (not the public key). In other words, raise it to the private exponent d mod n.

Signature verification is basically the same, but backwards:

RSA-decrypt the submitted signature using the public key (not the private key). In other words, raise it to the public exponent e mod n.
Parse the decrypted octet string, verifying and validating the padding scheme.
Seek the beginning of the ASN.1 encoded octets.
Using the hash algorithm specified in the ASN.1 encoded octets, hash the message submitted for signature verification.
Compare the resulting hash with the hash included in the signature. If they match, then the signature is valid.
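To make the padding step concrete, here is a minimal sketch in plain Java. It is an illustration under assumptions, not code from my repo: the class and method names (Pkcs1PaddingSketch, pad, isStrictlyValid) are made up for this example, and the ASN.1 encoded octets are treated as an opaque byte array that something else has already produced.

```java
import java.util.Arrays;

class Pkcs1PaddingSketch {

    /** Builds 00 01 FF..FF 00 || digestInfo, exactly modulusBytes long. */
    static byte[] pad(byte[] digestInfo, int modulusBytes) {
        int ffCount = modulusBytes - 3 - digestInfo.length;
        if (ffCount < 8) { // RFC 2313 asks for at least eight padding bytes
            throw new IllegalArgumentException("modulus too small for this payload");
        }
        byte[] block = new byte[modulusBytes];      // block[0] is already 0x00
        block[1] = 0x01;
        Arrays.fill(block, 2, 2 + ffCount, (byte) 0xFF);
        // block[2 + ffCount] stays 0x00 and acts as the separator
        System.arraycopy(digestInfo, 0, block, 3 + ffCount, digestInfo.length);
        return block;
    }

    /**
     * Strict check: rebuild the block we expect and compare every byte.
     * Because the payload is forced to end at the last byte of the block,
     * there is no room for anything to trail after the hash.
     */
    static boolean isStrictlyValid(byte[] decryptedBlock, byte[] expectedDigestInfo) {
        return Arrays.equals(decryptedBlock, pad(expectedDigestInfo, decryptedBlock.length));
    }
}
```

Comparing the whole block, rather than parsing from the left and stopping after the hash, is one simple way to make sure the hash is right justified.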

ASN.1 Formatting

If you’re a n00b like I was, the first questions are “What in the blazes is ASN.1?” and “How am I supposed to encode something in it?”

The answer: Don’t do it manually. There are libraries available all over the internet that make it easy. You just have to know how to use them. I used bouncy castle.

In RFC 2313, the encoding schema for the ASN.1 encoding is outlined as follows

   DigestInfo ::= SEQUENCE {
     digestAlgorithm DigestAlgorithmIdentifier,
     digest Digest }

   DigestAlgorithmIdentifier ::= AlgorithmIdentifier

   Digest ::= OCTET STRING

AlgorithmIdentifier is defined in RFC 5280 (X.509) as follows.

   AlgorithmIdentifier  ::=  SEQUENCE  {
        algorithm               OBJECT IDENTIFIER,
        parameters              ANY DEFINED BY algorithm OPTIONAL  }

People the world over have written libraries to do this. Here’s how I implemented the encoding. (It seems simple, but boiling this down to knowing which library objects to use was not obvious and took a lot of searching and reading).

The object identifier is a defined constant elsewhere in the library. It was just a matter of finding it and knowing which one to use.

    public byte[] encodeHashToAsn1SignatureFormat(final byte[] hash, final ASN1ObjectIdentifier hashAlgo) throws IOException {
        ASN1Sequence s1 = new DERSequence(new ASN1Encodable[] {
                new AlgorithmIdentifier(hashAlgo, DERNull.INSTANCE),
                new DEROctetString(hash)
        });
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        s1.toASN1Primitive().encodeTo(out);
        return out.toByteArray();
    }

Small e and poor padding validation

Like I said … the fatal flaw in many RSA signature algorithms is that they trusted the submitted signature too much. Specifically, they didn’t validate the length of the signature to make sure it was as long as the RSA public modulus, and didn’t make sure that the content of the signature is right justified. That means that if you have a signature that nominally follows the signature encoding format of …

00 01 FF FF FF ... FF FF 00 ASN HASH

… then you can throw whatever you want (GARBAGE) after HASH, and the signature will still validate.

This is sort of a big deal. Recall from this explainer that RSA encryption is based on modular exponentiation. If you have a small exponent and a large modulus, you run the risk that a message might not be numerically large enough to wrap around the modulus after exponentiation. This being the case, all someone would have to do to forge a signature is come up with a figure that when exponentiated follows the format:

00 01 FF FF ... FF FF 00 ASN HASH GARBAGE

This is hard with larger exponents. It’s almost trivial when e=3. Why? All you have to do is build a message, take the cube root of that figure, and round up.

Why round up? Let’s be honest … you’re probably not going to be crafting a message that when translated into a large integer is a perfect cube. Computers are good at lots of things. Computing roots and computing prime factors of large figures are not among those things. But you don’t need a perfect cube anyway because you don’t care what comes after the HASH, right?

If you take a cube root of your crafted message and round up, the least significant (read: the right-most) bytes will be different than where you started, but the most significant (read: the left-most) bytes will stay the same. If you’re dealing with a padding validator that doesn’t validate that HASH is right-justified, you’ve won.

Here’s my implementation of an RSA signature “forger.” Pretty simple, pretty scary that it works so well.
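For readers who just want the gist, the idea can be sketched in a few lines of plain Java. This is an illustrative sketch and not the implementation referenced above. The names (CubeRootForgerySketch, forge, cubeRootCeil) are made up for this example, and it assumes SHA-1, e = 3, a 1024-bit modulus, a single FF padding byte, and a verifier that ignores whatever follows HASH.

```java
import java.math.BigInteger;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Arrays;

class CubeRootForgerySketch {
    // DER header of a SHA-1 DigestInfo (the "ASN" part); the 20-byte hash follows it.
    static final byte[] SHA1_DIGEST_INFO_HEADER = {
        0x30, 0x21, 0x30, 0x09, 0x06, 0x05, 0x2b, 0x0e,
        0x03, 0x02, 0x1a, 0x05, 0x00, 0x04, 0x14
    };

    /** 00 01 FF 00 ASN HASH: everything a lax verifier actually looks at. */
    static byte[] forgedPrefix(byte[] message) {
        byte[] hash;
        try {
            hash = MessageDigest.getInstance("SHA-1").digest(message);
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
        byte[] head = {0x00, 0x01, (byte) 0xFF, 0x00};
        int asnLength = SHA1_DIGEST_INFO_HEADER.length;
        byte[] out = new byte[head.length + asnLength + hash.length];
        System.arraycopy(head, 0, out, 0, head.length);
        System.arraycopy(SHA1_DIGEST_INFO_HEADER, 0, out, head.length, asnLength);
        System.arraycopy(hash, 0, out, head.length + asnLength, hash.length);
        return out;
    }

    /** Smallest r with r^3 >= x (for x > 0), found by binary search. */
    static BigInteger cubeRootCeil(BigInteger x) {
        BigInteger low = BigInteger.ZERO;                                   // low^3 < x
        BigInteger high = BigInteger.ONE.shiftLeft(x.bitLength() / 3 + 1);  // high^3 >= x
        while (high.subtract(low).compareTo(BigInteger.ONE) > 0) {
            BigInteger mid = low.add(high).shiftRight(1);
            if (mid.pow(3).compareTo(x) >= 0) {
                high = mid;
            } else {
                low = mid;
            }
        }
        return high;
    }

    /** Forged signature: the rounded-up cube root of prefix || 00 00 ... 00. */
    static BigInteger forge(byte[] message, int modulusBytes) {
        byte[] block = new byte[modulusBytes]; // trailing zero bytes are the GARBAGE area
        byte[] prefix = forgedPrefix(message);
        System.arraycopy(prefix, 0, block, 0, prefix.length);
        return cubeRootCeil(new BigInteger(1, block));
    }

    /** Left-pads (or trims a sign byte from) value so it is exactly length bytes. */
    static byte[] toFixedLength(BigInteger value, int length) {
        byte[] raw = value.toByteArray();
        byte[] out = new byte[length];
        int n = Math.min(raw.length, length);
        System.arraycopy(raw, raw.length - n, out, length - n, n);
        return out;
    }

    public static void main(String[] args) {
        byte[] message = "hi mom".getBytes(StandardCharsets.UTF_8);
        BigInteger signature = forge(message, 128);
        // With e = 3 the verifier computes signature^3 mod n; the cube is smaller than n here.
        byte[] seenByVerifier = toFixedLength(signature.pow(3), 128);
        byte[] prefix = forgedPrefix(message);
        System.out.println(Arrays.equals(Arrays.copyOf(seenByVerifier, prefix.length), prefix));
    }
}
```

Rounding up only disturbs the low-order bytes, so the 39 bytes of prefix survive the cubing and only the GARBAGE area changes.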

The alternative, and why I didn’t pursue it

Hal Finney, in his original summary of this exploit, detailed that signatures could also be forged by treating everything before GARBAGE as if it were the result of (A – B)³. He loosely outlined a way that a message could be crafted such that when cubed would follow the same basic pattern.

I saw this as needlessly complex. And, I already had a method written to find cube roots (and other roots) of large integers. I didn’t see a reason to bang my head against the wall when I was already 90% of the way there.

However, this is definitely a problem to revisit in the future.

Thanks for reading! -LH

RSA for those who aren’t number theorists

Landon — Fri, 29 Oct 2021 12:45:28 +0000

I just finished cryptopals challenge 39, in which I had to implement RSA.

It wasn’t enough for me to just implement the RSA algorithm. I sort of needed to understand a bit about the underlying number theory. I say that because I’ve faced instances in the past where a typo or error in a cryptopals challenge description threw me off the trail for days or weeks. So, rather than bang my head against a wall, I took as deep a dive as I needed to understand enough about the math behind this algorithm to make sense of it all.

I still don’t know if I’m all the way there. I am not a math novice, but I’m definitely no number theorist. In any case, here’s a best attempt to try to explain this in a way that will make sense to those of us who are also not mathematicians but have need to learn about how RSA is supposed to work mathematically.

TL;DR

Here are the steps to generate an RSA key.

Select two prime numbers p and q. Multiply them to get n.
Working modulo n, Euler’s theorem tells us a^{kφ(n)+1} ≡ a mod n if a and n are coprime. (To make sure of this, we use very large prime numbers p and q to determine n.)
We can substitute kφ(n)+1 with ed provided that we select an e that is coprime with φ(n).
φ(n) = (p-1)(q-1)
In order for e to be coprime with (p – 1)(q – 1), we select an e that is prime and greater than or equal to both p and q.
Alternatively, we check to ensure prime number e is not a factor of φ(n) and keep picking different primes for e until we find one that isn’t a factor of φ(n). This will make e coprime to φ(n).
Once we’ve selected an e, we know that a modular multiplicative inverse must exist because e and φ(n) are coprime; we find the inverse d using the extended euclidean algorithm.
The lock is [e, n] and the key is [d, n].
Congratulate yourself!

What in the….?

We are going to go through the theory behind these steps in more detail. We’ll start with what invertibility means, which will lead us into the notion of modular multiplicative inverses. Then we’ll discuss φ(n), called simply the “phi function,” look at phi functions of prime numbers and products of prime numbers, and we’ll finish with Euler’s theorem. Then, with that foundation, we’ll select a key pair that meets all the criteria that will allow RSA to work.

There are links spread throughout this post, but as a favor, here are some of the sources that helped me understand this. A lot of them have more number theory than I think the average joe needs to understand to make this work. If these don’t help well enough, feel free to use a search engine. The info about how this works is everywhere. Keep grinding, and it will click eventually.

Prime Numbers and RSA by Computerphile

RSA by Eddie Woo Part One and Part Two

Encryption and Huge Numbers by Numberphile

RSA Encryption in 5 minutes

Jeremy Kun Math x Programming June 2011

Invertibility

First things first: In order for an encryption scheme to be worth anything, it has to be invertible (or, someone somewhere has to be able to decrypt an encrypted message). If you can’t decrypt an encrypted message, what’s it worth? Nothing.

RSA is inverted using modular exponentiation. That is, exponentiate a figure, divide it by another figure, and find the remainder of that division to encrypt. Repeat using different figures to arrive back at the beginning.

(Note: You’ll notice in the equations that follow, we don’t speak in terms of equality. We speak in terms of congruence. What’s congruence? Read the section on modular arithmetic here.)

Encryption: M^e mod n ≡ C

Decryption: C^d mod n ≡ M

If you please, we can make a substitution for C and get the fundamental expression that undergirds RSA.

M^ed mod n ≡ M

Or, as you will see just about anywhere if you search around…

M^ed ≡ M mod n ⟵ The golden formula. If you get lost, remember that this is ultimately what we’re after.

Modular multiplicative inverse

Notice that because we’re talking congruence, we can replace ed with 1 and it’s still true.

M¹ ≡ M mod n

There’s a special term for a pair of integers that when multiplied together mod n is congruent to 1: Modular multiplicative inverse. Or, we are after a d that is the modular multiplicative inverse of e and vice-versa.

There’s a rule about modular multiplicative inverses. For any two figures a and n, a will have a modular multiplicative inverse b if and only if a and n are coprime.

We must define coprime: A number a is coprime to another number n if they have no common factors other than 1.

The numbers don’t have to be prime themselves. But if they were, they would by definition be coprime since they would have no common factors.
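A quick way to see the coprime rule in action is to ask Java for an inverse and watch when it refuses. This is just a small sketch; the class and method names (ModInverseSketch, inverseOrNull) are made up for illustration.

```java
import java.math.BigInteger;

class ModInverseSketch {

    /** Returns the inverse of a mod n, or null when a and n share a factor. */
    static BigInteger inverseOrNull(long a, long n) {
        BigInteger bigA = BigInteger.valueOf(a);
        BigInteger bigN = BigInteger.valueOf(n);
        if (!bigA.gcd(bigN).equals(BigInteger.ONE)) {
            return null; // not coprime, so no inverse exists
        }
        return bigA.modInverse(bigN);
    }

    public static void main(String[] args) {
        System.out.println(inverseOrNull(3, 10)); // 7, because 3 * 7 = 21 and 21 mod 10 = 1
        System.out.println(inverseOrNull(4, 10)); // null, because 4 and 10 share the factor 2
    }
}
```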

All this to say that if we’re going to find any pair e and d that will satisfy our golden formula, it’s going to involve some sort of modular multiplicative inverse. Once we have found a pair of figures that satisfy our golden formula, then we have found an asymmetric encryption key. The lock is [e, n] (power of e, mod of n), and the key is [d, n] (power of d, mod of n).

Now, the question is this: how do we find e and d? We’re not there yet. Hang in there.

Phi function

I have to lay another piece of ground work. We need to talk about the phi function, or φ(n).

φ(n) is a special number. φ(n) is the quantity of numbers that are coprime with n, meaning how many numbers from 1 up to n share no common factors (other than 1) with n.

As it happens, calculating φ(n) is trivial for prime numbers. Think about it. If a number is prime, then by definition, it has no factors other than 1 and itself. One is a factor of everything, so we can ignore it. Therefore, for a prime number p…

φ(p) = p – 1

Phi function of products of primes

Next step: what if we have two prime numbers p and q and want to figure out what φ(pq) is? (I know this seems random, but stay with me. It’s important.)

With this one, multiply the phi-function of each prime together to find the phi of their product. The proof is a bit over my head. If you want to read the proof, go here.

φ(pq) = φ(p)φ(q) = (p – 1)(q – 1)

For RSA, pq = n, or in other words, n is the product of two primes p and q, and φ(n) is the product of the phi function of each of its prime factors.

Let’s work a simple example with two primes: 3 and 5. Here are all the numbers from 1 to 15 with multiples of 3 and 5 in brackets. Because the bracketed figures are multiples of our prime numbers, they are not coprime with 15.

1 2 [3] 4 [5] [6] 7 8 [9] [10] 11 [12] 13 14 [15]

Count the number of unbracketed figures. There are 8: {1, 2, 4, 7, 8, 11, 13, 14}.

φ(15) = φ(3)φ(5) = 2 * 4 = 8

Seems good to me.
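If counting by hand feels shaky, the same count can be brute-forced. The sketch below is only practical for tiny numbers, and the names (PhiSketch, phi, gcd) are made up for illustration.

```java
class PhiSketch {

    static int gcd(int a, int b) {
        return b == 0 ? a : gcd(b, a % b);
    }

    /** Counts the integers in 1..n that share no factor (other than 1) with n. */
    static int phi(int n) {
        int count = 0;
        for (int i = 1; i <= n; i++) {
            if (gcd(i, n) == 1) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println(phi(15));         // 8
        System.out.println(phi(3) * phi(5)); // 8 again, matching (3 - 1)(5 - 1)
    }
}
```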

Euler’s theorem

Last sidetrack before we find e and d, I promise.

Once upon a time a guy named Euler came up with a theorem that says that if you have two numbers a and n that are coprime, then the following is true. If you want to read up on why or how you can go here or here.

a^φ(n) ≡ 1 mod n

This can be toyed with. Let’s raise both sides to power k and simplify.

a^{kφ(n)} ≡ 1^k mod n ⟶ a^{kφ(n)} ≡ 1 mod n

Further, let’s multiply both sides by a and simplify.

a * a^{kφ(n)} ≡ a * 1 mod n ⟶ a * a^{kφ(n)} ≡ a mod n ⟶ a^{kφ(n) + 1} ≡ a mod n

Let’s call that out again, because it’s important

a^{kφ(n) + 1} ≡ a mod n

Does this look at least a little familiar? Golden formula anyone?

M^ed ≡ M mod n

Therefore, if we can find two figures e and d such that the following is true, we have found our e and d.

ed = kφ(n) + 1
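None of this has to be taken on faith. Here is a small sketch that checks a^{kφ(n) + 1} ≡ a mod n numerically for n = 15 and φ(15) = 8. The class and method names (EulerCheckSketch, holds) are made up for illustration.

```java
import java.math.BigInteger;

class EulerCheckSketch {

    /** True when a^(k*phi + 1) mod n is congruent to a mod n. */
    static boolean holds(long a, long k, long phi, long n) {
        BigInteger base = BigInteger.valueOf(a);
        BigInteger modulus = BigInteger.valueOf(n);
        BigInteger exponent = BigInteger.valueOf(k * phi + 1);
        return base.modPow(exponent, modulus).equals(base.mod(modulus));
    }

    public static void main(String[] args) {
        // n = 15, phi(15) = 8, and every a below is coprime with 15
        for (long a : new long[] {1, 2, 4, 7, 8, 11, 13, 14}) {
            for (long k = 1; k <= 3; k++) {
                System.out.println("a=" + a + " k=" + k + " holds=" + holds(a, k, 8, 15));
            }
        }
    }
}
```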

Choosing e… finally

Let’s make an assumption that there is a pair of figures e and d that satisfy this congruence. (Aside: recall from above that for this to be true, e and φ(n) have to be coprime.)

ed ≡ 1 mod φ(n)

Subtract one from both sides.

ed – 1 ≡ 0 mod φ(n)

If that’s true, then φ(n) divides evenly into ed – 1. Or, in other words, ed – 1 will be equal to some multiple of φ(n). We can call that multiplier k.

ed – 1 ≡ 0 mod φ(n) ⟶ ed – 1 = kφ(n)

At this point, add one to both sides again.

ed = kφ(n) + 1 ⟶ This is the equation we are after!

Therefore, given a certain condition, we can straight substitute ed into Euler’s Theorem, and this gives us our golden formula.

What are those conditions? As stated before, e and φ(n) have to be coprime.

How can we guarantee that? φ(n) = (p – 1)(q – 1). Recall that p and q are prime numbers (ideally, really large prime numbers). Choose a 3rd prime for e that is greater than (p – 1) and (q – 1), and you’ll be guaranteed to have it be coprime with φ(n). Here’s why: a prime e can only share a factor with φ(n) if e itself divides φ(n), and a prime that divides a product has to divide at least one of the figures being multiplied. If e is greater than both (p – 1) and (q – 1), it can’t divide either of them, so it will definitely be coprime to φ(n).

Alternatively, we can just pick random primes, small or large, and see if a chosen prime is a factor of φ(n). If it is, we have to pick again. We keep picking until we find one that is coprime to φ(n).

So to sum up, as long as e and φ(n) are coprime (no common factors other than 1), then there will absolutely be a d that makes the equation ed = kφ(n) + 1 true for some value k. The bounds for e are that it has to be greater than one and less than (p-1)(q-1), with the caveat that it cannot share a factor with (p-1)(q-1). The easiest way to ensure that is to pick a prime number that is greater than p and q.
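The "keep picking primes" approach can be sketched in a few lines. This is a rough illustration with made-up names (ChooseESketch, chooseE). It starts at 3 and walks upward, which is an assumption for this example and not a recommendation for real keys.

```java
import java.math.BigInteger;

class ChooseESketch {

    /**
     * Tries successive primes, starting at 3, until one is coprime with phi.
     * A caller should still confirm that the result is less than phi.
     */
    static BigInteger chooseE(BigInteger phi) {
        BigInteger e = BigInteger.valueOf(3);
        while (!e.gcd(phi).equals(BigInteger.ONE)) {
            e = e.nextProbablePrime(); // first probable prime greater than e
        }
        return e;
    }

    public static void main(String[] args) {
        // p = 61, q = 53, so phi = 60 * 52 = 3120 = 2^4 * 3 * 5 * 13
        System.out.println(chooseE(BigInteger.valueOf(3120))); // 7 (3 and 5 both divide 3120)
    }
}
```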

Solving for d

Now that we have an e, we have to find d, which is the modular multiplicative inverse of e mod φ(n). Once we have e, d, and n, we are finished!

There’s a defined algorithm for this: The extended Euclidean algorithm. I’m not going to explain it here because, honestly, it’s over my head. There is pseudocode on Wikipedia that makes it pretty simple and straightforward to implement.

Interestingly, java comes with an implementation of this algorithm already attached to BigInteger. It’s the modInverse function. So, we already have what we need in order to determine the inverse of e mod φ(n). This makes any homebrew implementation pretty irrelevant in java, but the challenge invites us to implement it anyway. Here’s my implementation.

    /**
     * find the modular inverse of a mod n, which we call t
     * an implementation of the extended euclidean algorithm
     * sourced from wikipedia
     *
     * this is functionally equivalent to {@link BigInteger#modInverse(BigInteger)}
     *
     * @param a the number
     * @param n the modulus
     * @return the result, which I am calling t
     */
    public BigInteger invMod(final BigInteger a, final BigInteger n) {
        BigInteger t = BigInteger.ZERO;
        BigInteger nextT = BigInteger.ONE;
        BigInteger r = n;
        BigInteger nextR = a;

        while (nextR.compareTo(BigInteger.ZERO) != 0) {
            var q = r.divide(nextR);
            var tempT = t;
            t = nextT;
            nextT = tempT.subtract(q.multiply(nextT));

            var tempR = r;
            r = nextR;
            nextR = tempR.subtract(q.multiply(nextR));
        }

        if (r.compareTo(BigInteger.ONE) > 0) {
            throw new ArithmeticException("a is not invertible");
        }

        if (t.compareTo(BigInteger.ZERO) < 0) {
            t = t.add(n);
        }

        return t;
    }

Generating a key

Remember, the public key (or lock) is [e, n] and the private key (or key) is [d, n]. Let’s do a quick recap on the steps it takes to find this key.

Select two prime numbers p and q. Multiply them to get n.
Working modulo n, Euler’s theorem tells us a^{kφ(n)+1} ≡ a mod n if a and n are coprime. (To make sure of this, we use very large prime numbers p and q to determine n.)
We can substitute kφ(n)+1 with ed provided that we select an e that is coprime with φ(n).
φ(n) = (p-1)(q-1)
In order for e to be coprime with (p – 1)(q – 1), we select an e that is prime and greater than or equal to both p and q.
Alternatively, we check to ensure prime number e is not a factor of φ(n) and keep picking different primes for e until we find one that isn’t a factor of φ(n). This will make e coprime to φ(n).
Once we’ve selected an e, we know that a modular multiplicative inverse must exist because e and φ(n) are coprime; we find the inverse d using the extended euclidean algorithm.
The lock is [e, n] and the key is [d, n].
Congratulate yourself!
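To tie the steps together, here is a toy key built from textbook-sized primes. It is a sketch for checking the arithmetic, not something to encrypt with: the class name ToyRsaSketch and the values p = 61, q = 53, e = 17 are chosen purely for illustration, and it leans on BigInteger#modInverse to find d.

```java
import java.math.BigInteger;

class ToyRsaSketch {
    final BigInteger n;
    final BigInteger e;
    final BigInteger d;

    ToyRsaSketch(BigInteger p, BigInteger q, BigInteger e) {
        BigInteger phi = p.subtract(BigInteger.ONE).multiply(q.subtract(BigInteger.ONE));
        this.n = p.multiply(q);
        this.e = e;
        this.d = e.modInverse(phi); // throws ArithmeticException if e and phi share a factor
    }

    /** The lock: M^e mod n */
    BigInteger encrypt(BigInteger m) {
        return m.modPow(e, n);
    }

    /** The key: C^d mod n */
    BigInteger decrypt(BigInteger c) {
        return c.modPow(d, n);
    }

    public static void main(String[] args) {
        ToyRsaSketch key = new ToyRsaSketch(
                BigInteger.valueOf(61), BigInteger.valueOf(53), BigInteger.valueOf(17));
        BigInteger c = key.encrypt(BigInteger.valueOf(65));
        // n = 3233 and d = 2753, since 17 * 2753 = 46801 = 15 * 3120 + 1
        System.out.println("n=" + key.n + " d=" + key.d + " decrypted=" + key.decrypt(c));
    }
}
```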

Hopefully it makes more sense now!

Thanks for reading! -LH

Secure Remote Password Demystified

Landon — Thu, 16 Sep 2021 13:49:59 +0000

Secure Remote Password (SRP) is a protocol by which a user in a system is able to log in to that system without the system ever knowing or storing the user’s password.

Consider this description of the SRP protocol from cryptopals challenge 36:

Replace A and B with C and S (client & server)

C & S
 - Agree on N=[NIST Prime], g=2, k=3, I (email), P (password)
S
 - Generate salt as random integer
 - Generate string xH=SHA256(salt|password)
 - Convert xH to integer x somehow (put 0x on hexdigest)
 - Generate v=g**x % N
 - Save everything but x, xH
C->S
 - Send I, A=g**a % N (a la Diffie Hellman)
S->C
 - Send salt, B=kv + g**b % N
S, C
 - Compute string uH = SHA256(A|B), u = integer of uH
C
 - Generate string xH=SHA256(salt|password)
 - Convert xH to integer x somehow (put 0x on hexdigest)
 - Generate S = (B - k * g**x)**(a + u * x) % N
 - Generate K = SHA256(S)
S
 - Generate S = (A * v**u) ** b % N
 - Generate K = SHA256(S)
C->S
 - Send HMAC-SHA256(K, salt)
S->C
 -Send "OK" if HMAC-SHA256(K, salt) validates

This is supposed to be a simple summary of the exchanges between a client and server to securely authenticate a user. It’s a lot to take in, and it’s not super intuitive what is going on or how it’s supposed to work.

Here’s my best shot at summarizing this.

First, Secure Remote Password (SRP) mostly concerns authentication after the fact. It doesn’t concern registration, except for one thing: the server needs to have some way to authenticate that a user is who he claims to be without actually having the password. To do that, a server needs a “password verifier.” This is v.

Here’s where cryptopals confused me: They indicate in their challenge summary that the server is supposed to build v and salt. Well that makes no sense. If the client sends the plaintext password over the network in an unencrypted manner to begin with, what’s the point? You’ve already given away the game at that point. Solution: Have the client do it instead. The client comes up with v and salt and sends both of these values to the server for storage as part of a registration. Wikipedia agrees with me (or at least did when I wrote this).
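Client-side registration might look something like the sketch below. Treat it as an illustration under assumptions: the names (SrpRegistrationSketch, deriveX, verifier, newSalt) are made up, the salt is handled as raw bytes, and the modulus is the prime 2^255 − 19, a stand-in chosen only because it is easy to write down. A real setup would use the large NIST prime the challenge calls for.

```java
import java.math.BigInteger;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.security.SecureRandom;

class SrpRegistrationSketch {
    // Stand-in modulus for illustration only (assumption, not the NIST prime).
    static final BigInteger N = BigInteger.ONE.shiftLeft(255).subtract(BigInteger.valueOf(19));
    static final BigInteger G = BigInteger.valueOf(2);

    /** x = SHA256(salt | password), read as a non-negative integer. */
    static BigInteger deriveX(byte[] salt, String password) {
        try {
            MessageDigest sha256 = MessageDigest.getInstance("SHA-256");
            sha256.update(salt);
            sha256.update(password.getBytes(StandardCharsets.UTF_8));
            return new BigInteger(1, sha256.digest());
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }

    /** v = g^x mod N. The client sends only v and the salt to the server. */
    static BigInteger verifier(byte[] salt, String password) {
        return G.modPow(deriveX(salt, password), N);
    }

    static byte[] newSalt() {
        byte[] salt = new byte[16];
        new SecureRandom().nextBytes(salt);
        return salt;
    }
}
```

The password and x never leave the client. The server ends up holding the username, the salt, and v.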

From that point on, it’s really not too bad. The user is registered. When it comes time to actually log in, how is it that the client can prove he is the user he claims to be? Math.

The client comes up with A, an ephemeral key based on a one-time, random, private-key value a. A and the username get sent to the server. The server responds with the previously saved salt and an ephemeral public key B based on both v and another one-time, random, private-key value b. They each do math to come up with a new number S independently. If the numbers match, then it’s considered proven that the client is who he claims to be, and authentication is completed.

Now you’re thinking, “That’s nice, but that description doesn’t show how the math adds up. Can you show me?” I can do my best. Again, wikipedia has this math pretty awesomely summarized. One major difference between me and them, though, is that I’m going to retain the mod operation rather than condensing it down. Admittedly, it’s unwieldy, but in languages like java, it’s important to understand these operations can be done with BigInteger#modPow and not just BigInteger#pow. So, I kept the mod operations in this math. If you want to see a version without, check wikipedia.

Ok. Here we go…

There is a desired end value S that both the client and server should be able to independently calculate:

S = (g^b mod N)^{(a + ux)} mod N

Client –> Server

Client: I want to log in. Here is my username and a public key A.

Server

The server knows k, g and N. It also knows its own key b. It just received a username from the client and can retrieve a previously saved v from storage. It also can build u with both public keys (as follows).

u = SHA256(A|B) <— This means “A concat B hashed into SHA256 and converted to an integer”

A = g^a mod N

v = g^x mod N

With these values, we can compute a value S that is mathematically equal to the prior definition of S.

S = [A(v^u mod N)]^b mod N

substitute A and v for their values

S = [(g^a mod N)(g^x mod N)^u mod N]^b mod N

simplify by reshuffling the exponents … (base^z)^y = base^zy

S = [(g^a mod N)(g^ux mod N)]^b mod N

simplify by reshuffling exponents again … base^z * base^y = base^(z+y)

S = [g^{(a + ux)} mod N]^b mod N

swap the exponents … (base^z)^y = base^yz = base^zy = (base^y)^z

S = (g^b mod N)^{(a + ux)} mod N <— desired end value

Server –> Client

Server: I see your login request. Here is a salt and a public key B.

Client

The client gets B and salt from the server. It also knows k, N, g, and password. It obviously knows its own private key a. And, because it has both A and B, it can build u the same way the server did. Also, from salt and the password, it can rebuild v through intermediate x.

u = SHA256(A|B)

x = SHA256(salt | password)

v = g^x mod N

B = kv + g^b mod N

With these values, we can compute a value S that is mathematically equal to the definition of S stated above as well as the value of S computed by the server.

S = [B – k(g^x mod N)]^{(a + ux)} mod N

substitute B for its value

S = [kv + g^b mod N – k(g^x mod N)]^{(a + ux)} mod N

substitute v for its value

S = [k(g^x mod N) + g^b mod N – k(g^x mod N)]^{(a + ux)} mod N

simplify … k(g^x mod N) cancels itself out

S = (g^b mod N)^{(a + ux)} mod N <— desired end value

Final authentication

Once the client and server have independently found the shared secret S, the client produces a hash of S and sends it to the server. The server independently produces a hash of S and compares. If they match, the user is authenticated.

You can see my implementation of the client and server on my github repo. I had fun implementing this.
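Separately from that implementation, the algebra above is easy to sanity check with BigInteger#modPow. The sketch below runs both sides of the exchange in one process with fixed private values. Everything named here (SrpSharedSecretSketch, serverS, clientS, secretsMatch) is made up for illustration, and the modulus is the stand-in prime 2^255 − 19 in place of the NIST prime, which is an assumption made to keep the example short.

```java
import java.math.BigInteger;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

class SrpSharedSecretSketch {
    // Stand-in modulus for illustration only (assumption, not the NIST prime).
    static final BigInteger N = BigInteger.ONE.shiftLeft(255).subtract(BigInteger.valueOf(19));
    static final BigInteger G = BigInteger.valueOf(2);
    static final BigInteger K = BigInteger.valueOf(3);

    /** SHA256 over the concatenated parts, read as a non-negative integer. */
    static BigInteger sha256ToInt(byte[]... parts) {
        try {
            MessageDigest sha256 = MessageDigest.getInstance("SHA-256");
            for (byte[] part : parts) {
                sha256.update(part);
            }
            return new BigInteger(1, sha256.digest());
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Server side: S = (A * v^u)^b mod N */
    static BigInteger serverS(BigInteger bigA, BigInteger v, BigInteger u, BigInteger b) {
        return bigA.multiply(v.modPow(u, N)).modPow(b, N);
    }

    /** Client side: S = (B - k * g^x)^(a + u*x) mod N */
    static BigInteger clientS(BigInteger bigB, BigInteger x, BigInteger a, BigInteger u) {
        BigInteger base = bigB.subtract(K.multiply(G.modPow(x, N))).mod(N);
        return base.modPow(a.add(u.multiply(x)), N);
    }

    /** Runs one login attempt and reports whether both sides reach the same S. */
    static boolean secretsMatch(String registeredPassword, String attemptedPassword,
                                BigInteger a, BigInteger b) {
        byte[] salt = {1, 2, 3, 4};
        // Registration: the server only ever stores salt and v.
        BigInteger v = G.modPow(sha256ToInt(salt, registeredPassword.getBytes(StandardCharsets.UTF_8)), N);
        // Login: exchange public keys A and B, then derive u on both sides.
        BigInteger bigA = G.modPow(a, N);
        BigInteger bigB = K.multiply(v).add(G.modPow(b, N)).mod(N);
        BigInteger u = sha256ToInt(bigA.toByteArray(), bigB.toByteArray());
        BigInteger x = sha256ToInt(salt, attemptedPassword.getBytes(StandardCharsets.UTF_8));
        return serverS(bigA, v, u, b).equals(clientS(bigB, x, a, u));
    }
}
```

Calling secretsMatch with the same password on both sides should return true, while a wrong password leaves the client with a different S.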

Fatal flaw?

Look at the math again. Consider the question posed in challenge 37: what if the client sends a value of 0 for A? What does that do to the shared secret value S computed by the server?

S = [A(v^u mod N)]^b mod N
  = [0(v^u mod N)]^b mod N
  = 0^b mod N
S = 0

If a malicious client sends 0 for A, then S = 0, which means that the SHA256 value sent to the server for authentication just becomes a constant: SHA256(0), which enables anyone to log in without the password. If A is any multiple of N, then S = 0 because any multiple of a number N modulo N will equal 0.

Oops!
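Here is a small sketch of that collapse from the server’s point of view. The names (SrpZeroKeySketch, serverS) and the values for v, u and b are arbitrary placeholders chosen for illustration, and the modulus is a stand-in prime (2^255 − 19) instead of the NIST prime.

```java
import java.math.BigInteger;

class SrpZeroKeySketch {
    // Stand-in modulus for illustration only (assumption, not the NIST prime).
    static final BigInteger N = BigInteger.ONE.shiftLeft(255).subtract(BigInteger.valueOf(19));

    /** Server side: S = (A * v^u)^b mod N */
    static BigInteger serverS(BigInteger bigA, BigInteger v, BigInteger u, BigInteger b) {
        return bigA.multiply(v.modPow(u, N)).modPow(b, N);
    }

    public static void main(String[] args) {
        // Placeholder values; the attacker does not need to know any of them.
        BigInteger v = BigInteger.valueOf(424242);
        BigInteger u = BigInteger.valueOf(1337);
        BigInteger b = BigInteger.valueOf(99);
        BigInteger[] maliciousKeys = {BigInteger.ZERO, N, N.multiply(BigInteger.valueOf(2))};
        for (BigInteger bigA : maliciousKeys) {
            System.out.println(serverS(bigA, v, u, b)); // 0 every time
        }
    }
}
```

This is why a server should reject any A that is congruent to 0 mod N before doing anything else with it.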

Timing leaks and multi-threading

Landon — Tue, 24 Aug 2021 13:48:55 +0000

What if the server that verified MACs took longer to verify a correct mac than an incorrect one? Or, perhaps put differently, what if you could tell the difference between a more correct guess than an obviously wrong one? If you can, you can break MAC authentication schemes, and that’s what the cryptopals authors are trying to get at in challenges 31 and 32.

Write a function, call it "insecure_compare", that implements the == operation by doing byte-at-a-time comparisons with early exit (ie, return false at the first non-matching byte).

In the loop for "insecure_compare", add a 50ms sleep (sleep 50ms after each byte).

Use your "insecure_compare" function to verify the HMACs on incoming requests, and test that the whole contraption works. Return a 500 if the MAC is invalid, and a 200 if it's OK.
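In Java, the comparison they describe might look something like the sketch below. This is a minimal illustration and not the code from my vulnerable server. The class name is made up, the delay is a parameter so the same method covers both the 50 ms and 5 ms variants, and it sleeps only after a byte that matched.

```java
class InsecureCompareSketch {

    /**
     * Byte-at-a-time comparison with early exit. Every matching byte costs one
     * sleep, so the response time reveals how many leading bytes were correct.
     */
    static boolean insecureCompare(byte[] expected, byte[] actual, long delayMillis) {
        if (expected.length != actual.length) {
            return false;
        }
        for (int i = 0; i < expected.length; i++) {
            if (expected[i] != actual[i]) {
                return false; // early exit: this is the leak
            }
            sleep(delayMillis);
        }
        return true;
    }

    private static void sleep(long millis) {
        try {
            Thread.sleep(millis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}
```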

This is simple enough. Remember that a hash value (SHA1, in this case) is simply a byte array 20 bytes long in hexadecimal string form. If you take a byte array of 20 bytes and start changing one byte at a time, you can pretty easily determine when a byte in your candidate is correct. How? Brute force each byte in the hash.

Stand up a vulnerable server. I used spring boot and jetty. You can see my vulnerable server code here.
Starting with a byte array of all zeros, make 256 requests against the server, rotating byte n=0 (the first byte) through all 256 possible values. Measure how long each request takes. The correct byte in position n=0 is the one that took the longest.
Save the byte in position n=0
Repeat for positions n=1 through n=19 until you finally get a 200 out of the server.
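Stripped of HTTP, threads, and noise, the loop in those steps can be sketched as follows. This is a simplified model and not the exploiter shown later in this post: the names (TimingAttackSketch, recover, simulatedServer) are made up, and the "server" is a stand-in function that reports 50 time units per correct leading byte instead of a real measured response time.

```java
import java.util.function.ToLongFunction;

class TimingAttackSketch {

    /**
     * Recovers a secret of the given length from an oracle that reports how
     * long a guess took to verify. For each position, the value that takes
     * the longest is kept.
     */
    static byte[] recover(int length, ToLongFunction<byte[]> timeOf) {
        byte[] guess = new byte[length]; // start with all zeros
        for (int position = 0; position < length; position++) {
            int bestValue = 0;
            long bestTime = Long.MIN_VALUE;
            for (int value = 0; value < 256; value++) {
                guess[position] = (byte) value;
                long elapsed = timeOf.applyAsLong(guess);
                if (elapsed > bestTime) {
                    bestTime = elapsed;
                    bestValue = value;
                }
            }
            guess[position] = (byte) bestValue; // save the byte and move on
        }
        return guess;
    }

    /** Simulated leak: 50 time units for every leading byte that matches. */
    static ToLongFunction<byte[]> simulatedServer(byte[] secret) {
        return candidate -> {
            long time = 0;
            for (int i = 0; i < secret.length && candidate[i] == secret[i]; i++) {
                time += 50;
            }
            return time;
        };
    }
}
```

Against a real server each timing would be an average over several noisy requests, and the last byte is confirmed by the 200 response rather than by timing.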

Challenge 31 was simple because of how obvious the timing leak is at 50 ms. It got much more difficult with challenge 32 because of the statistical noise involved with sending requests with a 5 ms delay instead of the 50 ms delay. Any number of things on the server can cause delays of a millisecond or two, especially when you’re running a small web server on a not-powerful system. Every solution for this challenge I could find elsewhere skipped setting up an explicit web server like I had done and just relied on class-to-class method calls (who could blame them?). Most of them also relied on simply averaging the lengths of the requests and taking the group of requests that took the longest.

Ultimately in my solution, I did rely on averaging, but I also relied on a process of elimination where I limited the number of requests I made for all possible solutions and then made 3x more requests for the five candidates that took the longest on the first go round. Both for 31 and 32, I used multi-threading and used the same tool to crack a timing leak at 50 ms and a timing leak at 5 ms. For the 50 ms timing leak, I used upwards of 30 threads. The noise that produced in response times was not enough to pollute the results. For the 5 ms leak, I could only use two or three threads and still successfully break the MAC. Otherwise, the noise was too much. You can see my solution below.

Final note: I used multi-threading to try to speed up build times with mixed results. This crack takes a long time to run. After you find the first byte correctly, each successive request is also going to be delayed. This means finding the first byte takes a fraction of a second. Finding the second byte with a single thread takes (50ms * 256) + 50ms at minimum, which is roughly 13 seconds. The third byte takes (100ms * 256) + 50ms, which is roughly 26 seconds. This grows in linear fashion. If you can multi-thread, you can make concurrent requests to arrive at the solution more quickly. Even so, getting reliable results takes time. With my solution, each challenge takes 20 minutes or so to compute. Rather than sit around for 3/4 of an hour waiting for the crack to finish, I tagged these tests and ignore them by default. I only run them when I push to a specific branch on a GitHub server. GitHub will tell me later if there’s a problem.

Here’s my timing leak exploiter:

/**
 * a class dedicated to exploiting timing leaks in order to complete
 * challenges 31 and 32
 */
@Slf4j
public class C31_32_TimingLeakExploiter {

    private final String file;
    private final int port;
    private final RestTemplate restTemplate;
    private final Executor ex;

    public C31_32_TimingLeakExploiter(String file, int port, RestTemplate restTemplate, int numOfThreads) {
        this.file = file;
        this.port = port;
        this.restTemplate = restTemplate;
        this.ex = Executors.newFixedThreadPool(numOfThreads);
    }

    @SneakyThrows
    public void exploitLeak(final byte[] forgedHash) {
        for (int i = 0; i < forgedHash.length; i++) {
            log.info("starting round one with full byte set");
            //round 1
            Set<Byte> byteSet = IntStream.range(Byte.MIN_VALUE, Byte.MAX_VALUE + 1).mapToObj(n -> (byte) n).collect(Collectors.toSet());
            final SortedSet<Pair<Byte, Double>> tree = gatherSortedData(forgedHash, i, byteSet, 5, 33);
            log.info("round one over. top five results: {}", tree);
            //round 2
            byteSet = tree.stream().map(Pair::getKey).collect(Collectors.toSet());
            final SortedSet<Pair<Byte, Double>> tree2 = gatherSortedData(forgedHash, i, byteSet, 1, 100);
            forgedHash[i] = tree2.first().getKey();
            log.info("found a byte ({}). now the hash is {}", tree2.first().getKey(), Hex.toHexString(forgedHash));

            if (HttpStatus.OK == makeRequest(forgedHash).get().getKey()) {
                log.info("the hash was {}", Hex.toHexString(forgedHash));
                break;
            }
        }
    }

    private SortedSet<Pair<Byte, Double>> gatherSortedData(final byte[] forgedHash, final int i, Collection<Byte> initialCandidates,
                                                           int limitOfNewCandidates, int limitOfRequest) {
        final var futuresMap = new HashMap<Byte, List<CompletableFuture<Pair<HttpStatus, Long>>>>();
        for (byte k : initialCandidates) {
            final List<byte[]> samples = new ArrayList<>();
            for (int j = 0; j < limitOfRequest; j++) {
                var clone = Arrays.clone(forgedHash);
                clone[i] = k;
                samples.add(clone);
            }
            final var futures = samples.stream().map(this::makeRequest).collect(Collectors.toList());
            futuresMap.put(k, futures);
        }

        SortedSet<Pair<Byte, Double>> candidates = new TreeSet<>(Comparator.comparing(Pair::getRight, Comparator.reverseOrder()));

        for (Map.Entry<Byte, List<CompletableFuture<Pair<HttpStatus, Long>>>> entry : futuresMap.entrySet()) {
            var futures = entry.getValue();
            var all = CompletableFuture.allOf(futures.toArray(new CompletableFuture[0]));
            var totalTime = all.thenApply(v -> futures.stream().map(CompletableFuture::join).map(Pair::getRight).reduce(0L, Long::sum)).join();
            final double mean = totalTime.doubleValue() / (double) entry.getValue().size();
            if (candidates.size() < limitOfNewCandidates) {
                candidates.add(Pair.of(entry.getKey(), mean));
            } else if (candidates.stream().anyMatch(e -> e.getValue() < mean)) {
                candidates.add(Pair.of(entry.getKey(), mean));
                candidates.remove(candidates.last());
            }
        }
        return candidates;
    }

    public CompletableFuture<Pair<HttpStatus, Long>> makeRequest(final byte[] forgedHash) {
        return CompletableFuture.supplyAsync(() -> {
            final long startTime = System.currentTimeMillis();
            final String signature = Hex.toHexString(forgedHash);
            final URI uri = URI.create(String.format("http://localhost:%s/leak/test/%s?signature=%s",
                    port,
                    file,
                    signature
            ));
            final ResponseEntity<String> response = restTemplate.getForEntity(uri, String.class);
            final long responseTime = System.currentTimeMillis() - startTime;
            if (response.getStatusCode() == HttpStatus.BAD_REQUEST) {
                throw new AssertionError("Got a bad request response");
            }
            return Pair.of(response.getStatusCode(), responseTime);
        }, ex);
    }
}

SHA1 and MD4 Length Extension Attacks Explained

Landon — Tue, 13 Jul 2021 13:27:21 +0000

Continuing my series on the cryptopals challenges…

In section four, two of the challenges require you to get past a checksum test by spoofing a hash associated with a forged message. The idea is that if you can pass a tampered query string to an application (say, a web application) along with a message hash that validates against it, you can get that system to do what you want. As an example, Flickr was exposed as vulnerable to this attack in 2009.

Because of the algorithms that undergird them, SHA1, MD4, and (as I understand it) MD5 are susceptible to this type of attack. Even if the message is prefixed with a secret key of unknown length before being hashed, it is possible to leverage common implementations of these algorithms to generate new hashes that will validate against manipulated strings. For this reason (and others), simple message authentication codes (MACs) built by hashing a secret key followed by the message with SHA1, MD4 or MD5 shouldn’t be trusted.

But how? To understand, you need to understand what happens in the underlying algorithm. In describing this, I’ll note that there are some meaningful differences between SHA1 and MD4, such as endianness, digest length, and the underlying processing method, but in the ways that matter for this attack, they’re the same. I’ll call the differences out if they’re important, but otherwise, just assume I’m speaking in terms of SHA1.

When a message is hashed, that message is converted mathematically into a code. With SHA1, that code is a hexadecimal character string 40 characters long, which is derived from a byte array 20 bytes long. (With MD4, the other algorithm in the challenge, it’s 32 characters derived from 16 bytes.) In other words, the algorithm takes the message, uses it to manipulate a byte array, and at the end of all the math, that resulting byte array is the code. Importantly, this implies the algorithm is stateful.

If you look at the Bouncy Castle implementation of SHA1 (the one I used), you’ll notice five integer registers: H1, H2, H3, H4, H5. Each integer occupies 32 bits or 4 bytes, and since there are five registers, that makes for 20 bytes stored in the digest’s state. This state variable is where the ultimate hash comes from. As the digest mathematically processes the message, the end result of each processing step is to manipulate these registers. But the registers are only manipulated once for every 512-bit/64-byte block, which raises the question: What if your message’s length isn’t a multiple of 64 bytes?

The algorithm will pad the final block and fill it out so that there are 64 bytes in the block to process. It does this in two meaningful ways:

The first byte after the end of the message is always 10000000. I’ll call this the “bitflag.”
The last eight bytes contain a count of the bits, or bitcount, (not bytes) in the message (and if there isn’t a full eight bytes available to hold it, then a whole new, empty, 64-byte block is appended). For example:
- if the message length is 20 bits, then the last byte will be 00010100
- if the message contains 1000 bits, then the last two bytes will be 00000011 11101000
- if the message contains 100,000 bits, then the last three bytes will be 00000001 10000110 10100000
- you get it

After the bitflag and the bitcount are placed in the final 64-byte block of the message, the final block is processed just like any other block. The register state then becomes the digest that is output as the final result of the hash, at which point, the registers are reset.

I’ll say that again: the final output of the algorithm is the state of the registers after the final block is processed.

Ergo, if you can manipulate the registers, you can pretend that a bitflag and a bitcount were part of your message to begin with, and then feed the algorithm more message bytes (which, of course, you control) which will ultimately result in a new MAC that can authenticate against a manipulated message.

To do this, step one is to manually un-reset the registers back to their final state before the message hash was output. To be clear, you’re not just setting the registers back to what they were after the final byte of the message is processed. What you’re doing is resetting the registers to their final state after the bitflag and bitcount were put in their places. (Cryptopals calls the message contents from the bitflag to the bitcount “glue padding,” so I will too.)

Once the registers are reset back to their final state, you feed the digest more message and get a new hash. What’s really going to happen is the digest will take your new message, append new glue padding to round out a block, and will process that new block to produce your new hash.

So now you have a new hash for a manipulated message. But, if you can’t replicate the message the hash is tied to, then the hash isn’t worth much.

To make the new hash useful, you have to recreate what has now become the original message: prefix + message + glue padding + appendix. You know the message and the appendix (and their lengths), obviously. Whether or not you know the prefix doesn’t matter as long as you know, or can at least guess, its length. If you know those lengths, you can rebuild the glue padding and fill out a block with that padding before tacking on the appendix.

Here’s how I did it. The end result is a byte array that starts with a single 1 bit (the bitflag byte) and has the bitcount packed into the final eight bytes. The length of the result is enough to round out a 64-byte block (or to finish off one block and append a new 64-byte block if there isn’t enough space in the block to pack the bitcount into the final eight bytes).

    /**
     * given a message, build the MD padding as close to the same
     * way as possible
     *
     * the algorithm, best as i can tell, leads with a single bit and then over-writes
     * the last bytes of the 512-bit block with the number of BITS in the message
     *
     * @param message the message
     * @return the md padding
     */
    public byte[] buildGluePadding(final byte[] message) {
        //build a 512-bit block
        byte[] block = new byte[BLOCK_SIZE];

        //get the trailing partial block of the message (message.length % BLOCK_SIZE bytes);
        //it is empty when the message length is an exact multiple of the block size
        final int startPos = (message.length / BLOCK_SIZE) * BLOCK_SIZE;
        final byte[] subMsg = new byte[message.length - startPos];
        System.arraycopy(message, startPos, subMsg, 0, subMsg.length);

        //copy the message bytes into the block
        System.arraycopy(subMsg, 0, block, 0, subMsg.length);
        int index = subMsg.length;

        //make the next byte the flag byte
        block[index++] = Byte.MIN_VALUE;

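        //the bit count takes up the last 8 bytes; only add a block if it no longer fits
        //(assuming BIT_COUNT_SPACE is 8: 55 message bytes + flag byte + bit count exactly fill one block)
        if (index > BLOCK_SIZE - BIT_COUNT_SPACE) {
            block = ByteArrayUtil.concatenate(block, new byte[BLOCK_SIZE]);
        }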

        //get the number of bits in the message
        final long messageBitCount = (long) message.length << 3;
        packTheBitCount(block, messageBitCount);

        final int padLength = block.length - subMsg.length;
        final byte[] returnValue = new byte[padLength];
        System.arraycopy(block, subMsg.length, returnValue, 0, padLength);

        return returnValue;
    }

Note: This algorithm works for either SHA1 or MD4, but the method by which the bitcount is packed into the final block, packTheBitCount(), is different between the two. Here’s the implementation for SHA1, and here’s MD4. The biggest difference (which was frustrating) is that MD4 is little-endian, meaning the least significant byte comes first. SHA1 is big-endian, meaning the least significant byte comes last.
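The two packTheBitCount() implementations are linked rather than shown, so here is a rough sketch of the difference. The class and method names are made up for illustration and are not the ones from the linked code; the only assumption is that the bitcount occupies the last eight bytes of the block.

```java
final class BitCountPackingSketch {

    // SHA1 style: big-endian, so the least significant byte lands in the very last position
    static void packBigEndian(final byte[] block, final long bitCount) {
        for (int i = 0; i < 8; i++) {
            block[block.length - 1 - i] = (byte) (bitCount >>> (8 * i));
        }
    }

    // MD4 style: little-endian, so the least significant byte lands first in the 8-byte field
    static void packLittleEndian(final byte[] block, final long bitCount) {
        for (int i = 0; i < 8; i++) {
            block[block.length - 8 + i] = (byte) (bitCount >>> (8 * i));
        }
    }

    public static void main(String[] args) {
        final byte[] sha1Block = new byte[64];
        final byte[] md4Block = new byte[64];
        packBigEndian(sha1Block, 1000);
        packLittleEndian(md4Block, 1000);
        System.out.printf("%02x %02x%n", sha1Block[62], sha1Block[63]); // 03 e8
        System.out.printf("%02x %02x%n", md4Block[56], md4Block[57]);   // e8 03
    }
}
```

For a 1000-bit message, the same two bytes (00000011 11101000) end up at opposite ends of the eight-byte field, in opposite order.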

Once you’ve got the glue padding, you can try to authenticate. The server will prepend the secret prefix itself before hashing, so the message to submit is message + glue padding + appendix. Submit that with the new hash. As your manipulated message is processed, block by block, it will eventually arrive at the block that contains your manipulated glue padding. If you do it right, the register state after that block is processed should be identical to the state you forced when you obtained your new hash.

From there, the algorithm behaves deterministically. It fills out the final block that contains ;admin=true with new glue padding, then it processes the final 64-byte block, producing a hash. If the register state going into the processing of that final block was the same as it was when you overrode the register state, then the final result will be the same, and your message will authenticate.
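To tie the whole attack together, here is a self-contained sketch for SHA1 only. It is not the Bouncy Castle based code described above: it uses a tiny hand-rolled SHA1 compression function so the registers can be set directly, and the JDK's MessageDigest plays the role of the server. All class and method names are hypothetical, and the key, message and appendix are arbitrary sample values.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Arrays;

final class Sha1LengthExtensionSketch {

    // glue padding for totalLen hashed bytes: 0x80, zeros, then the 64-bit big-endian bitcount
    static byte[] gluePadding(final long totalLen) {
        final int zeros = (int) ((64 - (totalLen + 9) % 64) % 64);
        final byte[] pad = new byte[1 + zeros + 8];
        pad[0] = (byte) 0x80;
        ByteBuffer.wrap(pad, pad.length - 8, 8).putLong(totalLen * 8);
        return pad;
    }

    // the SHA1 compression function: fold one 64-byte block into the five registers
    static void compress(final int[] h, final byte[] data, final int off) {
        final int[] w = new int[80];
        final ByteBuffer buf = ByteBuffer.wrap(data, off, 64);
        for (int t = 0; t < 16; t++) w[t] = buf.getInt();
        for (int t = 16; t < 80; t++) w[t] = Integer.rotateLeft(w[t - 3] ^ w[t - 8] ^ w[t - 14] ^ w[t - 16], 1);
        int a = h[0], b = h[1], c = h[2], d = h[3], e = h[4];
        for (int t = 0; t < 80; t++) {
            final int f, k;
            if (t < 20) { f = (b & c) | (~b & d); k = 0x5A827999; }
            else if (t < 40) { f = b ^ c ^ d; k = 0x6ED9EBA1; }
            else if (t < 60) { f = (b & c) | (b & d) | (c & d); k = 0x8F1BBCDC; }
            else { f = b ^ c ^ d; k = 0xCA62C1D6; }
            final int tmp = Integer.rotateLeft(a, 5) + f + e + k + w[t];
            e = d; d = c; c = Integer.rotateLeft(b, 30); b = a; a = tmp;
        }
        h[0] += a; h[1] += b; h[2] += c; h[3] += d; h[4] += e;
    }

    // hash data starting from the given registers, as if alreadyHashed bytes came before it
    static byte[] sha1From(final int[] registers, final long alreadyHashed, final byte[] data) {
        final int[] h = registers.clone();
        final byte[] padded = concat(data, gluePadding(alreadyHashed + data.length));
        for (int off = 0; off < padded.length; off += 64) compress(h, padded, off);
        final ByteBuffer out = ByteBuffer.allocate(20);
        for (int word : h) out.putInt(word);
        return out.array();
    }

    static byte[] concat(final byte[]... parts) {
        int len = 0;
        for (byte[] p : parts) len += p.length;
        final byte[] all = new byte[len];
        int pos = 0;
        for (byte[] p : parts) { System.arraycopy(p, 0, all, pos, p.length); pos += p.length; }
        return all;
    }

    // returns true when the forged MAC validates against the forged message
    static boolean forgeryValidates(final byte[] key, final byte[] message, final byte[] appendix) {
        try {
            final MessageDigest sha1 = MessageDigest.getInstance("SHA-1");
            final byte[] mac = sha1.digest(concat(key, message)); // what the attacker is handed

            // attacker: un-reset the registers from the MAC and keep hashing (only key.length is used)
            final int[] registers = new int[5];
            ByteBuffer.wrap(mac).asIntBuffer().get(registers);
            final byte[] glue = gluePadding(key.length + message.length);
            final long alreadyHashed = key.length + message.length + glue.length;
            final byte[] forgedMac = sha1From(registers, alreadyHashed, appendix);
            final byte[] forgedMessage = concat(message, glue, appendix);

            // server: prepend the key to whatever was submitted and hash it
            return Arrays.equals(forgedMac, sha1.digest(concat(key, forgedMessage)));
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        final byte[] key = "YELLOW SUBMARINE".getBytes(StandardCharsets.US_ASCII);
        final byte[] message = "comment1=cooking%20MCs;userdata=foo".getBytes(StandardCharsets.US_ASCII);
        final byte[] appendix = ";admin=true".getBytes(StandardCharsets.US_ASCII);
        System.out.println(forgeryValidates(key, message, appendix));
    }
}
```

The attacker half of forgeryValidates() never reads the key bytes, only key.length. In a real attack that length is unknown, so you would loop over guesses until the server accepts one.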

Breaking Counter Mode Encryption

Landon — Fri, 04 Jun 2021 21:29:34 +0000

The subject of today’s post is breaking counter mode encryption, which directly concerns three cryptopals challenges: challenge 19, challenge 20, and challenge 25. (And maybe more … I’m only as far as challenge 25 at this point.)

What is counter mode encryption? Counter mode encryption is a method of encryption in which the message is XOR’d against an encrypted stream of bytes (the keystream). It’s called counter mode because the input to the block cipher is simply a unique nonce paired up with a counter: each counter block is encrypted, the counter is incremented, and the encrypted blocks are strung together one after another to form the keystream. Wikipedia has a great summary of what it is.
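As a rough sketch of that description, here is CTR built on top of the JDK's AES in ECB mode. The counter block layout (an 8-byte little-endian nonce followed by an 8-byte little-endian block counter) is an assumption based on the format cryptopals specifies, and the class and method names are made up for illustration.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.security.GeneralSecurityException;
import javax.crypto.Cipher;
import javax.crypto.spec.SecretKeySpec;

final class CtrSketch {

    // encrypts or decrypts: in CTR mode both directions are the same XOR against the keystream
    static byte[] ctr(final byte[] key, final long nonce, final byte[] input) {
        try {
            final Cipher ecb = Cipher.getInstance("AES/ECB/NoPadding");
            ecb.init(Cipher.ENCRYPT_MODE, new SecretKeySpec(key, "AES"));
            final byte[] output = new byte[input.length];
            for (int pos = 0, counter = 0; pos < input.length; pos += 16, counter++) {
                // counter block: nonce, then block counter, both little-endian (assumed layout)
                final byte[] counterBlock = ByteBuffer.allocate(16).order(ByteOrder.LITTLE_ENDIAN)
                        .putLong(nonce).putLong(counter).array();
                // encrypting the counter block yields the next 16 bytes of keystream
                final byte[] keystream = ecb.doFinal(counterBlock);
                for (int i = 0; i < 16 && pos + i < input.length; i++) {
                    output[pos + i] = (byte) (input[pos + i] ^ keystream[i]);
                }
            }
            return output;
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

Note that the message never touches the block cipher. Only the nonce and counter do, which is why everything below comes down to recovering the keystream.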

Look at the image at the top of this post. That’s the algorithm. Can you spot any weaknesses?

In the three previously mentioned challenges, the task is to decrypt a ciphertext without knowing the key. If you look carefully at that image, it should be relatively obvious that there are two weaknesses.

The block cipher is deterministic. If you can figure out what the key, nonce, and counter are, you can produce the same keystream. That’s usually pretty hard if it’s done right, but it’s still a weakness.
If you can figure out what the keystream is (regardless of the nonce, counter and key), then you can figure out the plaintext. After all, the plaintext is just an XOR of the ciphertext and the keystream.

In challenges 19 and 20, multiple messages are built using the exact same keystream. This is a no-no. If you encrypt several messages with the same keystream, it becomes relatively simple to figure out the keystream. After all, for a single character in a message, there are only 256 possible bytes against which the message can be XOR’d to produce the cipher text. When you know that the characters at the same position (say the first one) in several different messages were built via XOR against the same byte of keystream, then you can find the keystream by trying all 256 bytes and taking what you consider the “best one.”

How to find the best one? In challenge 19, they ask you to primarily use trial and error. Cycle through all 256 a few times for the first few columns, review the results, try some more bytes, and make more substitutions.

In challenge 20, they tell you to do this programmatically. Remember ETAOIN SHRDLU? Frequency analysis on each individual first, second, third, nth character in each of these messages is how I determined which was the best.

public abstract class AbstractFrequencyAnalyzingCTRKeyDeterminer {
    //without knowing the key, can we derive the keystream?
    // ciphertext block XOR keystream block = plaintext block
    // since we have the cipher text block, we just have to figure out what to xor against these texts to make them
    // legible

    final Chi chi = new Chi();
    final XOR xor = new XOR();

    public abstract void additionalManualTweaks(final byte[][] ciphertexts, final byte[] keyStream);

    public byte[] findTheKeyStream(final byte[][] ciphertexts) {
        //inefficient, but find the longest cipher length
        int maxLen = Arrays.stream(ciphertexts)
                .map(b -> b.length)
                .max(Integer::compareTo)
                .orElseThrow();

        byte[] keyStream = new byte[maxLen];

        //get a column of letters in cipher text
        // gracefully pass by ciphertexts without letters in the column under examination
        for (int l = 0; l < keyStream.length; l++) {
            final byte[] temp = new byte[ciphertexts.length];

            int count = 0;
            for (byte[] ciphertext : ciphertexts) {
                if (l < ciphertext.length) {
                    temp[count] = ciphertext[l];
                    count++;
                }
            }
            final byte[] byteColumn = new byte[count];
            System.arraycopy(temp, 0, byteColumn, 0, count);
            keyStream[l] = determineKeyByte(byteColumn);
        }

        //this function will manually place characters to resolve the full keystream
        additionalManualTweaks(ciphertexts, keyStream);

        return keyStream;
    }
   
    /**
    *  use chi-squared scores to find the "best" byte for the key
    */
    private byte determineKeyByte(final byte[] byteColumn) {
        //try every possible key byte and keep the one with the lowest chi-squared score

        double lowestChiScore = Double.MAX_VALUE;
        Integer winner = null;
        for (int i = Byte.MIN_VALUE; i <= Byte.MAX_VALUE; i++) {
            char[] xordFirstLetters = xor.singleKeyXOR(byteColumn, i);
            double localChi = chi.score(xordFirstLetters);
            if (localChi < lowestChiScore) {
                lowestChiScore = localChi;
                winner = i;
            }
        }

        assert winner != null;
        return (byte) winner.intValue();
    }
}
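The Chi and XOR helpers aren’t shown above, so here is a self-contained sketch of the same column idea with a much cruder score swapped in for chi-squared: it just counts how many bytes in the column decode to a lowercase letter or a space. The class, the scoring rule and the sample column are all made up for illustration.

```java
import java.nio.charset.StandardCharsets;

final class ColumnKeyByteSketch {

    // crude stand-in for a chi-squared score: count bytes that decode to a-z or a space
    static int score(final byte[] column, final int keyByte) {
        int hits = 0;
        for (byte c : column) {
            final int p = (c ^ keyByte) & 0xFF;
            if ((p >= 'a' && p <= 'z') || p == ' ') {
                hits++;
            }
        }
        return hits;
    }

    // try all 256 key bytes against one column and keep the best-scoring one
    static byte bestKeyByte(final byte[] column) {
        int best = 0;
        int bestScore = -1;
        for (int k = 0; k < 256; k++) {
            final int s = score(column, k);
            if (s > bestScore) {
                bestScore = s;
                best = k;
            }
        }
        return (byte) best;
    }

    public static void main(String[] args) {
        // pretend these are the first plaintext characters of 13 different messages
        final byte[] column = "etaoin shrdlu".getBytes(StandardCharsets.US_ASCII);
        final byte keystreamByte = (byte) 0xA7;
        for (int i = 0; i < column.length; i++) {
            column[i] ^= keystreamByte; // every message shares this keystream byte
        }
        System.out.printf("%02x%n", bestKeyByte(column)); // a7
    }
}
```

A count like this breaks down quickly on punctuation and capitals, which is where a proper frequency score (and the manual tweaks) earn their keep.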

In challenge 25, they give you a new tool. They say, “Hey! You can now edit the original message via an edit function!” They tell you to build the keystream, encrypt the new text against the appropriate keystream bytes, and overwrite the original message. Then, if you decrypt the message, you’ll find your text beginning at the appropriate offset.

    public void edit(final byte[] cipherText, final int offset, final String newText) {
        // get keystream of length offset plus newText.length rounded up to block size
        final LittleEndianNonce nonce = new LittleEndianNonce();
        final int blockLength = nonce.get().length;
        final int newLength = ((newText.length() + offset) / blockLength) * blockLength + blockLength;
        final int numOfBlocks = newLength / blockLength;
        final byte[] keystream = new byte[newLength];
        for (int block = 0; block < numOfBlocks; block++) {
            var encryptedNonce = ecb.AES(nonce.get(), CipherMode.ENCRYPT);
            System.arraycopy(encryptedNonce, 0, keystream, block * encryptedNonce.length, encryptedNonce.length);
            nonce.increment();
        }

        //get the portion of the keystream we actually care about
        final byte[] ktext = new byte[newText.length()];
        System.arraycopy(keystream, offset, ktext, 0, newText.length());

        // overwrite the ciphertext
        var newTextBytes = newText.getBytes();
        var sub = xor.multiByteXOR(newTextBytes, ktext);
        System.arraycopy(sub, 0, cipherText, offset, sub.length);
    }

But this is a fatal flaw. XOR has a property that dooms whoever thinks this might have been a good idea: XOR-ing a text against all zeros (and I mean binary zero --> [00000000], not the character zero, which is actually 48 in decimal --> [00110000]) leaves the original text. So, if you use edit to overwrite the entire plaintext with zeros, the new ciphertext is the keystream XOR’d against zeros, which is the unaltered keystream.

    /**
     * plaintext can be anything. Find an ascii art and use that if you want.
     */
    @Test
    void recoverThePlainText() {
        final String plaintext = getPlainTextFromFile();
        final byte[] cipherText = ctr.encrypt(plaintext);
        /*------------------------------------------*/

        //we can get the keystream out of the edit function by
        // passing in all 0s
        final String breakerString = new String(new byte[cipherText.length]);

        //copy the cipherText so we retain original
        final byte[] keystream = ArrayUtils.clone(cipherText);

        //edit the copy with the breaker string to get the keystream
        ctr.edit(keystream, 0, breakerString);

        //once we have the keystream, we can simply xor it against the cipherText to recover the plaintext
        final String broken = new String(xor.multiByteXOR(cipherText, keystream));

        assertEquals(plaintext, broken);
    }

Game over.

Of these three challenges, I probably enjoyed challenge 20 the most. It took me a bit to figure out that I could treat the set of messages they provided as a matrix, where every column in the matrix was XOR’d against the same character. Once I figured that out, it just became a single character XOR and frequency analysis. Challenge 25 was way too obvious, and I didn’t have the patience to sit there poking and plugging at challenge 19. I solved challenge 20 first, and then applied my solution to challenge 19.

Thanks for reading! -LH

Cloning a Mersenne Twister Random Number Generator from its output

Landon — Wed, 26 May 2021 17:38:13 +0000

As was said in my last post, I’m doing cryptopals. Just last night I finished Challenge 23. I was able to successfully clone a 32-bit Mersenne Twister pseudorandom number generator (PRNG) from its output. You can see how I did this by checking out my solution in my github repo.

If you’re like me when first looking at this, you probably are overwhelmed. So, I thought I would take a moment to try to explain how and why this works. I can’t do this, though, without acknowledging this blog post and its author. It was incredibly helpful in understanding how to reverse the tempering function of the mersenne twister.

Also, I want to address the question posed in the original problem:

How would you modify MT19937 to make this attack hard? What would happen if you subjected each tempered output to a cryptographic hash?

Let’s get going…

Preface

What’s interesting about the mersenne twister is that the PRNG retains an array within that manages the generator’s state. That state is what gives it such a long period. The period of a PRNG is how long it takes before the sequence of outputs starts repeating, and the primary perk of this particular PRNG is that it has a super long period: 2^19937-1. (That’s a huge number.)

When the PRNG gives you a “random number,” it pulls a number out of its state array and subjects that number to “tempering.” What that means is that the number is subjected to several bitwise mathematical operations, and is then returned. Here’s what that looks like in the version of the twister that I implemented. (The capital letters are constants … Pay attention to the two private functions.)

    /**
     * temper a value
     * @param untempered the value to be tempered
     * @return the tempered value
     */
    int temper(int untempered) {
        int y = temperRightShift(untempered, U, D);
        y = temperLeftShift(y, S, B);
        y = temperLeftShift(y, T, C);
        y = temperRightShift(y, L, FULL_MASK);
        return y;
    }

    private int temperRightShift(final int y, final int shift, final int mask) {
        return y ^ ((y >>> shift) & mask);
    }

    private int temperLeftShift(final int y, final int shift, final int mask) {
        return y ^ ((y << shift) & mask);
    }

There are two “versions,” so to speak, of transformation: a left shift operation and a right shift operation. In both cases, the original value is subjected to a bit shift, then and’d against a mask, and then xor’d against the original.

What’s not obvious

If you haven’t yet, you really should go check out this article. If you have, I’m going to try to do this in a way that is perhaps a bit clearer than he put it. I’m going to use the number -252645136, which when written out in binary, looks like this.

1111 0000 1111 0000 1111 0000 1111 0000

Nice and easy to see what’s going on when the number pattern is super obvious. Let’s perform and then undo the second transformation using this number: y := y xor ((y << s) and b), where y is our number, s is the shifting constant 7, and b is the bitmask 0x9D2C5680.

Let’s start with the first operation: Shift our number 7 bits to the left and AND it against b. Let’s call the result y’. An asterisk is an original part of the number, and a – is something that came in as a result of a shift.

                                
  **** **** **** **** **** **** *--- ----     
  0111 1000 0111 1000 0111 1000 0000 0000  <-- 7-bit shift left
& 1001 1101 0010 1100 0101 0110 1000 0000  <-- 0x9D2C5680
-----------------------------------------
  0001 1000 0010 1000 0101 0000 0000 0000  <-- y'

Gonna pause there before I do the next operation. Look at the last 7 bits of y’. They’re all 0s. If you’ve ever done XOR before, you know that XOR-ing something against 0 will leave you with what you started with. It’s the same as AND-ing something against a 1. No change. And, the reason that those figures are 0 is because of the shift. (If the original number ended in all 1’s, the last 7 bits of the shifted number would still all be 0. For this reason, I said 7 bits and not 12.)

This means a piece of the original number is going to remain in our final result. That is important, but it’s not super obvious!

Let’s do the next operation: XOR the result against the unmodified original. Let’s call the result z, like bloglien does.

  1111 0000 1111 0000 1111 0000 1111 0000  <-- y
X 0001 1000 0010 1000 0101 0000 0000 0000  <-- y' 
-----------------------------------------
  1110 1000 1101 1000 1010 0000 1111 0000  <-- z
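If you’d rather not trust the bit diagrams, the two steps above are easy to reproduce. This is only a sanity check of the worked example; the class and method names are made up for illustration.

```java
final class TemperStepSketch {
    static final int S = 7;          // shifting constant
    static final int B = 0x9D2C5680; // bitmask

    // y' in the walkthrough: shift left, then AND against the mask
    static int shiftAndMask(final int y) {
        return (y << S) & B;
    }

    // z in the walkthrough: XOR y' against the unmodified original
    static int temperStep(final int y) {
        return y ^ shiftAndMask(y);
    }

    public static void main(String[] args) {
        final int y = -252645136; // 0xF0F0F0F0
        System.out.println(Integer.toHexString(shiftAndMask(y))); // 18285000
        System.out.println(Integer.toHexString(temperStep(y)));   // e8d8a0f0
    }
}
```

0x18285000 and 0xE8D8A0F0 are the y’ and z rows from the diagrams, written in hex.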

Now, to undo this. If you didn’t know this, I should say that XOR is reversible. q XOR r = t and t XOR r = q. This means that all we have to do to recover the original is recover y’.

But how can we do that? y’ is gone once the operation is done. Is it recoverable?

Yes. Yes, it is, but only because you retain a piece of the original y in z. That’s what’s not obvious in this at first glance.

Recovering y’ and y

In order to recover the original y, y’ is needed. How does this work? Bloglien calls it “waterfalling.” I call it “redoing the original operation 7 bits at a time.” As I do this, I’m going to leave out what I call “extraneous bits” (which, after masking, are always 0). If they’re not of immediate concern, I’m just not going to write them. I will, however, keep them vertically aligned with the other figures. If you want to pull a piece of paper out and fill out every bit, feel free.

Step one: Mask the portion of z that contains the unaltered portion of y.

  1110 1000 1101 1000 1010 0000 1111 0000 <-- z
& 0000 0000 0000 0000 0000 0000 0111 1111 <-- mask the lowest 7 bits
-----------------------------------------
                                 111 0000 <-- what we are left with

Step two: Just like we did to the original y, shift 7 bits to the left and AND that against b (the masking constant), and poof! We have 7 bits of y’.

                        11 1000 0         <-- lowest 7 bits shifted left 7 bits
                      & 01 0110 1         <-- 7 bits of b in these positions
                      -----------
                        01 0000 0         <-- 7 bits of recovered y'

Step three: XOR the recovered y’ against z to recover 7 more bits of y.

                        10 0000 1         <-- 7 bits of z
                    XOR 01 0000 0         <-- 7 bits of recovered y'
                    -------------
                        11 0000 1         <-- 7 more bits of y

Step four: REPEAT! You have 7 more bits of y. Mask them, shift them, AND against b to recover 7 more bits of y’, and then XOR against z to recover 7 more bits of y.

               1 1000 01                  <-- masked and shifted
             & 0 1100 01                  <-- b
            ------------
               0 1000 01                  <-- recovered y'
           XOR 1 1000 10                  <-- z
           -------------
               1 0000 11                  <-- y

       1000 011                           <-- y, masked and shifted
     & 1101 001                           <-- b
     ----------
       1000 001                           <-- recovered y'
   XOR 1000 110                           <-- z
   ------------
       0000 111                           <-- recovered y

  0111                                    <-- y, masked and shifted (we lose 3 bits)
& 1001                                    <-- b
------ 
  0001                                    <-- recovered y'
X 1110                                    <-- z
------ 
  1111                                    <-- recovered y

Now, take all the bits of recovered y and concatenate them, and you get:

 1111 0000 1111 0000 1111 0000 1111 0000  <-- the original y

That’s the algorithm. Mask, shift, AND, XOR, Repeat. I could redo this for a right shift to show you how it works in the opposite direction, but it’s the same except that you retain the most significant bits of a number instead of the least significant bits. Also, the right shift is less difficult than the left because the shifting constants are larger, meaning you run out of room more quickly. (With the fourth transformation, you don’t even need to repeat. It’s one and done.)

In code

So how does “untempering” look when you put it in code? Well, pretty close to tempering, but with a few gotchas.

The first line in each unTemper function calls on a method to convert an integer into a bitmask. In other words, 7 becomes 0000 0000 0000 0000 0000 0000 0111 1111 in convertIntToRightEndMask. The definition for this can be found here.
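The linked definition isn’t reproduced here, but helpers like these are short. The following is a guess at their behavior based on the description above (n low-order bits set for the right-end mask, n high-order bits set for the left-end mask), not the actual linked code.

```java
final class MaskSketch {

    // n low-order bits set, e.g. 7 -> 0000 0000 0000 0000 0000 0000 0111 1111
    static int convertIntToRightEndMask(final int n) {
        // Java shifts are taken mod 32, so a full-width mask needs its own case
        return n >= 32 ? -1 : (1 << n) - 1;
    }

    // n high-order bits set, e.g. 18 -> 1111 1111 1111 1111 1100 0000 0000 0000
    static int convertIntToLeftEndMask(final int n) {
        // same mod-32 caveat: -1 << 32 would leave every bit set instead of none
        return n <= 0 ? 0 : -1 << (32 - n);
    }

    public static void main(String[] args) {
        System.out.println(Integer.toHexString(convertIntToRightEndMask(7)));  // 7f
        System.out.println(Integer.toHexString(convertIntToLeftEndMask(18))); // ffffc000
    }
}
```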

From there, the mask is moved horizontally by shift bits, starting with 0 (since blockMaskShift starts at 0). Then we AND it against z, which allows us to get the original bits that stayed from the previous transformation all by themselves. Then we shift those bits, mask them against the constant, and XOR it against z, and repeat. It works exactly the same way for left or right shifts. Plug in the right inputs (which come from the MT spec), and you can undo tempering.

    int unTemper(final int tempered) {
        int y = unTemperRightShift(tempered, L, FULL_MASK);
        y = unTemperLeftShift(y, T, C);
        y = unTemperLeftShift(y, S, B);
        y = unTemperRightShift(y, U, D);
        return y;
    }

    private int unTemperRightShift(final int z, final int shift, final int maskConst) {
        final int blockMask = convertIntToLeftEndMask(shift);
        int zp = z;
        for (int blockMaskShift = 0; blockMaskShift < (W - shift); blockMaskShift += shift) {
            zp = zp ^ (((zp & (blockMask >>> blockMaskShift)) >>> shift) & maskConst);
        }
        return zp;
    }

    private int unTemperLeftShift(final int z, final int shift, final int maskConst) {
        final int blockMask = convertIntToRightEndMask(shift);
        int zp = z;
        for (int blockMaskShift = 0; blockMaskShift < (W - shift); blockMaskShift += shift) {
            zp = zp ^ (((zp & (blockMask << blockMaskShift)) << shift) & maskConst);
        }
        return zp;
    }

Clone the twister

Once you’ve got past the hard part of undoing the twister, all you need is a sample of 624 outputs from a Mersenne Twister PRNG to clone it. You just “untemper” those outputs, overwrite the PRNG’s internal state array with the untempered values, and boom. You’re done. See here.
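Here is a compact, self-contained sketch of the whole clone, assuming the standard MT19937 constants. It is not the linked solution: the class is made up for illustration, and the untempering loops are a shorter variant of the ones above that re-derive the whole word on each pass instead of moving a block mask along.

```java
final class Mt19937CloneSketch {
    static final int N = 624;
    static final int M = 397;

    final int[] state = new int[N];
    int index = N;

    Mt19937CloneSketch(final int seed) {
        state[0] = seed;
        for (int i = 1; i < N; i++) {
            state[i] = 1812433253 * (state[i - 1] ^ (state[i - 1] >>> 30)) + i;
        }
    }

    int next() {
        if (index >= N) twist();
        return temper(state[index++]);
    }

    private void twist() {
        for (int i = 0; i < N; i++) {
            final int y = (state[i] & 0x80000000) | (state[(i + 1) % N] & 0x7FFFFFFF);
            state[i] = state[(i + M) % N] ^ (y >>> 1) ^ ((y & 1) != 0 ? 0x9908B0DF : 0);
        }
        index = 0;
    }

    static int temper(int y) {
        y ^= y >>> 11;
        y ^= (y << 7) & 0x9D2C5680;
        y ^= (y << 15) & 0xEFC60000;
        y ^= y >>> 18;
        return y;
    }

    // each pass makes another `shift` high bits of y correct
    static int undoRight(final int z, final int shift) {
        int y = z;
        for (int known = shift; known < 32; known += shift) {
            y = z ^ (y >>> shift);
        }
        return y;
    }

    // each pass makes another `shift` low bits of y correct
    static int undoLeft(final int z, final int shift, final int mask) {
        int y = z;
        for (int known = shift; known < 32; known += shift) {
            y = z ^ ((y << shift) & mask);
        }
        return y;
    }

    static int untemper(final int tempered) {
        int y = undoRight(tempered, 18);
        y = undoLeft(y, 15, 0xEFC60000);
        y = undoLeft(y, 7, 0x9D2C5680);
        return undoRight(y, 11);
    }

    // read 624 outputs from the target and rebuild its state array from them
    static Mt19937CloneSketch cloneFrom(final Mt19937CloneSketch target) {
        final Mt19937CloneSketch copy = new Mt19937CloneSketch(0);
        for (int i = 0; i < N; i++) {
            copy.state[i] = untemper(target.next());
        }
        copy.index = N; // the copy's state is used up, so its next call twists just like the target's
        return copy;
    }

    static boolean clonePredicts(final int seed, final int checks) {
        final Mt19937CloneSketch original = new Mt19937CloneSketch(seed);
        final Mt19937CloneSketch copy = cloneFrom(original);
        for (int i = 0; i < checks; i++) {
            if (original.next() != copy.next()) return false;
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(clonePredicts(12345, 2000));
    }
}
```

The copy never sees the seed. It only sees 624 outputs, and from then on it produces the same numbers as the original.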

Addressing their question

So, to go back to the authors’ original question about this…

How would you modify MT19937 to make this attack hard? What would happen if you subjected each tempered output to a cryptographic hash?

Good question.

If you subject the state value to a cryptographic hash at any point during tempering, you wipe out the original bits tied to the state value, which makes it super difficult to recover inasmuch as the cryptographic hash is super difficult to crack (which they generally are). At that point, you’d basically be left with brute force, trying up to 2^32 candidate values to see which one produces the hash you observed. This would effectively make the PRNG far more secure since you wouldn’t be able to easily undo it.

In sum, if you get rid of the original bits from the untempered inputs, then you make it super hard to recover the untempered value tied to the PRNG’s state array.

This is cool

At first, this sort of blew my mind, but as I worked through it a few times in code and on paper, it ended up being one of the most fun challenges to complete in cryptopals. That’s the reason behind this lengthy write-up: I really got excited about this problem. I hope that others are just as fun to crack in the future.

Peace! -LH