RSA for those who aren’t number theorists
I just finished cryptopals challenge 39, in which I had to implement RSA.
For me, it wasn’t enough for me to just implement the RSA algorithm. I sort of needed to understand a bit about the underlying number theory. I say that because I’ve faced instances in the past where a typo or error in a cryptopals challenge description threw me off the trail for days or weeks. So, rather than bang my head against a wall, I took as deep a dive as I needed to understand enough about the math behind this algorithm to make sense of it all.
I still don’t know if I’m all the way there. I am not a math novice, but I’m definitely no number theorist. In any case, here’s a best attempt to try to explain this in a way that will make sense to those of us who are also not mathematicians but have need to learn about how RSA is supposed to work mathematically.
TL;DR
Here are the steps to generate an RSA key.
- Select two prime numbers p and q. Multiply them to get n.
- Using modulo of n, Euler’s theorem tells us akφ(n)+1 ≡ a mod n if a and n are coprime. (To make sure of this, we use very large prime numbers p and q to determine n.)
- We can substitute kφ(n)+1 with ed provided that we select an e that is coprime with φ(n).
- φ(n) = (p-1)(q-1)
- In order for e to be coprime with (p – 1)(q – 1), we select an e that is prime and greater than or equal to both p and q.
- Alternatively, we check to ensure prime number e is not a factor of φ(n) and keep picking different primes for e until we find one that isn’t a factor of φ(n). This will make e coprime to φ(n).
- Once we’ve selected an e, we know that a modular multiplicative inverse must exist because e and φ(n) are coprime; we find the inverse d using the extended euclidean algorithm.
- The lock is [e, n] and the key is [d, n].
- Congratulate yourself!
What in the….?
We are going to go through the theory behind these steps in more detail. We’ll start with what invertibility means, which will lead us into the notion of modular multiplicative inverses. Then we’ll discuss φ(n), called simply the “phi function,” look at phi functions of prime numbers and products of prime numbers, and we’ll finish with Euler’s theorem. Then, with that foundation, we’ll select a key pair that meets all the criteria that will allow RSA to work.
There are links spread throughout this post, but as a favor, here are some of the sources that helped me understand this. A lot of them have more number theory than I think the average joe needs to understand to make this work. If these don’t help well enough, feel free to use a search engine. The info about how this works is everywhere. Keep grinding, and it will click eventually.
Prime Numbers and RSA by Computerphile
RSA by Eddie Woo Part One and Part Two
Encryption and Huge Numbers by Numberphile
Jeremy Kun Math x Programming June 2011
Invertibility
First things first: In order for an encryption scheme to be worth anything, it has to be invertible (or, someone somewhere has to be able to decrypt an encrypted message). If you can’t decrypt an encrypted message, what’s it worth? Nothing.
RSA is inverted using modular exponentiation. That is, exponentiate a figure, divide it by another figure, and find the remainder of that division to encrypt. Repeat using different figures to arrive back at the beginning.
(Note: You’ll notice in the equations that follow, we don’t speak in terms of equality. We speak in terms of congruence. What’s congruence? Read the section on modular arithmetic here.)
Encryption: Me mod n ≡ C
Decryption: Cd mod n ≡ M
If you please, we can make a substitution for C and get the fundamental expression that undergirds RSA.
Med mod n ≡ M
Or, as you will see just about anywhere if you search around…
Med ≡ M mod n ⟵ The golden formula. If you get lost, remember that this is ultimately what we’re after.
Modular multiplicative inverse
Notice that because we’re talking congruence, we can replace ed with 1 and it’s still true.
M1 ≡ M mod n
There’s a special term for a pair of integers that when multiplied together mod n is congruent to 1: Modular multiplicative inverse. Or, we are after a d that is the modular multiplicative inverse of e and vice-versa.
There’s a rule about modular multiplicative inverses. For any two figures a and n, a will have a modular multiplicative inverse b if an only if a and n are coprime.
We must define coprime: A number a is coprime to another number n if they have no common factors.
The numbers don’t have to be prime themselves. But if they were, they would by definition be coprime since they would have no common factors.
All this to say that if we’re going to find any pair e and d that will satisfy our golden formula, it’s going to involve some sort of modular multiplicative inverse. Once we have found a pair of figures that satisfy our golden formula, then we have found an asymmetric encryption key. The lock is [e, n] (power of e, mod of n), and the key is [d, n] (power of d, mod of n).
Now, the question is this: how do we find e and d? We’re not there yet. Hang in there.
Phi function
I have to lay another piece of ground work. We need to talk about the phi function, or φ(n).
φ(n) is a special number. φ(n) is the quantity of numbers that are coprime with n, meaning how many numbers greater than or equal to one that share no common factors with n.
As it happens, calculating φ(n) is trivial for prime numbers. Think about it. If a number is prime, then by definition, it has no factors other than 1 and itself. One is a factor of everything, so we can ignore it. Therefore, for a prime number p…
φ(p) = p – 1
Phi function of products of primes
Next step: what if we have two prime numbers p and q and want to figure out what φ(pq) is? (I know this seems random, but stay with me. It’s important.)
With this one, multiply the phi-function of each prime together to find the phi of their product. The proof is a bit over my head. If you want to read the proof, go here.
φ(pq) = φ(p)φ(q) = (p – 1)(q – 1)
For RSA, pq = n, or in other words, n is the product of two primes p and q, and φ(n) is the product of the phi function of each of its prime factors.
Let’s work a simple example with two primes: 3 and 5. Here are all the numbers from 1 to 15 with multiples of 3 and 5 bolded. Because the bolded figures are multiples of our prime numbers, they are not coprime with 15.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
Count the number of non-bolded figures. There are 8: {1, 2, 4, 7, 8, 11, 13, 14}.
φ(15) = φ(3)φ(5) = 2 * 4 = 8
Seems good to me.
Euler’s theorem
Last sidetrack before we find e and d, I promise.
Once upon a time a guy named Euler came up with a theorem that says that if you have two numbers a and n that are coprime, then the following is true. If you want to read up on why or how you can go here or here.
aφ(n) ≡ 1 mod n
This can be toyed with. Let raise both sides to power k and simplify.
akφ(n) ≡ 1k mod n ⟶ akφ(n) ≡ 1 mod n
Further, let’s multiply both sides by a and simplify.
a * akφ(n) ≡ a * 1 mod n ⟶ a * akφ(n) ≡ a mod n ⟶ akφ(n) + 1 ≡ a mod n
Let’s call that out again, because it’s important
akφ(n) + 1 ≡ a mod n
Does this look at least a little familiar? Golden formula anyone?
Med ≡ M mod n
Therefore, if we can find two figures e and d such that the following is true, we have found our e and d.
ed = kφ(n) + 1
Choosing e… finally
Let’s make an assumption that there is a pair of figures e and d that satisfy this congruence. (Aside: recall from above that for this to be true, e and φ(n) have to be coprime.)
ed ≡ 1 mod φ(n)
Subtract one from both sides.
ed – 1 ≡ 0 mod φ(n)
If that’s true, then φ(n) divides evenly into ed – 1. Or, in other words, ed – 1 will be equal to some multiple of φ(n). We can call that multiplier k.
ed – 1 ≡ 0 mod φ(n) ⟶ ed – 1 = kφ(n)
At this point, add one to both sides again.
ed = kφ(n) + 1 ⟶ This is the equation we are after!
Therefore, given a certain condition, we can straight substitute ed into Euler’s Theorem, and this gives us our golden formula.
What are those conditions? As stated before, e and φ(n) have to be coprime.
How can we guarantee that? φ(n) = (p – 1)(q – 1). Recall that p and q are prime numbers (ideally, really large prime numbers). Choose a 3rd prime for e that is greater than (p – 1) and (q – 1), and you’ll be guaranteed to have it be coprime with φ(n), because the only way any prime number figure would be a factor of φ(n) is if it divides evenly into φ(n). If that prime number is greater than the figures we multiplied to get φ(n), then it will definitely be coprime to φ(n).
Alternatively, we can just pick random primes, small or large, and see if a chosen prime is a factor of φ(n). If it is, we have to pick again. We keep picking until we find one that is coprime to φ(n).
So to sum up, as long as e and φ(n) are coprime (no common factors), then there will absolutely be a d that makes the equation ed = kφ(n) + 1 true for some value k. The bounds for e are that it has to be greater than one and less than (p-1)(q-1), with the caveat that it cannot be a factor of (p-1)(q-1). The easiest way to ensure that is to pick a prime number that is greater than p and q.
Solving for d
Now that we have an e, we have to find d, which is the modular multiplicative inverse of e mod φ(n). Once we have e, d, and n, we are finished!
There’s a defined algorithm for this: The extended euclidean algorithm. I’m not going to explain it here because, honestly, it’s over my head. There is psuedocode on wikipedia that makes it pretty simple and straightforward to implement.
Interestingly, java comes with an implementation of this algorithm already attached to BigInteger
. It’s the modInverse
function. So, we already have what we need in order to determine the inverse of e mod φ(n). This makes any homebrew implementation pretty irrelevant in java, but the challenge invites us to implement it anyway. Here’s my implementation.
/**
* find the modular inverse of a mod n, which we call t
* an implementation of the extended euclidean algorithm
* sourced from <a href="https://en.wikipedia.org/wiki/Extended_Euclidean_algorithm#Modular_integers" target="_blank">wikipedia</a>
*
* this is functionally equivalent to {@link BigInteger#modInverse(BigInteger)}
*
* @param a the number
* @param n the modulus
* @return the result, which I am calling t
*/
public BigInteger invMod(final BigInteger a, final BigInteger n) {
BigInteger t = BigInteger.ZERO;
BigInteger nextT = BigInteger.ONE;
BigInteger r = n;
BigInteger nextR = a;
while (nextR.compareTo(BigInteger.ZERO) != 0) {
var q = r.divide(nextR);
var tempT = t;
t = nextT;
nextT = tempT.subtract(q.multiply(nextT));
var tempR = r;
r = nextR;
nextR = tempR.subtract(q.multiply(nextR));
}
if (r.compareTo(BigInteger.ONE) > 0) {
throw new ArithmeticException("a is not invertible");
}
if (t.compareTo(BigInteger.ZERO) < 0) {
t = t.add(n);
}
return t;
}
Generating a key
Remember, the public key (or lock) is [e, n] and the private key (or key) is [d, n]. Let’s do a quick recap on the steps it takes to find this key.
- Select two prime numbers p and q. Multiply them to get n.
- Using modulo of n, Euler’s theorem tells us akφ(n)+1 ≡ a mod n if a and n are coprime. (To make sure of this, we use very large prime numbers p and q to determine n.)
- We can substitute kφ(n)+1 with ed provided that we select an e that is coprime with φ(n).
- φ(n) = (p-1)(q-1)
- In order for e to be coprime with (p – 1)(q – 1), we select an e that is prime and greater than or equal to both p and q.
- Alternatively, we check to ensure prime number e is not a factor of φ(n) and keep picking different primes for e until we find one that isn’t a factor of φ(n). This will make e coprime to φ(n).
- Once we’ve selected an e, we know that a modular multiplicative inverse must exist because e and φ(n) are coprime; we find the inverse d using the extended euclidean algorithm.
- The lock is [e, n] and the key is [d, n].
- Congratulate yourself!
Hopefully it makes more sense now!
Thanks for reading! -LH
I'm not sure I could explain it to anyone else, but it makes sense in my head for at least the moment. :)