I’m doing cryptopals

by Landon | May 21, 2021

Cryptography fascinates me. It’s amazing how critical cryptography is to the internet and the digital economy. Even more amazing to me is how easy it can be to crack when it’s done insecurely.

I don’t have a computer science degree; I took some courses on algorithmic design in college, but felt so totally lost and overwhelmed that I changed my major. Several years later, I’m in the mood to learn some more.

In an effort to better understand cryptography in general, and as a personal project, I started doing the cryptopals challenges. I took a brief hiatus during the summer and fall due to a remodel and a move, but now that those things are done, I’m back at it.

To date, I’ve made it through roughly two and a half of their eight sets. They’re not super easy and take some time to crack. You can check my progress at my GitHub repo.

I don’t think it’s worthwhile to share the detailed solutions to each problem. That’s all over the internet. Mostly, I just want to fill in the gaps the cryptopals writers leave, because a lot of things are not at all clear as you read the problems. (The most recent example that comes to mind is the sideways mention of little-endian nonces in challenge 18, the CTR mode one.) I’ll probably also share code snippets from time to time. Who knows.

To kick it off, I want to draw attention to challenge 3…

Single-byte XOR cipher

The hex encoded string...

1b37373331363f78151b7f2b783431333d78397828372d363c78373e783a393b3736

... has been XOR'd against a single character. Find the key, decrypt the message.

You can do this by hand. But don't: write code to do it for you.

How? Devise some method for "scoring" a piece of English plaintext. Character frequency is a good metric. Evaluate each output and choose the one with the best score.

Achievement Unlocked
You now have our permission to make "ETAOIN SHRDLU" jokes on Twitter.

ETAOIN SHRDLU is an overt reference to frequency analysis: those twelve letters are, roughly, the most common letters in English, listed from most to least frequent. If you’ve never heard of frequency analysis, it’s the study of how often each letter shows up in a language. Most people know E is the most frequent letter in English. Few know anything beyond that. A little googling will tell you the rest of the letters in order of frequency.
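If you’d rather see an ordering than google it, counting letters takes only a few lines. Here is a minimal Python sketch; `letter_ranking` is a made-up name for illustration, not something from cryptopals or from my repo.

```python
from collections import Counter


def letter_ranking(text: str) -> str:
    """Return the letters that occur in `text`, most frequent first."""
    # Count only a-z, ignoring case, digits, spaces and punctuation.
    counts = Counter(ch for ch in text.lower() if "a" <= ch <= "z")
    # Sort by descending count; break ties alphabetically so the
    # result is deterministic.
    return "".join(sorted(counts, key=lambda ch: (-counts[ch], ch)))


print(letter_ranking("Hello, World!"))  # prints "lodehrw"
```

Run over a large enough sample of English, the ranking tends to start with something close to ETAOIN SHRDLU, though the exact order shifts from text to text.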

Knowing frequency is one thing. Applying a test to see how well a candidate decrypted text fits the expected letter frequencies is totally another. Fortunately, there is a statistical test for exactly that: the χ² (chi-squared) goodness-of-fit test. I implemented such a test in this class.
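For a feel of what such a test looks like, here is a minimal Python sketch. It is not the class linked above, and the names (`ENGLISH_LETTER_PCT`, `chi_squared`, `crack_single_byte_xor`) are made up for illustration. The statistic is the sum, over every category, of (observed − expected)² / expected, so a lower score means a better fit. The letter percentages are commonly quoted approximate values, and the 80/17/3 split between letters, spaces and everything else is a rough guess, so treat both as assumptions to tune.

```python
import math
from collections import Counter

# Commonly quoted approximate letter frequencies for English, in percent.
ENGLISH_LETTER_PCT = {
    "a": 8.17, "b": 1.49, "c": 2.78, "d": 4.25, "e": 12.70, "f": 2.23,
    "g": 2.02, "h": 6.09, "i": 6.97, "j": 0.15, "k": 0.77, "l": 4.03,
    "m": 2.41, "n": 6.75, "o": 7.51, "p": 1.93, "q": 0.10, "r": 5.99,
    "s": 6.33, "t": 9.06, "u": 2.76, "v": 0.98, "w": 2.36, "x": 0.15,
    "y": 1.97, "z": 0.07,
}

# Rough guesses (assumptions): the share of characters in ordinary English
# text that are letters, spaces, and everything else (digits, punctuation).
LETTER_SHARE, SPACE_SHARE, OTHER_SHARE = 0.80, 0.17, 0.03

# Expected share of each category: 26 letters, space, and "other".
EXPECTED = {ch: LETTER_SHARE * pct / 100 for ch, pct in ENGLISH_LETTER_PCT.items()}
EXPECTED[" "] = SPACE_SHARE
EXPECTED["other"] = OTHER_SHARE


def chi_squared(candidate: bytes) -> float:
    """Score a candidate plaintext against English; lower is better."""
    # Anything outside printable ASCII is rejected outright.
    if not candidate or any(b < 32 or b > 126 for b in candidate):
        return math.inf
    observed = Counter()
    for ch in candidate.decode("ascii").lower():
        observed[ch if ch in EXPECTED else "other"] += 1
    n = len(candidate)
    return sum(
        (observed[cat] - n * share) ** 2 / (n * share)
        for cat, share in EXPECTED.items()
    )


def crack_single_byte_xor(ciphertext: bytes) -> tuple[int, bytes]:
    """Try all 256 keys and return (key, plaintext) with the lowest score."""
    key = min(range(256), key=lambda k: chi_squared(bytes(b ^ k for b in ciphertext)))
    return key, bytes(b ^ key for b in ciphertext)


ciphertext = bytes.fromhex(
    "1b37373331363f78151b7f2b783431333d78397828372d363c78373e783a393b3736"
)
key, plaintext = crack_single_byte_xor(ciphertext)
print(key, plaintext)
```

The "other" bucket matters more than it looks: without it, a candidate made of one letter and a pile of punctuation can score better than real English.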

Understanding chi-squared tests, and having a good implementation of such a test, comes in handy in several challenges following this one. Challenges 4, 6 and 20 depend on it to validate a candidate decryption. So take your time on this one. While writing this, I actually improved my implementation, so that’s good.

You can see what I did for my solution here.

Peace! -LH