Development of Secure Software Systems

CSci 4271 Lab 9

In this week's lab you're going to do a little bit of cryptanalysis, understanding the weakness of a MAC for short strings constructed from a hash function. Specifically this will be a chosen-plaintext attack: by computing the MAC of some specially chosen values, you'll be able to recover information that lets you forge MACs on any message you want. We'll simulate this black-box access to an instance of the MAC by giving you a program for computing the MAC that you can run.

As usual in the online lab we'll randomly split you into breakout groups of 2-3 students: please work together, discuss, and learn from the other student(s) in you group. Use the "Ask for Help" button to ask questions or show off what you've done. We also recommend working in groups in the in-person lab, but there you can choose your own groups and physically raise your hand to ask a question. You may still find it useful to use Zoom or tmate for screen sharing in person while respecting social distancing.

The vulnerable MAC in this lab will produce 288-bit tags, and it also uses a 288-bit key, and internally it is implemented in terms of a cryptographic hash function with a 288-bit output. To make these 288-bit values easier to work with, instead of representing them as 36 bytes of binary data, we will instead represent them as 48-character strings in a format called base64, which as the name suggests represents 6 bits of binary data per ASCII character using upper and lowercase letters, digits, and the symbols slash and plus.

You may recall that the standard way to create a MAC out of a cryptographic hash function is a construction called HMAC. It's already a bit of a bad sign that our MAC is not using that construction. Internally the MAC truncates the output of a larger-output hash function, but that's not necessarily a bad idea. What you will see is a bad idea is that the MAC construction also truncates the input to the hash function. In context, a MAC designed for small strings might have some uses; for instance you could use the MAC of a username in a website login cookie. But the particular way this construction uses truncation makes a cryptanalytic attack possible that you'll try out.

You can get an instance of the MAC by running the script:

perl /web/classes/Fall-2020/csci4271/labs/09/make-mac.pl

on a CSE Labs Ubuntu 20.04 machine. This script will create (or overwrite if it already exists) a program named mac in you current directory, which computes MAC tags using a fixed internal key value. The string to compute the MAC over is supplied either as a single command-line input, or as the contents of the standard input. If you try it out on a few different inputs, you should see that when you give exactly the same input, you get the same MAC output, but different inputs give different outputs that all superficially look random. For instance try:

./mac a
./mac b
./mac 'a b'
./mac 'a  b'
echo -n 'a' | ./mac

Notice that whitespace is significant; the -n option to echo tells it no to add a newline character at the end.

Initial reverse engineering

Your sources have already given you some general information about how the MAC is implemented. We'll denote its internal hashing operation as a function H. H is implemented by running SHA-512 on the same input, and then selecting the first 288 bits of the SHA-512 output and encoding them as 48 characters in base64. You also know that the hash function is used a total of 3 times. First the hash function is applied to the key; then this hash is combined with the message and truncated to a fixed length if it is too long; then this truncated string is hashed, and the hash output is hashed again. The combination of the message and the hashed key is done by concatenation, and the truncation starts from the side of the concatenation with the message, so bytes from the key are dropped before bytes from the message. To explain it in equations, suppose that M represents the string that we're computing the MAC over and K is the key. Then the structure of the MAC is either:

H(H(prefix(M || H(K))))

H(H(suffix(H(K) || M)))

Where the prefix function keeps just the first n bytes of a string if it's too long, while suffix keeps just the last n bytes. But to start out we don't know the value of n.

For your first step in reverse engineering the MAC, figure out the maximum length n of bytes from the message and hashed key that are retained in the MAC computation. You can do this basically by just computing MACs of larger and larger inputs; at some point, making the input larger will stop changing the MAC.

Recovering one key-hash byte

The first part of the attack is seeing how you can figure out one byte out of the key hash. The basic idea is to create a string of length n - 1; suppose its MAC is t. Then consider all of the n-character strings obtained by adding one base64 character at either the beginning or the end, and their MACs. If you've figured out the value if n correctly, one of the hashes of n-character strings should match with t. What does that match tell you about the structure of the MAC and the contents of H(K)?

To get you started, we've implemented some of the infrastructure for this attack in a Perl script. But you'll need to modify it slightly, so make your own copy from ours:

cp /web/classes/Fall-2020/csci4271/labs/09/one-char-attack.pl .

One thing you'll probably need to change is the definition of the variable $N on line 3, which should correspond to the value of n you found on the last step. Also this version of the script doesn't explicitly do the comparison between the MACs. You can do it visually by sorting the results (piping through the sort command) and looking for duplicates.

Finish the attack

To complete the attack, you'll need to extend the idea we introduced for the first character, to see how you can also recover all the remaining characters of the hashed key. Think about: after you know the first character, how are you in a position to discover another character similar to how you found the first character when you didn't know it?

For your implementation you can either extend the one-character Perl script, or write a program in your own favorite language.

To help you confirm if you have broken the MAC successfully, we've also given you a framework script with the structure for computing the MAC in a Perl script, but where you need to fill in the parts you've reverse-engineered. As usual you can start with a copy of ours:

cp /web/classes/Fall-2020/csci4271/labs/09/mac-check.pl .

If you fill in the variables at the top of this script correctly, you should see that it always computes the same MACs as the mac binary, which means you now have the power to forge MACs. That may not sound so impressive in this situation where we've already given you the binary, but imagine a situation where the MAC is being computed by a web site to represent being logged in. By forging the MAC you could make the web site think you were logged in as someone else.