Development of Secure Software Systems

CSci 4271 Lab 11

In this week's lab you're going to do a little bit of cryptanalysis, understanding the weakness of a MAC for short strings constructed from a hash function. Specifically this will be a chosen-message attack: by computing the MAC of some specially chosen values, you'll be able to recover information that lets you forge MACs on any message you want. We'll simulate this black-box access to an instance of the MAC by giving you a program for computing the MAC that you can run.

The kind of chosen-message attack this MAC is susceptible to is an example of what's sometimes called a "lunchtime" attack against a cryptographic primitive. Given temporary black-box access to a primitive, you can figure out enough information about it to run it yourself later even after your temporary access goes away. For instance you might use your office-mate's computer while they are out to lunch. Or another real-world analogy would be the way a valet parking service has temporary access to you car key while your car is parked and then gives them back when you pick the car up: you hope they haven't been able to make a copy of the key that they can use to break into your car later. In this lab we don't have any mechanism for taking away your access to the MAC program at the end, or preventing you from copying or reverse-engineering it: you'll just have to pretend about those limitations.

The vulnerable MAC in this lab will produce 288-bit tags. Also it uses a 288-bit key, and internally it is implemented in terms of a cryptographic hash function with a 288-bit output. To make these 288-bit values easier to work with, instead of representing them as 36 bytes of binary data, we will instead represent them as 48-character strings in a format called base64, which as the name suggests represents 6 bits of binary data per ASCII character using upper and lowercase letters, digits, and the symbols slash and plus.

You may recall that the standard way to create a MAC out of a cryptographic hash function is a construction called HMAC. It's already a bit of a bad sign that our MAC is not using that standard construction. Internally the MAC truncates the output of a larger-output hash function, but that's not necessarily a bad idea. What you will see is a bad idea is that the MAC construction also truncates the input to the hash function. In context, a MAC designed for small strings might have some uses; for instance you could use the MAC of a username in a website login cookie. But the particular way this construction uses truncation makes a cryptanalytic attack possible that you'll try out.

You can get an instance of the MAC by running the script:

perl /web/classes/Spring-2022/csci4271/labs/11/make-mac.pl

on a CSE Labs Ubuntu 20.04 machine. This script will create (or overwrite if it already exists) a program named mac in you current directory, which computes MAC tags using a fixed internal key value. The string to compute the MAC over is supplied either as a single command-line input, or as the contents of the standard input. If you try it out on a few different inputs, you should see that when you give exactly the same input, you get the same MAC output, but different inputs give different outputs that all superficially look random. For instance try:

./mac a
./mac b
./mac 'a b'
./mac 'a  b'
echo -n 'a' | ./mac

Notice that whitespace is significant; the -n option to echo tells it not to add a newline character at the end.

Initial reverse engineering

Your sources have already given you some general information about how the MAC is implemented. We'll denote its internal hashing operation as a function H. H is implemented by running SHA-512 on the same input, and then selecting the first 288 bits of the SHA-512 output and encoding them as 48 characters in base64. You also know that the hash function is used a total of 3 times. First the hash function is applied to the key; then this hash is combined with the message and truncated to a fixed length if it is too long; then this truncated string is hashed, and the hash output is hashed again. The combination of the message and the hashed key is done by concatenation, and the truncation starts from the side of the concatenation with the message, so bytes from the key are dropped before bytes from the message. To explain it in forumulas, suppose that M represents the string that we're computing the MAC over and K is the key. Then the structure of the MAC is either:

H(H(prefix(M || H(K))))

H(H(suffix(H(K) || M)))

Where the prefix function keeps just the first n bytes of a string if it's too long, while suffix keeps just the last n bytes. But to start out we don't know the value of n.

For your first step in reverse engineering the MAC, figure out the maximum length n of bytes from the message and hashed key that are retained in the MAC computation. You can do this by just computing MACs of larger and larger inputs; at some point, making the input larger will stop changing the MAC.

Recovering one key-hash byte

The next part of the attack is seeing how you can figure out one byte out of the key hash. The basic idea is to create a string of length n - 1; suppose its MAC is t. Then consider all of the n-character strings obtained by adding one base64 character at either the beginning or the end, and their MACs. If you've figured out the value if n correctly, one of the hashes of n-character strings should match with t. What does that match tell you about the structure of the MAC and the contents of H(K)?

To get you started, we've implemented some of the infrastructure for this attack as either a Perl script or a C program. But you'll need to modify it slightly, so make your own copy from ours, using one of the following commands:

cp /web/classes/Spring-2022/csci4271/labs/11/one-char-attack.pl .
cp /web/classes/Spring-2022/csci4271/labs/11/one-char-attack.c .

One thing you'll probably need to change is the definition of the variable N ($N in Perl) near the top of the program, which should correspond to the value of n you found on the last step. If you've figured out the right value of n and run this program, it should find exactly one character which leads to getting the same MAC when it is added either at the start or the end of the input.

Finish the attack

To complete the attack, you'll need to extend the idea we introduced for the first character, to see how you can also recover all the remaining characters of the hashed key. Think about: after you know the first character, how are you in a position to discover another character similar to how you found the first character when you didn't know it?

For your implementation you can either extend one of the one-character attack programs we provided, or write a program in your own favorite language.

To help you confirm if you have broken the MAC successfully, we've also given you a framework script with the structure for computing the MAC in a Perl script, but where you need to fill in the parts you've reverse-engineered. As usual you can start with a copy of ours:

cp /web/classes/Spring-2022/csci4271/labs/11/mac-check.pl .

If you fill in the variables at the top of this script correctly, you should see that it always computes the same MACs as the mac binary, which means you now have the power to forge MACs. That may not sound so impressive in this situation where we've already given you the binary, but imagine a situation where the MAC is being computed by a web site to represent being logged in. By forging the MAC you could make the web site think you were logged in as someone else.