University of Minnesota
Development of Secure Software Systems
index.php

CSci 4271 Lab 11

In this week's lab you're going to try out some forms of an offline dictionary attack, which lets an attacker try out different plaintext values of a password and see which one matches a hash. Specifically you'll be trying out of form of that attack that works against an older form of authentication used by some web servers. But there won't actually be a web server involved in your work: we'll just give you data like would have been collected from watching accesses; this data is enough to guess passwords without further interaction with the server.

As usual in the online lab we'll randomly split you into breakout groups of 2-3 students: please work together, discuss, and learn from the other student(s) in you group. Use the "Ask for Help" button to ask questions or show off what you've done. We also recommend working in groups in the in-person lab, but there you can choose your own groups and physically raise your hand to ask a question. You may still find it useful to use Zoom or tmate for screen sharing in person while respecting social distancing.

The web server authentication mechanism we're attacking today is called HTTP Digest Authentication. Digest here is a synonym for a cryptographic hash function. The reason this authentication mechanism was developed was to replace a simpler kind of authentication in which a username and password are sent directly in a request, which is obviously very vulnerable to network-level sniffing attacks. Instead Digest authentication is a challenge-response protocol using hashing. The client and the server both have a copy of the password and use it to do a cryptographic computation using the hash as a one-way function. If the client uses the correct password, it will be able to compute the correct response (that the server can check by also generating it), and the request can be allowed. As long as the hash function is one-way, someone who sees the network traffic can't reverse it to get the password. However the as you'll see, the network traffic is still enough to allow a guess-and-check kind of attack to try to recover the password. This is called a dictionary attack, because the attacker has a long list of possible passwords that's like a dictionary.

The official definition of the version of HTTP Digest Authentication we'll be attacking is in a 1999 document called RFC 2617. It also has a good, and more compact, description in a Wikipedia page.

Completing the digest implementation

To carry out the attack, you'll need a program that can repeatedly compute different versions of the authentication response using different passwords. We've given you the starting framework for such a program as a Perl script, which you can make a working copy of like this:

cp /web/classes/Fall-2020/csci4271/labs/11/dict-attack-template.pl dict-attack.pl

If you prefer Python or C over Perl, there's also a dict-attack-template.py and a dict-attack-template.c that you can use for this part of the lab, but which will be of less help for the next part.

The subroutine digest is supposed to do the computation of the authentication response, but you can see that we've left the definition unfinished. The nine parameters to the subroutine represent the different pieces of information that go together into the response computation. They mostly have the same names they are given in other documentation. The one we call "snonce" is also sometimes just called "nonce", because in earlier versions of the protocol there was only one nonce. But in the current version, nonce/snonce is the nonce that comes from the server, while cnonce comes from the client. The basic operations used in construction the response are concatenating strings together, which you can do with variables inside double-quoted strings, and computing MD5 checksums in lowercase hexadecimal, done with the function md5_hex.

It's important to check that the digest function is implemented exactly correctly, because the hashing means that any small problem will make the output completely different, and the problem can be hard to track down. To help with this, our framework code includes two functions that are called test vectors: they are samples of inputs to the function for which the output is known. The script will automatically run these checks first thing, and then stop at the first one that fails, so make sure these unit tests pass before going on to the next step. To just run these tests and skip the other unfinished functionality of the script, you can run it like:

perl dict-attack.pl /dev/null

Building the dictionary attack

Next let's complete a simple and literal form of a dictionary attack. You'll see we've already written code for you which reads all the words in an English wordlist into an array named @dict. The data that you can imagine you collected earlier from snooping on web-site traffic is in a comma-separated database file which you can copy:

cp /web/classes/Fall-2020/csci4271/labs/11/digests.csv .

The first line of the CSV file is a header that gives the column names. Then each subsequent line has the data from one user accessing the password-protected web site. Each of these users probably has a different password, so your script will need to try each combination of user and possible password. Inside the loop we've already written that reads each line from the CSV file, you'll need to try the different possible passwords from the dictionary, running digest each time. If the result of running digest with a test password matches the response recorded in the CSV file, you've guessed the right password, and should print a message about it.

You may find it useful to know that Perl has a loop syntax that looks like for $word (@dict) { ... } to iterate over all of the values in an array. If you believe you have your script working, you can try running it using the CSV file like this:

perl dict-attack.pl digests.csv

If you use the suggested dictionary, you script shouldn't take more that a couple of seconds to run, and it should be able to guess 10 or more passwords.

Making the attack more sophisticated

You may have noticed that there are 100 users in the CSV file, and only a few of them have their passwords guessed by the literal dictionary version of the attack. The reason of course is that people use lots of other strategies for picking passwords besides just literal dictionary words. For instance they might choose sequences of other characters in other ways, or start with dictionary words and then modify them somehow. Try seeing which other passwords you can guess ("crack") by modifying your script, using different lists of passwords, and so on. A few of the passwords might be pretty hard, but a majority of the ones in the list aren't.