Development of Secure Software Systems

CSci 4271 Lab 13

In this week's lab we're returning to the world of hashed password storage and offline dictionary attacks from two weeks ago, but trying out a pre-written attack tool and a hash function with a higher work factor.

As usual in the online lab, we'll randomly split you into breakout groups of 2-3 students: please work together, discuss, and learn from the other student(s) in your group. Use the "Ask for Help" button to ask questions or show off what you've done.

Trying to guess (or "crack") passwords can be fun, but it can also use a lot of CPU time. If you're interested in continuing the activities here outside of lab, it would be better to do so on your own computer, so as not to use up too many CSE Labs resources. Cracking tools can also run faster if you give them access to a GPU, which we can't do on the lab machines.

Digest authentication and John the Ripper

John the Ripper, or just john on the command line, is an open-source tool for carrying out various dictionary-style attacks against hashed passwords. Make a link from the version we compiled to your working directory:

ln -s /web/classes/Fall-2020/csci4271/soft/john/john john

For the first part of the lab, you'll repeat some dictionary attacks like you did two weeks ago, but using john instead of custom-written code. The full-featured version of John we've compiled for you also knows about HTTP Digest Authentication hashing, but it uses a slightly different text input format than the CSV format we used last time. This week's set of target hashed passwords is a little bit easier than the old set, and we've also already given them to you in a John-the-Ripper-compatible format. You can get a copy like this:

cp /web/classes/Fall-2020/csci4271/labs/13/hdaa-digests100.pw .

You can see that this file has roughly the same information as the CSV file, but it mostly uses dollar signs instead of commas as separators. John has a default mode where it first tries a short wordlist, and then tries generating character sequences in increasing length. Try running the following command for around a minute to see what it finds:

./john --format=hdaa hdaa-digests100.pw

The --format option tells john which hash format to expect. You should see John's default configuration find several passwords: each one is printed on its own line, followed by the corresponding username in parentheses.
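To get a feel for what john computes on each guess, here is a minimal shell sketch of a dictionary attack on a single Digest Authentication response. It assumes the qop=auth variant from RFC 2617 and uses that RFC's published example values (user Mufasa, password "Circle Of Life") rather than anything from hdaa-digests100.pw. The md5hex helper and the three-word guess list are made up for illustration.

```shell
#!/bin/sh
# md5hex STRING -> lowercase hex MD5 of STRING (no trailing newline is hashed)
md5hex() { printf '%s' "$1" | md5sum | cut -d' ' -f1; }

# Public fields an attacker can observe (RFC 2617 example values, not lab data)
user='Mufasa'; realm='testrealm@host.com'
method='GET'; uri='/dir/index.html'
nonce='dcd98b7102dd2f0e8b11d0f600bfb0c093'
nc='00000001'; cnonce='0a4f113b'; qop='auth'
target='6629fae49393a05397450978507c4ef1'   # the observed response digest

# HA2 does not depend on the password, so compute it once
ha2=$(md5hex "$method:$uri")

found=''
for guess in 'password' 'hakuna matata' 'Circle Of Life'; do
    ha1=$(md5hex "$user:$realm:$guess")
    response=$(md5hex "$ha1:$nonce:$nc:$cnonce:$qop:$ha2")
    if [ "$response" = "$target" ]; then
        found=$guess
        break
    fi
done
echo "found: $found"
```

Each guess costs only two MD5 computations here, which is why a tool like john can get through large candidate lists quickly.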

One feature to keep in mind is that by default john will automatically remember all of the passwords it has already guessed, so as not to waste time on them. As a message says, you can get it to print its cache by rerunning with the --show option. If you want to clear its cache and make it start over, remove the file ~/.john/john.pot, where ~ indicates your home directory.

Ramping up the attack

John's default mode wouldn't be able to guess all the passwords in this week's challenge set quickly, because they come from different patterns and some are fairly long. But putting your pattern recognition abilities together with its implementation you should be able to make a lot of progress. Keep rerunning the tool with different options to have it try different sets of candidate passwords. Below are several suggestions, not exhaustive. You can read detailed documentation here, particularly the files EXAMPLES, OPTIONS, and MASK.

If you're running on a personal computer or a lab machine that you're not sharing with anyone else, you can make john go faster by telling it to use more cores in parallel. You do this with an option like --fork=6, which tells it to break the task into 6 pieces to run at once.

A basic feature to control is the list of words that john tries, using the --wordlist= option; you put a filename right after the equals sign. The file /usr/share/dict/words that we used before might be relevant, for instance. You can also pre-process the dictionary with other text-processing tools.
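As one example of pre-processing, the sketch below lowercases a word file, keeps only words of 4 to 8 letters, and removes duplicates. The length range, the file names, and the tiny sample-words.txt (a stand-in for /usr/share/dict/words so the sketch runs anywhere) are all arbitrary choices for illustration; pick filters that match the patterns you notice in the passwords found so far.

```shell
#!/bin/sh
# Stand-in for /usr/share/dict/words (hypothetical sample data)
cat > sample-words.txt <<'EOF'
Apple
banana
cherry's
kiwi
banana
EOF

# Lowercase everything, keep words of 4-8 letters only, drop duplicates
tr 'A-Z' 'a-z' < sample-words.txt \
    | grep -E '^[a-z]{4,8}$' \
    | LC_ALL=C sort -u > filtered.dict

cat filtered.dict

# Then hand the result to john with the option described above:
#   ./john --format=hdaa --wordlist=filtered.dict hdaa-digests100.pw
```

With the sample input, "cherry's" is dropped because of the apostrophe and the repeated "banana" is collapsed to one line.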

The relatively brute-force mode where john generates passwords in increasing length is called "incremental" mode. The --incremental= option takes an argument to control what kinds of characters to use; for instance it will go faster if you use only digits.
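To see why restricting the character set matters so much, this rough sketch counts how many candidate strings exist up to a given length for two alphabets. The maximum length of 8 and the 95-character printable-ASCII alphabet are assumptions for illustration, not settings read from john.

```shell
#!/bin/sh
# total_upto ALPHABET_SIZE MAXLEN -> number of strings of length 1..MAXLEN
total_upto() {
    size=$1; maxlen=$2
    total=0; power=1; len=1
    while [ "$len" -le "$maxlen" ]; do
        power=$((power * size))      # size^len
        total=$((total + power))
        len=$((len + 1))
    done
    echo "$total"
}

digits=$(total_upto 10 8)
ascii=$(total_upto 95 8)
echo "digits only, length 1-8:     $digits"
echo "printable ASCII, length 1-8: $ascii"
echo "ratio: about $((ascii / digits))x more candidates"

# A digits-only run would then look something like this ("Digits" is the
# mode name in the stock john.conf; check OPTIONS if our build differs):
#   ./john --format=hdaa --incremental=Digits hdaa-digests100.pw
```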

The "mask" mode generates passwords according to a pattern with different characters in different positions. Most characters in the mask are interpreted literally, but for instance you can use ?d to represent any digit, ?l for a lowercase letter, and so on. You can also combine a wordlist with mask mode; then ?w represents a word from the wordlist.
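As a concrete picture of what a mask means, the sketch below writes out every candidate matching the hypothetical mask lab?d?d (the literal text "lab" followed by two digits). The prefix "lab" and the file names are invented for illustration; john generates the same set of candidates internally, though not necessarily in this order.

```shell
#!/bin/sh
# Every string matching the mask  lab?d?d : lab00, lab01, ..., lab99
seq -w 0 99 | sed 's/^/lab/' > mask-candidates.txt

wc -l < mask-candidates.txt       # number of candidates
head -n 1 mask-candidates.txt     # first candidate
tail -n 1 mask-candidates.txt     # last candidate

# Letting john do the same expansion itself:
#   ./john --format=hdaa --mask='lab?d?d' hdaa-digests100.pw
# Hybrid form, each wordlist word followed by two digits
# (mywords.dict is a hypothetical file name):
#   ./john --format=hdaa --wordlist=mywords.dict --mask='?w?d?d' hdaa-digests100.pw
```

Quote the mask on the command line so the shell does not try to interpret the question marks as filename wildcards.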

Switching to a more expensive hash function

The attacks in the previous section worked pretty well because MD5 hashes can be computed very quickly. If you're willing to trade off some time in normal execution, you can make life harder for attackers: a slower hash function slows down a dictionary attack by the same factor, and a hash can also be designed to use other expensive resources. For instance, here we'll experiment with Argon2, a password hashing function that can be configured to require both CPU time and memory.

Our version of John the Ripper knows about Argon2 as well, and you can see that if you give it a very easy cracking task, it can still do it. For instance, start with these hashed passwords and a short wordlist:

cp /web/classes/Fall-2020/csci4271/labs/13/argon10-easy.pw .
cp /web/classes/Fall-2020/csci4271/labs/13/easy.dict .

If you specify the right wordlist, you can see that these can be dealt with quickly; they also use low-difficulty settings for Argon2. By contrast, look at:

cp /web/classes/Fall-2020/csci4271/labs/13/argon100-harder.pw .

These words are generated from the same probability distributions as the ones in the first part of the lab, but they use Argon2 with a setting where it takes half a second or more to compute a single hash. You'll see they're much harder to handle.
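A back-of-the-envelope estimate shows how much the work factor hurts the attacker. The numbers below are assumptions: 100 target hashes (guessed from the file name), a hypothetical 100,000-word list, half a second per hash as mentioned above, and a separate salt on every stored hash, so each guess has to be recomputed for each target.

```shell
#!/bin/sh
hashes=100             # assumed from the name argon100-harder.pw
words=100000           # hypothetical wordlist size
hashes_per_second=2    # 0.5 s per Argon2 hash, kept as an integer rate

guesses=$((hashes * words))                 # one hash per (target, word) pair
seconds=$((guesses / hashes_per_second))
days=$((seconds / 86400))
echo "$guesses hash computations, about $seconds s (roughly $days days at that rate)"
```

Under these assumptions, a single pass of one plain wordlist takes weeks, so it pays to think carefully about which candidates to try first.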

Writing your own code with Argon2

Argon2 is implemented in a library that you can also use from your own code; for instance we've compiled the C library for your use. Linking it to your own directory will save some typing:

ln -s /web/classes/Fall-2020/csci4271/labs/13/argon2.h .
ln -s /web/classes/Fall-2020/csci4271/labs/13/libargon2.a .

As sample code that uses the library, here's a program that computes hashes based on a CSV-format file:

cp /web/classes/Fall-2020/csci4271/labs/13/argon-generator.c .
gcc -Wall -g argon-generator.c -L. -largon2 -lpthread -o argon-generator

You can try modifying this program to create one that carries out a dictionary attack.