Development of Secure Software Systems

CSci 4271 Lab 13

In this week's lab you're going to try out some forms of an offline dictionary attack, which lets an attacker try out different plaintext values of a password and see which one matches a hash. A basic version of this kind of attack would be easy to implement on your own, but to help you get results faster, we'll be using a configurable open source tool named John the Ripper. To speed things up, we'll use hashed passwords with a hash that is fast to compute, but the best approach isn't going be to to just let the automated tool run: noticing patterns in how some of the passwords were created can help you find the other ones more efficiently.

Using John the Ripper

There are a number of pre-written tools for dictionary attacks against passwords. John the Ripper, or just john on the command line, is an open-source tool for carrying out various dictionary-style attacks against hashed passwords. Make a link from the version we've compiled to your working directory:

ln -s /web/classes/Spring-2022/csci4271/soft/john/john john

Because John has to be configured to take advantage of particular CPU features at compile time, the binary we've compiled may not work on every CSE Labs machine (as you can see near the beginning of its --help output, it is expecting a CPU with SSE4.1 support). But we've tested that the 1-262 lab machines and a regular Vole machine should both work.

(If you want to work from your own computer, you can also install your own copy of John the Ripper, but as a warning the version in Debian and Ubuntu Linux (and maybe elsewhere) is out of date and doesn't support all the features we recommend below.)

John's default input format is modeled after a traditional Unix password file, and to speed things up we've also used the traditional Unix password hashing function, based on DES. By modern standards this hash can be computed much more quickly than would be desirable, but this lets us use less computing power for the lab. Copy the database of 100 users and their hashed passwords in this format:

cp /web/classes/Spring-2022/csci4271/labs/13/descrypt.pw .

The first field of each line, up to a colon, is a Unix-style username. After the colon is a hashed password in a base64-like encoding. The first two characters of the hashed password are the salt. Both the size of the salt and the total hash size are small by modern standards; in fact there is a collision of salt values even in this set.

John has a default mode where it first tries a short wordlist, and then tries generating character sequences in increasing length, which is good for getting started. Try using the following command to run it for a minute and see what it finds:

./john --max-run-time=60 descrypt.pw

You should see John's default configuration find several passwords, 10-20 depending on your CPU speed: they're each printed on a line followed by the corresponding username in parentheses.

One feature to keep in mind is that by default john will automatically remember all of the passwords it has already guessed, so as not to waste time on them. As a message says, you can get it to print its cache by rerunning with the --show option. If you want to clear its cache and make it start over, remove the file ~/.john/john.pot, where ~ indicates your home directory.

Ramping up the attack

John's default mode wouldn't be able to guess all the passwords in the challenge set quickly, because they come from different patterns and some are fairly long. But putting your pattern recognition abilities together with its implementation you should be able to make a lot of progress. Keep rerunning the tool with different options to have it try different sets of candidate passwords. Below are several suggestions, not exhaustive. You can read detailed documentation here, particularly the files EXAMPLES, OPTIONS, and MASK.

If you're running on a personal computer or a lab machine that you're not sharing with anyone else, you can make john go faster by telling it to use more cores in parallel. You do this with an option like --fork=6, which tells it to break the task into 6 pieces to run at once. But it would be inconsiderate to do this on a shared (e.g., Vole) machine.

A basic feature to control is the list of words that john tries, using the --wordlist= option; you put a filename right after the equals sign. The file /usr/share/dict/words is a list of English words similar to the keywords in regular dictionary. It would be a good place to start for these hashed passwords. You can also pre-process the dictionary with other text-processing tools, or supply a wordlist on the standard input with the option --pipe.

The relatively brute-force mode where john generates passwords in increasing length is called "incremental" mode; it's a little smarter than pure brute force because it knows something about what characters likely occur together. The --incremental= option takes an argument to control what kinds of characters to use; for instance it will go faster if you use only digits.

The "mask" mode generates passwords according to a pattern with different characters in different positions. Most characters in the mask are interpreted literally, but for instance you can use ?d to represent any digit, ?l for a lowercase letter, and so on. You can also combine a wordlist with mask mode; then ?w represents a word from the wordlist.

Our 100 passwords consist of 10 groups of 10 passwords each selected in the same way. If you know the right combinations of patterns, they can all be cracked within about 5 minutes of CPU time (even single-threaded). But it's not so easy if you don't know the patterns in advance. You'll probably have to alternate between running in more general-purpose modes for a while, then doing more targeted searches based on patterns you notice.