Main navigation | Main content
In this week's lab you're going to try out some forms of an offline dictionary attack, which lets an attacker try out different plaintext values of a password and see which one matches a hash. A basic version of this kind of attack would be easy to implement on your own, but to help you get results faster, we'll be using a configurable open source tool named John the Ripper. To speed things up, we'll use hashed passwords with a hash that is fast to compute, but the best approach isn't going be to to just let the automated tool run: noticing patterns in how some of the passwords were created can help you find the other ones more efficiently.
There are a number of pre-written tools for dictionary attacks against passwords. John the Ripper, or just john on the command line, is an open-source tool for carrying out various dictionary-style attacks against hashed passwords. Make a link from the version we've compiled to your working directory:
ln -s /web/classes/Fall-2023/csci4271/soft/john/bin/john-22.04 john
Because John has to be configured to take advantage of particular CPU features at compile time, as well as OS version incompatibilities, there are several different versions of the program that will work on different subsets of CSE Labs machines. (The version information the beginning of the program's -h output includes information about these options.) The version john-22.04 mentioned above is for the 1-250 Keller lab workstations, and other systems running Ubuntu 22.04 whose CPUs support a newer feature set named AVX512BW. The VOLE-FX3 machines also run 22.04, but have fewer CPU features, so for them there is a binary named john-vole-fx3. (Because the CPU is backwards compatible, this binary will also run on lab machines, but slower.) Finally the old Vole cluster is running Ubuntu 20.04, and should use john-20.04.
(If you want to work from your own computer, you can also install your own copy of John the Ripper, but as a warning the version in Debian and Ubuntu Linux (and maybe elsewhere) is out of date and doesn't support all the features we recommend below.)
John's default input format is modeled after a traditional Unix password file, and to speed things up we've also used the traditional Unix password hashing function, based on DES. By modern standards this hash can be computed much more quickly than would be desirable, but this lets us use less computing power for the lab. Copy the database of 100 users and their hashed passwords in this format:
cp /web/classes/Fall-2023/csci4271/labs/14/descrypt.pw .
The first field of each line, up to a colon, is a Unix-style username. After the colon is a hashed password in a base64-like encoding. The first two characters of the hashed password are the salt. Both the size of the salt and the total hash size are small by modern standards; in fact there is a collision of salt values even in this set.
John has a default mode where it first tries a short wordlist, and then tries generating character sequences in increasing length, which is good for getting started. Try using the following command to run it for a minute and see what it finds:
./john --max-run-time=60 descrypt.pw
You should see John's default configuration find several passwords, 20-45 depending on your CPU speed: they're each printed on a line followed by the corresponding username in parentheses.
One feature to keep in mind is that by default john will automatically remember all of the passwords it has already guessed, so as not to waste time on them. As a message says, you can get it to print its cache by rerunning with the --show option. If you want to clear its cache and make it start over, remove the file ~/.john/john.pot, where ~ represents your home directory.
John's default mode wouldn't be able to guess all the passwords in the challenge set quickly, because they come from different patterns and some are fairly long. But putting your pattern recognition abilities together with its implementation you should be able to make a lot of progress. Keep rerunning the tool with different options to have it try different sets of candidate passwords. Below are several suggestions, not exhaustive. You can read detailed documentation here, particularly the files EXAMPLES, OPTIONS, and MASK.
If you're running on a personal computer or a lab machine that you're not sharing with anyone else, you can make john go faster by telling it to use more cores in parallel. The easiest way to do this is to supply an option like --fork=6, which will run 6 search processes at once. But you won't see improvement beyond the number of CPU cores available. So for instance on a lab machine you wouldn't want to try to run more than about 8 threads/copies at once. To be considerate of other machine users, we ask that you don't use parallel processing at all on a shared (e.g., Vole) machine. (If you have access to a machine with a GPU, John can be compiled to use that too, though not the version we've compiled.)
A basic feature to control is the list of words that john tries, using the --wordlist= option; you put a filename right after the equals sign. The file /usr/share/dict/words is a list of English words similar to the keywords in regular dictionary. It would be a good place to start for these hashed passwords. You can also pre-process the dictionary with other text-processing tools, or supply a wordlist on the standard input with the option --pipe.
The relatively brute-force mode where john generates passwords in increasing length is called "incremental" mode; it's a little smarter than pure brute force because it knows something about what characters likely occur together. The --incremental= option takes an argument to control what kinds of characters to use; for instance it will go faster if you use only digits.
The "mask" mode generates passwords according to a pattern with different characters in different positions. Most characters in the mask are interpreted literally, but for instance you can use ?d to represent any digit, ?l for a lowercase letter, and so on. You can also combine a wordlist with mask mode; then ?w represents a word from the wordlist.
Our 100 passwords consist of 10 groups of 10 passwords each selected in the same way. If you know the right combinations of patterns and check them in the best order, they can all be cracked within about 6 minutes of CPU time (even single-threaded). But it's not so easy if you don't know the patterns in advance. You'll probably have to alternate between running in more general-purpose modes for a while, then doing more targeted searches based on patterns you notice.