Development of Secure Software Systems

CSci 4271 Lab 11

In this week's lab you'll try out the fuzzing tool AFL, to find interesting crashing inputs of programs. (To be precise we'll be using AFL++, a more recent forked version.)

As usual in the online lab we'll randomly split you into breakout groups of 2-3 students: please work together, discuss, and learn from the other student(s) in your group. Use the "Ask for Help" button to ask questions or show off what you've done.

We'll walk through using AFL on two different example programs: a very contrived example based on a text adventure game, and a slightly more realistic example, a small interpreter for the C language.

A good starting point for the documentation of AFL++ is a long README file that you can find on the AFL++ GitHub front page. It's too long to suggest you read through it all during lab, though. Another interesting documentation sub-page is the explanation of what all the console statistics mean. It's also pretty long, but you might skim through it if you're otherwise just watching the statistics screen waiting for something interesting to happen.

Because fuzzing involves creating, using, and removing files quickly, it will work noticeably faster if the files are kept on a local filesystem rather than a networked one like the CSE Labs home directories. Also we discovered that the version of AFL we compiled for this lab doesn't work on Vole. So if you are using the Vole GUI environment, you should still SSH from there to a more recent CSE Labs machine like the 1-262 or 4-250 lab machines. And then on that machine, we suggest creating yourself a directory under /export/scratch/users and doing your work there. The convention would be to name the subdirectory of users after your username, as in:

mkdir /export/scratch/users/goldy007
cd /export/scratch/users/goldy007
mkdir 4271-afl-lab
cd 4271-afl-lab
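
If you want to confirm that your working directory really is on a local disk, one possible check is to print its filesystem type. This is a sketch that assumes the GNU coreutils version of stat found on typical Linux machines; the function name fs_type is made up for illustration.

```shell
# Print the filesystem type of a directory (defaults to the current one).
# Assumes GNU coreutils `stat`. A type such as "nfs" indicates a networked
# filesystem; types such as "ext2/ext3", "xfs", or "tmpfs" are local.
fs_type() {
    stat -f -c %T "${1:-.}"
}

fs_type .
```

If this reports a networked type, you are probably still in your home directory rather than under /export/scratch.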

There are three programs from AFL++ that you'll need to use. afl-cc is a compiler that adds control flow instrumentation to make a binary suitable for use with AFL/AFL++, and afl-fuzz is the fuzzer itself. afl-tmin is a program to automatically simplify test cases. Since the location of these programs in the course directory is long, we suggest making symlinks to them in your current directory, as in the following command.

ln -s /web/classes/Spring-2021/csci4271/soft/afl/bin/{afl-cc,afl-fuzz,afl-tmin} .

(Or you could also add the bin directory to your path.)
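
Adding the directory to your path might look like the following sketch for a Bourne-style shell such as bash. The directory is the same one used in the ln command above; the variable name AFL_BIN is just for illustration, and the setting lasts only for the current shell session.

```shell
# Directory holding the course's AFL++ binaries (same path as in the ln command).
AFL_BIN=/web/classes/Spring-2021/csci4271/soft/afl/bin

# Prepend it to PATH for this shell session only.
PATH="$AFL_BIN:$PATH"
export PATH

# Check whether the shell can now find the fuzzer by its bare name.
command -v afl-fuzz || echo "afl-fuzz not found; check the directory above"
```

If you go this route, you would type afl-cc, afl-fuzz, and afl-tmin without the leading ./ in the commands in the rest of the lab.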

Finding the crash in the maze

Our first example is modeled after a text adventure game, where getting the program to crash is like a game event. You can try compiling the program normally and running it with commands on the standard input. It's simple enough that with a little bit of experimenting and/or reading the source code you should be able to find how to get to the magic potion.

cp /web/classes/Spring-2021/csci4271/labs/11/maze.c .
gcc -Wall -g maze.c -o maze
./maze

Next let's see if AFL can find the potion (crash) as well. First recompile the program using afl-cc:

./afl-cc -g maze.c -o maze-afl

Though the maze that AFL needs to explore here isn't really that large, changing the input to the game randomly would take a long time to get it to the goal, because there are only a few legal commands. So the thing we can do that is most useful is to give it a dictionary of the legal commands in the game. This is like a simplified form of grammar-based fuzzing, where we just provide some useful tokens rather than a full grammar. We've supplied a sample dictionary you can use:

cp /web/classes/Spring-2021/csci4271/labs/11/maze.dict .
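
If you'd like to see what such a dictionary looks like or write one of your own, the AFL dictionary format is a text file with one double-quoted token per line, optionally preceded by a name and an equals sign; lines starting with # are comments. The sketch below writes a hypothetical file my-maze.dict. The file name and the tokens are illustrative guesses, not necessarily the contents of the supplied maze.dict.

```shell
# Write a small AFL-style dictionary: one quoted token per line.
# The file name and tokens are illustrative, not the course's maze.dict.
cat > my-maze.dict <<'EOF'
# Entries may be written as name="token" or as a bare "token".
cmd_go="go "
dir_north="north"
dir_south="south"
"east"
"west"
EOF

# Count the token lines (everything that is not a comment or a blank line).
grep -c -v -e '^#' -e '^$' my-maze.dict
```

A file like this would be passed to afl-fuzz with the -x option in place of maze.dict.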

The other thing we have to give to get AFL started is a directory of seed inputs. A bunch of good seeds are potentially another way to give AFL information about what the input format should look like. On the other hand large seeds can cause some parts of AFL to slow down, so you can potentially put a lot of work into optimal seeds. But because we're already helping with the dictionary, choosing good inputs turns out not to be as important. Let's create a minimal set with just one:

mkdir maze-inputs
echo 'go north' >maze-inputs/input1

One other piece of trivia to deal with is that AFL suggests setting up a system option to make it run faster, but you won't be allowed to change that option on CSE Labs, so we need to use an environment variable to tell AFL not to worry. So putting together all those resources, the command to start AFL looks like:

env AFL_SKIP_CPUFREQ=1 ./afl-fuzz -i maze-inputs -x maze.dict -o maze-results -- ./maze-afl

Once you start AFL running, it will take over its terminal with a bunch of statistics about the execution. While it is running, it will fill the directory specified with -o, maze-results in our example command, with the interesting inputs it finds: inputs that cause new code to be executed, inputs that cause crashes, and inputs that cause execution to take much longer than expected (called hangs). These input files are kept in subdirectories of maze-results, specifically ones named default/queue, default/crashes, and default/hangs. If you're waiting to get good results as shown by the execution statistics, you can open another terminal at the same time and use it to look at the generated files. The files in the queue directory will give you an idea of how AFL is exploring the search space, while the crashes and hangs are the results that it has found so far. The files in these directories have long names that give some information about what part of the fuzzing process produced them.
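
From that second terminal, one way to keep an eye on progress is to count the saved inputs in each subdirectory. The following is a minimal sketch; the function name count_findings is made up, and it assumes saved inputs have names starting with "id:" (as in the afl-tmin example later in this lab), so that any other files AFL may place in those directories, such as a README, are not counted.

```shell
# Count the test cases AFL has saved in each subdirectory of an output directory.
# Only files whose names begin with "id:" are counted.
count_findings() {
    results=$1
    for kind in queue crashes hangs; do
        n=$(find "$results/default/$kind" -type f -name 'id:*' 2>/dev/null | wc -l | tr -d ' ')
        echo "$kind: $n"
    done
}

# Report only if the output directory from the example command exists.
if [ -d maze-results ]; then
    count_findings maze-results
fi
```

To look at an individual input, a hex dump tool such as xxd or od -c is handy, since the generated files often contain non-printable bytes.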

The maze example should run pretty quickly. If you're using a scratch drive on a lab machine, you should see it able to execute around 6000 tests per second (shown as exec speed) in the statistics, and it should start finding crashes within a minute or two.

Because the maze program is tolerant of a lot of junk in its input (unknown commands are just ignored), AFL's default mode will produce long crashing inputs with a mix of legal commands and random data. If you just wanted to confirm that the program had a bug or trigger it under the debugger, this would be enough. But for understanding the program it would be nice to have cleaner-looking crashing inputs. AFL has a companion tool based on the same execution experiment infrastructure that searches for ways to make test inputs smaller, which generally also makes them cleaner-looking. You can run it on one of the crashes you've found using a command like:

./afl-tmin -i maze-results/default/crashes/id:000000* -o crash-reduced.in -- ./maze-afl

That particular command will try to minimize the first crash, or you can replace the id:000000* with the name of a particular test case you're interested in. The output file crash-reduced.in will be a simplified crashing input.
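
To confirm that the reduced input still crashes the program outside of AFL, you can feed it to the normally compiled binary and look at the exit status. The sketch below relies on the usual shell convention that a status above 128 means the process was killed by a signal (status minus 128); the function name replay_stdin is hypothetical.

```shell
# Run a command with a given file on standard input and report how it ended.
# A status above 128 conventionally means "killed by signal (status - 128)",
# for example 139 for SIGSEGV (11) or 134 for SIGABRT (6).
replay_stdin() {
    input=$1
    shift
    status=0
    "$@" < "$input" > /dev/null 2>&1 || status=$?
    if [ "$status" -gt 128 ]; then
        echo "killed by signal $((status - 128))"
    else
        echo "exited with status $status"
    fi
}

# Replay the reduced input against the normally compiled binary, if both exist.
if [ -x ./maze ] && [ -f crash-reduced.in ]; then
    replay_stdin crash-reduced.in ./maze
fi
```

The same input can also be given to the program under gdb if you want to see exactly where it crashes.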

Crashing c4

You can use the same basic process of fuzzing to find crashes in lots of different programs. As a slightly more realistic example, let's also take a look at "c4", a very small interpreter for the core of the C programming language. (The name c4 comes from being implemented using only four functions, though as you might expect all four functions are pretty complicated.) The program is open-source and you can see the original version on its GitHub page.

The original version can be compiled by GCC as well as executed by itself, but it achieves this by being loose with the type system in a way that the Clang compiler used by AFL doesn't like. So we've had to make a small change to make it compile with afl-cc.

cp /web/classes/Spring-2021/csci4271/labs/11/c4-afl.c .
./afl-cc -Wno-format -Wno-parentheses -Wno-main-return-type -g c4-afl.c -o c4-afl

You can potentially use a variety of C programs as seeds, but a simple starting place is the two C programs that come in the c4 source code, c4.c itself and a hello-world program hello.c.

git clone https://github.com/rswier/c4.git
mkdir c4-inputs
cp c4/c4.c c4/hello.c c4-inputs

Another thing that needs to be slightly different in the command is how to tell AFL to run the binary: you give it a template that can contain other options, but then has the location where the input filename should go replaced with @@. So for instance the command might look like:

env AFL_SKIP_CPUFREQ=1 ./afl-fuzz -i c4-inputs -o c4-results -- ./c4-afl @@
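
To make the idea of the @@ placeholder concrete, here is a small illustration of the substitution written as a shell function. It is only a sketch of the concept, not AFL's own code: the name run_template is made up, and it only handles @@ appearing as a whole argument.

```shell
# Illustration only: run a command template after replacing each "@@"
# argument with the given input file name.
run_template() {
    file=$1
    shift
    n=$#
    # Rotate through the arguments once, swapping "@@" for the file name.
    while [ "$n" -gt 0 ]; do
        arg=$1
        shift
        if [ "$arg" = "@@" ]; then
            arg=$file
        fi
        set -- "$@" "$arg"
        n=$((n - 1))
    done
    "$@"
}

# Show the command that would be run for one of the seed files.
run_template c4-inputs/hello.c echo ./c4-afl @@
```

Dropping the echo would actually run the program on that file, which is also how you can replay a crashing input by hand: pass the crash file's name to ./c4-afl as its argument.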

Given that c4 implements an unsafe language even when it's working correctly, and the code seems to not have very many bounds checks, the fuzzer should be able to find a crash without too much searching. It's definitely also possible to make it go into an infinite loop. However, you'll see that the two programs hello.c and c4.c are not a great seed set: hello.c is too small to cover much code, while c4.c is large enough that it leads to large crashes that are hard to read.
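
One thing you might experiment with is a seed of intermediate size. The sketch below writes a hypothetical seed file, loop.c, into the seed directory. It sticks to constructs that c4 describes itself as supporting (int variables, if, while, return, and expression statements), but the file name and program are made up for illustration and have not been checked against c4.

```shell
# Create a hypothetical mid-sized seed: larger than hello.c, far smaller than c4.c.
mkdir -p c4-inputs
cat > c4-inputs/loop.c <<'EOF'
#include <stdio.h>

int main()
{
  int i;
  int sum;
  i = 0;
  sum = 0;
  while (i < 10) {
    if (i % 2) sum = sum + i;
    i = i + 1;
  }
  printf("%d\n", sum);
  return 0;
}
EOF

# List the seeds with their sizes in bytes.
wc -c c4-inputs/*
```

You could then try fuzzing with and without c4.c in the seed directory and compare how readable the resulting crashes are.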