Development of Secure Software Systems

CSci 4271 Lab 14

In this week's lab you'll try out the fuzzing tool AFL, to find interesting crashing inputs of programs. (To be precise we'll be using AFL++, a more recent forked version.)

We'll walk through using AFL on two different example programs, a very contrived example based on a text adventure game, and then the slightly more realistic buggy program from Project 1.

A good starting point for the documentation of AFL++ is discussion and links you can find on the AFL++ Github front page. It's too long to suggest you read through it all during lab, though. Another interesting documentation section is the explanation of what all the console statistics mean. It's also pretty long, but you might skim though it if you're otherwise just watching the statistics screen waiting for something interesting to happen.

Because fuzzing involves creating, using, and removing files quickly, it will work noticeably faster if the files are kept on a local filesystem rather than a networked one like the CSE Labs home directories. On whatever machine you're using, we suggest creating yourself a directory under /export/scratch/users and doing your work there. The convention would be to name the subdirectory of users after your username, as in (replace "goldy007" with your username):

mkdir /export/scratch/users/goldy007
cd /export/scratch/users/goldy007
mkdir 4271-afl-lab
cd 4271-afl-lab

There are three programs from AFL++ that you'll need to use. afl-cc is a compiler that adds control flow instrumentation to make a binary suitable for use with AFL/AFL++, and afl-fuzz is the fuzzer itself. afl-tmin is a program to automatically simplify test cases. Since the location of these programs in the course directory is long, we suggest making symlinks to them in your current directory, as in the following commands.

ln -s /web/classes/Spring-2022/csci4271/soft/afl/bin/{afl-cc,afl-fuzz,afl-tmin} .

(Or you could also add the bin directory to your path.)

Finding the crash in the maze

Our first example is a modeled after a text adventure game, where getting the program to crash is like a game event. You can try compiling the program normally and running it with commands on the standard input. It's simple enough that with a little bit of experimenting and/or reading the source code you should be able to find how to get to the magic potion.

cp /web/classes/Spring-2022/csci4271/labs/14/maze.c .
gcc -Wall -g maze.c -o maze
./maze

Next let's see if AFL can find the potion (crash) as well. First recompile the program using afl-cc:

./afl-cc -g maze.c -o maze-for-afl

Though the maze that AFL needs to explore here isn't really that large, changing the input to the game randomly would take a long time to get it to the goal, because there are only a few legal commands. So the thing we can do that is most useful is to give it a dictionary of the legal commands in the game. This is like a simplified form of grammar-based fuzzing, where we just provide some useful tokens rather than a full grammar. We've supplied a sample dictionary you can use:

cp /web/classes/Spring-2022/csci4271/labs/14/maze.dict .

The other thing we have to give to get AFL started is a directory of seed inputs. A bunch of good seeds are potentially another way to give AFL information about what the input format should look like. On the other hand large seeds can cause some parts of AFL to slow down, so you can potentially put a lot of work into optimal seeds. But because we're already helping with the dictionary, choosing good inputs turns out not to be as important. Let's create a minimal set with just one:

mkdir maze-inputs
echo 'go north' >maze-inputs/input1

One other piece of trivia to deal with is that AFL suggests changing a couple of system options to make it run faster, but you won't be allowed to change those options on CSE Labs, so we need to use an environment variable to tell AFL not to worry. (The performance loss won't be too bad for our purposes.) So putting together all those resources, the command to start AFL looks like:

env AFL_I_DONT_CARE_ABOUT_MISSING_CRASHES=1 AFL_SKIP_CPUFREQ=1 ./afl-fuzz -i maze-inputs -x maze.dict -o maze-results -- ./maze-for-afl

Once you start AFL running, it will take over its terminal with a bunch of statistics about the execution. While it is running, it will fill the directory specified with -o, maze-results in our example command, with the interesting inputs it finds: inputs that cause new code to be executed, inputs that cause crashes, and inputs that cause execution to take much longer than expected (called hangs). These input files are kept in subdirectories of maze-results, specifically ones named default/queue, default/crashes, and default/hangs. If you're waiting to get good results as shown by the execution statistics, you can open another terminal at the same time and use it to look at the generated files. The files in the queue directory will give you an idea of how AFL is exploring the search space, while the crashes and hangs are the results that it has found so far. The file names in these directories have long names that give some information about what step of the fuzzing process produced them.

The maze example should run pretty quickly. If you're using a scratch drive on a lab machine, you should see it able to execute around 6000 tests per second (shown as exec speed) in the statistics, and it should start finding crashes within a minute or faster. (Vole machines can be noticeably slower, especially if other students are using them at the same time.) The fuzzing process is potentially endless, so you'll need to decide when you want AFL to stop, by typing Control-C is its terminal window.

Because the maze program is tolerant of a lot of junk in its input (unknown commands are just ignored), AFL's default mode will produce long crashing inputs with a mix of legal commands and random data. If you just wanted to confirm that the program had a bug or trigger it under the debugger, this would be enough. But for understanding the program it would be nice to have cleaner-looking crashing inputs. AFL has a companion tool based on the same execution experiment infrastructure that searches for ways to make test inputs smaller, which generally also makes them cleaner-looking. You can run it on one of the crashes you've found using a command like:

./afl-tmin -i maze-results/default/crashes/id:000000* -o crash-reduced.in -- ./maze-for-afl

That particular command will try to minimize the first crash, or you can replace the id:000000* with the name of a particular test case you're interested in. The output file crash-reduced.in will be a simplified crashing input.

If you're feeling lazy, you might be wondering whether it's possible for AFL to find relevant strings from the program automatically, instead of us having to create a dictionary manually. In fact AFL++ has a couple of features designed for exactly this purpose. They just require the extra step of compiling the program with a special option (an environment variable). To do this example in "cmplog" mode, change the compilation command to:

env AFL_LLVM_CMPLOG=1 ./afl-cc -g maze.c -o maze-for-afl-cmplog

And then supply this program with the -c option to afl-fuzz (eliminating the need for the dictionary):

env AFL_I_DONT_CARE_ABOUT_MISSING_CRASHES=1 AFL_SKIP_CPUFREQ=1 ./afl-fuzz -i maze-inputs -c $(pwd)/maze-for-afl-cmplog -o maze-results -- ./maze-for-afl

Crashing bcimgview

Another example of a C program you might be interested in crashing bugs in is bcimgview, the buggy image-parsing program from Project 1. We've also prepared a version of it you can test using a similar process as with the maze program. To make the compilation simpler we've made a version without the GUI:

cp /web/classes/Spring-2022/csci4271/labs/14/bcimgview-for-afl.c .
env AFL_LLVM_CMPLOG=1 ./afl-cc -g bcimgview-for-afl.c -o bcimgview-for-afl-cmplog -lm

For binary formats like bcimgview uses, good seed inputs can be more valuable than a dictionary. Smaller inputs are more useful for fuzzing than large ones, but it's also good to have inputs that cover as many as possible of the program's features. As a simple starting we've selected a subset of the sample images:

cp -r /web/classes/Spring-2022/csci4271/labs/14/bcimgview-inputs .

Because this program still needs the -c flag, one thing that should be slightly different in the command is how to tell AFL to run the binary: you give it a template that can contain other options, but then has the location where the input filename should go replaced with @@. So for instance the command might look like:

env AFL_I_DONT_CARE_ABOUT_MISSING_CRASHES=1 AFL_SKIP_CPUFREQ=1 ./afl-fuzz -i bcimgview-inputs -c $(pwd)/bcimgview-for-afl-cmplog -o bcimgview-results -- ./bcimgview-for-afl-cmplog -c @@

A couple of the vulnerabilities in bcimgview are relatively easy for AFL to trigger in our testing. AFL can find multiple crashing inputs caused be each bug, so if you are finding the inputs to be a lot to look through, you might want to try implementing a fix for one bug that AFL has already found and recompiling, so that any further crashes come from new bugs.

The existence of these crashing inputs proves that something is going wrong inside bcimgview, but because they are just binary files, they don't immediately tell you what the bug is, or whether it is exploitable. (This may be a position you also remember being in during the project.) Some steps you could take to investigate what's going wrong might include running the tests under Valgrind, running them under GDB, and comparing values you see in the crash report message or in GDB to values in the input.