Development of Secure Software Systems

CSci 4271 Lab 4

In this week's lab you'll try out the fuzzing tool AFL, to find interesting crashing inputs of programs, and try out some attacker techniques related to heap vulnerabilities. (To be precise we'll be using AFL++, a more recent forked version.)

We'll walk through using AFL on several example programs, starting with a contrived example based on a text adventure game, and then moving on to some increasingly realistic programs. Mixed with these will be two sample programs for you to try heap-related manipulations.

A good starting point for the documentation of AFL++ is discussion and links you can find on the AFL++ Github front page. It's too long to suggest you read through it all during lab, though. (A subset of this information is also available in the directory /usr/share/doc/afl++-doc). Another interesting documentation section is the explanation of what all the console statistics mean. It's also pretty long, but you might skim though it if you're otherwise just watching the statistics screen waiting for something interesting to happen.

CSE-IT has installed a suitable version of AFL++ on the workstations that are running Ubuntu 22.04. Also, because fuzzing can be CPU-intensive, it can be better to do on a non-shared machine. Therefore we recommend using the 1-262 lab workstations, either in person or remotely. If you want to try using Vole, the classic Vole cluster that is running Ubuntu 20.04 is too old: you'd need to try out the VOLEFX3 cluster that is now in testing. (In previous semesters we had also compiled our own version of AFL++ to run on Ubuntu 20.04, but while you still may be able to find it, these instructions don't recommend it.)

Because fuzzing involves creating, using, and removing files quickly, it will work noticeably faster if the files are kept on a local filesystem rather than a networked one like the CSE Labs home directories. On whatever machine you're using, we suggest creating yourself a directory under /export/scratch/users and doing your work there. The convention would be to name the subdirectory of users after your username, as in (replace "goldy007" with your username):

mkdir /export/scratch/users/goldy007
cd /export/scratch/users/goldy007
mkdir 4271-afl-lab
cd 4271-afl-lab

There are three programs from AFL++ that you'll need to use. afl-cc is a compiler that adds control flow instrumentation to make a binary suitable for use with AFL/AFL++, and afl-fuzz is the fuzzer itself. afl-tmin is a program to automatically simplify test cases. If you've followed our recommendation of using a CSE Labs Ubuntu 22.04 workstation, suitable versions of these programs should already be in your path. You can check the version of AFL available with this command (shown with the expected output) :

$ afl-fuzz -h 2>&1 | head -1
afl-fuzz++4.00c based on afl by Michal Zalewski and a large online community

Finding the crash in the maze

Our first example is a modeled after a text adventure game, where getting the program to crash is like a game event. You can try compiling the program normally and running it with commands on the standard input. It's simple enough that with a little bit of experimenting and/or reading the source code you should be able to find how to get to the magic potion.

cp /web/classes/Spring-2024/csci4271/labs/04/maze.c .
gcc -Wall -g maze.c -o maze
./maze

Next let's see if AFL can find the potion (crash) as well. First recompile the program using afl-cc:

afl-cc -g maze.c -o maze-for-afl

Though the maze that AFL needs to explore here isn't really that large, changing the input to the game randomly would take a long time to get it to the goal, because there are only a few legal commands. So the thing we can do that is most useful is to give it a dictionary of the legal commands in the game. This is like a simplified form of grammar-based fuzzing, where we just provide some useful tokens rather than a full grammar. We've supplied a sample dictionary you can use:

cp /web/classes/Spring-2024/csci4271/labs/04/maze.dict .

The other thing we have to give to get AFL started is a directory of seed inputs. A bunch of good seeds are potentially another way to give AFL information about what the input format should look like. On the other hand large seeds can cause some parts of AFL to slow down, so you can potentially put a lot of work into optimal seeds. But because we're already helping with the dictionary, choosing good inputs turns out not to be as important. Let's create a minimal set with just one:

mkdir maze-inputs
echo 'go north' >maze-inputs/input1

One other piece of trivia to deal with is that AFL suggests changing a couple of system options to make it run faster, but you won't be allowed to change those options on CSE Labs, so we need to use an environment variable AFL_SKIP_CPUFREQ=1 to tell AFL not to worry. (The performance loss won't be too bad for our purposes.) On some other systems (including out-of-the-box Ubuntu systems), you may also need to set the environment variable AFL_I_DONT_CARE_ABOUT_MISSING_CRASHES=1 if you can't change the way crash reporting works to be compatible with AFL. This is a slightly more serious problem that can cause some crashes to be misclassified as hangs, so if you're running this on your own computer you might consider taking AFL's advice of changing the setting. But in our recent testing the CSE Labs machines don't need this. So putting together all those resources, the command to start AFL looks like:

env AFL_SKIP_CPUFREQ=1 afl-fuzz -i maze-inputs -x maze.dict -o maze-results -- ./maze-for-afl

Once you start AFL running, it will take over its terminal with a bunch of statistics about the execution. While it is running, it will fill the directory specified with -o, maze-results in our example command, with the interesting inputs it finds: inputs that cause new code to be executed, inputs that cause crashes, and inputs that cause execution to take much longer than expected (called hangs). These input files are kept in subdirectories of maze-results, specifically ones named default/queue, default/crashes, and default/hangs. If you're waiting to get good results as shown by the execution statistics, you can open another terminal at the same time and use it to look at the generated files. The files in the queue directory will give you an idea of how AFL is exploring the search space, while the crashes and hangs are the results that it has found so far. The file names in these directories have long names that give some information about what step of the fuzzing process produced them.

The maze example should run pretty quickly. If you're using a scratch drive on a lab machine, you should see it able to execute around 5000 or more tests per second (shown as exec speed) in the statistics, and it should start finding crashes within a minute or faster. (Vole machines can be noticeably slower, especially if other students are using them at the same time.) The fuzzing process is potentially endless, so you'll need to decide when you want AFL to stop, by typing Control-C in its terminal window. In the real world, you would often run a fuzzer for hours or days to find the most interesting results. But because your time in lab and the CSE Labs' computing power are limited, we've tried to pick the examples in this lab so that you can get results fairly quickly. If you aren't seeing the results you're expecting after a few minutes, you should try to debug what's happening.

Because the maze program is tolerant of a lot of junk in its input (unknown commands are just ignored), AFL's default mode will produce long crashing inputs with a mix of legal commands and random data. If you just wanted to confirm that the program had a bug or trigger it under the debugger, this would be enough. But for understanding the program it would be nice to have cleaner-looking crashing inputs. AFL has a companion tool based on the same execution experiment infrastructure that searches for ways to make test inputs smaller, which generally also makes them cleaner-looking. You can run it on one of the crashes you've found using a command like:

afl-tmin -i maze-results/default/crashes/id:000000,* -o maze-crash-reduced.in -- ./maze-for-afl

That particular command will try to minimize the first crash, or you can replace the id:000000,* with the name of a particular test case you're interested in. The output file maze-crash-reduced.in will be a simplified crashing input.

If you're feeling lazy, you might be wondering whether it's possible for AFL to find relevant strings from the program automatically, instead of us having to create a dictionary manually. In fact AFL++ has a couple of features designed for exactly this purpose. They just require the extra step of compiling the program with a special option (an environment variable). To do this example in "cmplog" mode, change the compilation command to:

env AFL_LLVM_CMPLOG=1 afl-cc -g maze.c -o maze-for-afl-cmplog

And then supply this program with the -c option to afl-fuzz (eliminating the need for the dictionary):

env AFL_SKIP_CPUFREQ=1 afl-fuzz -i maze-inputs -c $(pwd)/maze-for-afl-cmplog -o maze-results-2 -- ./maze-for-afl

Overflowing into an adjacent heap object

A number of different factors can affect where a malloc implementation puts heap-allocated objects. But it is common across many malloc implementations that if you allocate several objects of the same size, some of them are likely to be close to each other in memory. If objects are allocated close to one another, then a forward sequential overflow of a buffer in one object can overwrite data in a later nearby object, along with overwriting heap metadata along the way. To explore this phenomenon, we've given you a program named heap-nearby to play with, which we suggest you copy in source and binary form in the usual way:

cp /web/classes/Spring-2024/csci4271/labs/04/heap-nearby{,.c} .

The normal non-attack usage of this program is to supply it four short (up to 7 letter) words as arguments, and it will print them in order. Try this out and take a look at the source code to satisfy yourself you understand what's going on.

The obj structs are 16 (0x10) bytes each. Since there's not much complicated going on in this program, you should see that the relative positions of the allocations are the same on every run, though the starting addresses will differ if you haven't disabled ASLR. Here's an example of what you might see:

Object 0 is at 0x4052a0
Object 1 is at 0x4056d0
Object 2 is at 0x4056f0
Object 3 is at 0x405710

In this output, the distance from object 1 to object 2, and from 2 to 3, are both 32 (0x20) bytes. That consists of 16 bytes for the objects themselves, and the other 16 bytes for two words of heap metadata.

If you corrupt the heap metadata, the program is likely to notice the problem and exit when it tries to free the objects. You should be able to observe this happening if you overflow object 1 by a small amount. At least two different error messages should be easy to get by overwriting with printable characters. (More subtle corruption may also be possible, though you are limited by the overwrite stopping at the first 0 byte.)

What if you want to take over the program? The metadata corruption won't be an obstacle to an attack if you can hijack the program's execution before the objects are freed. Given what you did last week, the function pointer field in the structure looks like an appealing target. Do you notice any other functions in the program that take a single string as an argument, and that would be useful for an attacker to execute? Here's a command that shows you the address you should use:

objdump -d heap-nearby | grep 'call.*system'

As a variation on what we did in the last lab, another shell command you can use to create strings with arbitrary (non-null) characters in them is printf. It supports the same \x syntax that C strings do for specifying a character via its hex code.

echo $(printf '\x48\x65\x6c\x6c\x6f')

Crashing c4

You can use the same basic process of fuzzing to find crashes in lots of different programs. As a slightly more realistic example, let's next take a look at "c4", a very small interpreter for the core of the C programming language. (The name c4 comes from being implemented using only four functions, though as you might expect all four functions are pretty complicated.) The program is open-source and you can see the original version on its GitHub page.

The original version can be compiled by GCC as well as executed by itself, but it achieves this by being loose with the type system in a way that the Clang compiler used by AFL doesn't like. So we've had to make a small change to make it compile with afl-cc.

cp /web/classes/Spring-2024/csci4271/labs/04/c4-afl.c .
afl-cc -Wno-format -Wno-parentheses -Wno-main-return-type -g c4-afl.c -o c4-afl

You can potentially use a variety of C programs as seeds, but a simple starting place is the two C programs that come in the c4 source code, c4.c itself and a hello-world program hello.c.

git clone https://github.com/rswier/c4.git
mkdir c4-inputs
cp c4/c4.c c4/hello.c c4-inputs

Another thing that needs to be slightly different in the command is how to tell AFL to run the binary: you give it a template that can contain other options, but then has the location where the input filename should go replaced with @@. So for instance the command might look like:

env AFL_SKIP_CPUFREQ=1 afl-fuzz -i c4-inputs -o c4-results -- ./c4-afl @@

Note that you will also need to use the @@ syntax if you run afl-tmin.

Given that c4 implements an unsafe language even when it's working correctly, and the code seems to not have very many bounds checks, the fuzzer should be able to find a crash without too much searching. It's also definitely also possible to make it go into an infinite loop. However you'll see that the two programs hello.c and c4.c are not a great seed set: hello.c is too small to cover much code, while c4.c is large enough that it leads to large crashes that are hard to read. Try creating some other short seed files that are legal according to C4's rules so that AFL can create a wider variety of short crashing or hanging inputs.

Controlling object locations via sizes

Our second example program related to the heap illustrates some of the ways an attacker can indirectly affect to location chosen for objects, by controlling their size and order of allocation. Our sample program will do a number of allocations based on instructions supplied on the standard input, which affect the locations chosen for future dynamically-allocated object. Your goal is to set up a couple of unusual situations in the object locations. You can copy the compiled program and its source code in the usual way:

cp /web/classes/Spring-2024/csci4271/labs/04/size-and-loc{,.c} .
cp /web/classes/Spring-2024/csci4271/labs/04/sl-template.in sl.in

In order, this program allocates one A object, any number of B objects, and then any number of C objects. The size of the A and C objects are controlled by the input, but the B objects are of fixed size (40 bytes). The A object is freed in between the allocation of the B and C objects, but the B and C objects are not freed. In addition to choosing the number of B and C objects, and the size of the A and C objects, when supplying the input you also specially designate one of the B objects and two of the C objects, which we'll call B*, C1, and C2. For instance here's the template input file we've given you:

A 50
B
B*
B
FA
C 10
C 20
C1 30
C2 40
C 50
C 60
END

This file corresponds to events in chronological order: lines starting with A, B or C are allocations, some with sizes or */1/2 suffixes; FA represents the point where the A object is freed, and END marks the end.

The binary we've given you is non-PIE, but we haven't asked you to disable ASLR, so the heap will start at a random, but relatively low, address. Typically, if you allocate objects in sequence without freeing them, their addresses will be increasing over time. In contrast to that, we're asking you to set up two unusual features of the locations of the identified objects. Your first goal is to have the B* object have a higher address than the C1 object, even though the B* object is allocated first. Your second goal is to have the C2 object be closer to the stack than heap objects normally would be. The stack and heap are separate, so "close" is a relative term: we're asking you to get the distance between the starting point of the C2 object and a value near the top of the stack be less than 2 trillion bytes in address-space terms. You can achieve both of these objectives, at the same time, by modifying the input file. The next paragraph will provide some hints, though you can stop here if you want to explore fully on your own.

The most powerful part of your control in this example is the sizes of the A and C objects. Roughly speaking, the memory allocator inside glibc has three kinds of behavior based on the size of the objects being allocated. All of the sizes mentioned in the template fall in the smallest category, but you can change the allocator's behavior by making the sizes larger. If you would like more insight into its behavior, beyond trial and error, you can read the documentation for the mallopt(3) library function (e.g., with the command man mallopt). This function can be used to change configuration parameters of the memory allocator. We aren't suggesting that you change these settings: rather, the description of the settings and their default values explains how the memory allocator works in the example. Specifically you might be interested in M_MMAP_THRESHOLD and M_MXFAST.

Crashing a decompression program

If the previous programs seemed too small or unrealistic, here is one that is more complicated. We've created a simplified decompression utility for the lzip file format; but it still handles the full file format, so it's about 1600 lines of code. Except that whereas the real implementation this is derived from has already been tested and fuzzed and is probably pretty robust, this version contains a memory safety bug that might be triggered in unusual circumstances. The bug isn't very stealthily hidden, so you might also be able to find it just by auditing the code, but 1600 lines of code would take a while just to read. So you could think of this as a race between whether you or AFL can find the problem first. You can copy the program source code with this command:

cp /web/classes/Spring-2024/csci4271/labs/04/mini-lunzip-bug1.c .

You can compile the program in the normal way. It is similar in behavior to the command lzip -dc in that it reads a file compressed in lzip format and prints the decompressed result to the standard output; a single file name should be the only command-line input and no options are supported. So you can run it with @@ similar to c4. We haven't given you any seed inputs, but you can make some with lzip.