University of Minnesota
Development of Secure Software Systems

CSci 4271 Lab 11

Today's lab goes in depth on another kind of vulnerability that has come up in lecture, format string injection, and on the techniques for attacking it. As we've done to simplify other control-flow hijacking examples, you won't need shellcode: instead, your goal is to transfer control to the attack_function function. We also recommend that you spend some time (maybe 15-20 minutes) at the beginning of the lab practicing auditing the code to find the information you need to attack the vulnerability, and we give suggestions along these lines on a separate linked page. You could also spend longer looking for the vulnerability if you feel that's what you need the most practice with, but we'd recommend coming back to the attack techniques later.

Background and setup

A format string vulnerability that lets an attacker use the %n format specifier basically provides what we call a write-what-where attack primitive. In other words, the vulnerability gives the attacker the capability to write a value of the attacker's choosing to a location of the attacker's choosing, sometimes with some other restrictions. You may recall that the intended, benign purpose of the %n feature of printf-family functions is to let a program keep track of how many characters printf has written up to a given point, storing that count through a pointer argument. When this feature is used for an attack, the value that printf interprets as the argument corresponding to the %n format specifier is the "where" of the primitive (the location that will be written to), and the "what" of the primitive (the value that will be written) is the number of characters printf has written so far.

For this vulnerability, the attacker has fairly flexible control over both the where and the what, but there are limits on how large their values can be. Because of this limitation, it will not work to overwrite a return address on the stack, which might otherwise be your first idea for a location an attacker could usefully replace to take control of the program. Instead we need to overwrite some other data value that the program will use as a control-flow target. A function pointer in the program would be one candidate, but what we recommend using today is instead a value that the program uses to reach a function in a shared library.

Start by copying the program source code to your working directory, with a command like:

cp /web/classes/Spring-2023/csci4271/labs/11/bclpr.c .

Because this is a simulation of a program that would be installed in a system-wide location if you controlled the full computer, we need to do a bit more to simulate "installing" it so that it will run correctly. For a location that you will have access to but that will be unique even on a shared computer, we recommend that you compile and install the program to a location like /tmp/bclpr-goldy007, but with goldy007 replaced by your UMN ID. /tmp is a directory for temporary files that everyone has access to, while using your UMN ID makes the name unique. You'll need to make this change both on line 24 of the source, which defines the INSTALL_PREFIX macro, and in the commands below that create the directories the program will use. Note, though, that our attack will only use addresses in the main program, so you don't have to do anything to disable ASLR.

gcc -no-pie -z execstack -g -Wall -Wno-format-security -fno-stack-protector bclpr.c -o bclpr
mkdir -p /tmp/bclpr-goldy007/spool/lp0
mkdir -p /tmp/bclpr-goldy007/printouts/lp0

Auditing suggestions

There are two ingredients needed to constitute a format-string vulnerability, plus one other choice you need to make based on the program to set up your attack. Here we'll describe what you're looking for; our proposed answers are on a separate page if you want to check your answers or jump ahead.

  1. A printf-family function with a non-constant format string. The most common way to use printf is for the first argument to be a constant string containing format specifiers starting with a percent sign. But if this string is constant, it can't be affected by an attacker. Only if the format-string argument is a non-constant (usually a variable) can the call be vulnerable.
  2. A way for the attacker to control the contents of the format string. A non-constant format string is potentially dangerous, but it will generally only be of use to an attacker if the attacker can supply their own value for the format string. So, for instance, if the format string argument is a variable, look back at how that value is computed to see whether it could be under an attacker's control. To make a working attack you'll need to exercise that control.
  3. Another attacker-controlled value that will be interpreted as an argument to the printf-family function. To match up with a %n that the attacker puts in the format string, the attacker also needs to control another value that printf will interpret as the corresponding argument. If the program already passes other arguments to printf, those might be controllable. Failing that, putting a lot of format specifiers in the format string will cause printf to read past the arguments that were actually passed (on x86-64 the first few come from registers, which printf saves in an internal array on the stack) and interpret other values on the stack as arguments. So look for attacker-controllable values in the stack frame of the calling function, its caller, and so on. On x86-64, the argument corresponding to %n needs to be an 8-byte address, and an address in the main program contains several null bytes, so it usually won't work for this value to come from a null-terminated string.

Our proposed answers to the above three questions are outlined on a separate page.

More attack techniques

You can confirm that you are successfully injecting a format string by putting format specifiers in the string and seeing what output results. In particular, try a format string that contains many copies of the format specifier %016lx to print various parameters from the stack as 64-bit hex values.

Before getting to the attack proper, let's take a look at the function-pointer-like shared library mechanism that we will be modifying in our attack. For each shared-library function that the main program calls, the linker creates a small intermediate function called a "PLT stub" (PLT stands for Procedure Linkage Table). The value that this stub uses to find the actual implementation of the function in the shared library is an entry in the GOT (Global Offset Table); the GOT entries used by PLT stubs are stored in a section more specifically called .got.plt. For this attack, we need to change the entry for a function that will be called after the printf-family call that we control. The library function we suggest you use for this purpose is fclose, which you can see is called on line 494, soon after the vulnerable fprintf on line 491.

You can see which GOT entry is used, and observe how it works, by looking at the execution of the relevant PLT stub in GDB. Here's a transcript you can try. Note that the name of the PLT stub for fclose is 'fclose@plt' (the single quotes are not part of the name proper, but GDB needs them because normal C function names can't contain @).

% echo 'Hello, world!' >hello.txt
% gdb --args ./bclpr hello.txt
(gdb) disass 'fclose@plt'
Dump of assembler code for function fclose@plt:
   0x00401380 <+0>:	endbr64 
   0x00401384 <+4>:	bnd jmpq *0x3cd5(%rip)   # 0x405060 
   0x0040138b <+11>:	nopl   0x0(%rax,%rax,1)
(gdb) p *(void **)0x405060
$1 = (void *) 0x4010c0
(gdb) watch *(void **)0x405060
Hardware watchpoint 1: *(void **)0x405060
(gdb) run
Starting program: bclpr hello.txt

Hardware watchpoint 1: *(void **)0x405060

Old value = (void *) 0x4010c0
New value = (void *) 0x7ffff7e29e00 <_IO_new_fclose>
0x00007ffff7fe01d0 in _dl_fixup (l=<optimized out>, reloc_arg=<optimized out>)
    at ../sysdeps/x86_64/dl-machine.h:242
(gdb) c
Continuing.
Deleting spoolfile from /tmp/bclpr-goldy007/spool/lp0/hello.txt
[Inferior 1 (process 590385) exited normally]
(gdb) quit

The key information needed for this aspect of the attack is the address of the relevant GOT entry, which you can see from the disassembly is 0x405060. The initial value of this pointer is a placeholder that triggers the dynamic linker to determine the real address the first time the function is executed. That code (specifically the function _dl_fixup where the watchpoint was triggered) is responsible for updating the GOT entry to point to the location of the implementation of fclose in the C library, which is 0x7ffff7e29e00 on this execution.

The "where" in a format-string write-what-where attack needs to be an attacker-controlled value on the stack, at a location where fprintf will look for the argument corresponding to a format specifier. This is the value that we will want to set to the address of the GOT entry.

Next, the "what" of our format-string attack needs to be the address of attack_function, which means that the number of bytes fprintf has written by the time it reaches the %ln format specifier we've added must equal that address. You can get the address of the function inside GDB or with the program nm. It will also be useful to know this value in decimal.

% nm bclpr | fgrep attack_function
0000000000402803 T attack_function
% printf '%d\n' 0x402803
4204547

How can we make fprintf output more than four million characters before it reaches the crux of our attack? A format string that was itself that long might be too long to pass on the command line. More conveniently, we can use the same printf feature we already used in %016lx: a 0 followed by a number in the format specifier causes the output to be padded with leading 0s up to that many characters, if it is not already that long. 16 is a logical width for a 64-bit hex value, but there is no upper limit on the padding width, so it can easily be in the millions.

The final slightly tricky aspect of this attack is making the number of characters output by printf come out to exactly the right amount, since it depends on everything in the format string up to the point of the %ln. If you followed our suggestion of using %016lx for printing the earlier stack values, then each of them will be 16 characters long, but don't forget to also count any spaces, newlines, or other separating characters you print. You could also try copying the data out of the log file to count its size. If you get the count wrong, the attack won't work, but you can use the same GDB watchpoint we illustrated earlier to see what value the GOT entry is being overwritten with, and adjust accordingly.

Another note: the basic printf format specifier for writing the number of bytes output is %n, but for this attack it is important that you use the variant %ln instead. The l causes printf to interpret the pointer as a long * instead of an int *, which is important because we want to overwrite all 8 bytes of the pointer. As the GDB transcript showed, the resolved address of fclose (0x7ffff7e29e00 in that run) has nonzero high bits, and %n would write only the low 4 bytes, leaving those high bits in place.

There is also one more printf feature you may find it useful to experiment with in building your attack, since it can make the attacking format string shorter. The Unix (POSIX) versions of the printf functions let you use arguments out of order. If you put a decimal number and then a dollar sign right after the percent sign of a format specifier, that specifier takes its value from the argument at that position (counting from 1 after the format string), rather than from whatever argument would normally come next. For instance, while %d prints the next argument as a decimal int, %10$d prints as a decimal int the value that would have corresponded to the 10th format specifier had the dollar-sign feature not been used.