University of Minnesota
Development of Secure Software Systems

CSci 4271 Lab 8

Today's lab will showcase one more vulnerability in the Badly Coded Print Server, walking you through attacking the format string vulnerability we showed in an earlier lecture. We've changed the source code of the vulnerable program slightly, but the idea of the vulnerability is unchanged. We're also using the same endpoint to demonstrate success of your attack: your goal is to transfer control to the attack_function function.

A format string vulnerability that lets an attacker use the %n format specifier gives the attacker what we call a write-what-where attack primitive. In other words, the vulnerability lets the attacker write a value of the attacker's choosing to a location of the attacker's choosing, sometimes with some other restrictions. You may recall that the intended benign purpose of the %n feature of printf-family functions is to let a program keep track of how many characters printf has written up to a given point, storing that count through a pointer argument. When this feature is used for an attack, the value that printf interprets as the argument corresponding to the %n format specifier is the "where" of the primitive (the location that will be written to), and the "what" of the primitive (the value that will be written) is the number of characters printf has written so far.
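
As a reminder of the benign behavior, here is a minimal C sketch in which the format string is a fixed literal, so %n simply stores a count into a local variable. The helper name chars_before_n is hypothetical, not something from bclpr.c:

```c
#include <assert.h>
#include <stdio.h>

/* Benign %n: snprintf stores, through the matching int * argument, how many
   characters it has produced so far. The format string is a literal here. */
int chars_before_n(const char *name) {
    char buf[64];
    int count = -1;
    snprintf(buf, sizeof buf, "Hello, %s!%n", name, &count);
    assert(count >= 0);  /* %n was reached and stored a count */
    return count;
}
```

For example, chars_before_n("world") returns 13, the length of "Hello, world!". In the attack, the format string comes from the attacker instead, so the attacker chooses which stack value %n treats as its pointer and how many characters come before it.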

For this vulnerability, the attacker has fairly flexible control over both the where and the what, but both are limited in how large their values can be. Because of this limitation, it will not work to overwrite a return address on the stack, which might otherwise be your first thought for a location an attacker could replace to take control of the program. Instead, we need to rewrite some other data value that the program will use as a control-flow target. If there were a function pointer in the program, it might be a candidate. What we recommend you use today, though, is a value that the program uses to reach a function in a shared library.

We suggest you start by copying the latest version of the program source code to your working directory, with a command like:

cp /web/classes/Spring-2022/csci4271/labs/08/bclpr.c .

Because this is a simulation of a program that would be installed in a system-wide location if you controlled the whole computer, we need to do a bit more to simulate "installing" it so that it will run correctly. We recommend that you compile and install the program to a location like /tmp/bclpr-goldy007, but with goldy007 replaced by your UMN ID: you will have access to this location, and it will be unique even on a shared computer. /tmp is a directory for temporary files that everyone has access to, while using your UMN ID makes the path unique. You'll need to make this change both on line 24 of the source, which defines the INSTALL_PREFIX macro, and in the commands below that create the directories the program will use. Note, though, that our attack uses only addresses in the main program, which the command below compiles with -no-pie, so you don't have to do anything to disable ASLR.

gcc -no-pie -z execstack -g -Wall -Wno-format-security -fno-stack-protector bclpr.c -o bclpr
mkdir -p /tmp/bclpr-goldy007/spool/lp0
mkdir -p /tmp/bclpr-goldy007/printouts/lp0

Before getting to the attack proper, let's take a look at the function-pointer-like shared library mechanism that we will be modifying in our attack. For each function from a shared library that is called by the main program, the linker creates a small intermediate function called a "PLT stub" (PLT stands for Procedure Linkage Table). The value that this stub uses to find the actual implementation of the function in the shared library is an entry in the GOT (Global Offset Table). The GOT entries used by PLT stubs are kept in a section more specifically called .got.plt. For this attack, we need to change the entry corresponding to a library function that will be called after the printf-family call that we control. The library function we suggest you use for this purpose is fclose, which you can see is called on line 494, soon after the vulnerable fprintf on line 491.
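
The PLT/GOT mechanism behaves much like a function pointer that gets patched on first use. The following C sketch is a toy model only, with hypothetical names (got_close, close_stub, resolve_close, real_close); real PLT stubs are a few machine instructions, and the real lookup happens inside the dynamic linker:

```c
#include <assert.h>

/* Toy model of lazy binding: NOT how ld.so is actually implemented. */
typedef int (*close_fn)(int);

static int real_close_calls = 0;
static int resolver_calls = 0;

/* Stand-in for the real library implementation (like fclose in libc). */
static int real_close(int fd) {
    real_close_calls++;
    return fd;
}

static int resolve_close(int fd);

/* The "GOT entry": starts out pointing at the resolver placeholder. */
static close_fn got_close = resolve_close;

/* The "resolver": patches the GOT entry with the real target, then
   forwards this first call to it. */
static int resolve_close(int fd) {
    assert(got_close == resolve_close);  /* runs only while unresolved */
    resolver_calls++;
    got_close = real_close;
    return got_close(fd);
}

/* The "PLT stub": an indirect call through whatever the GOT entry holds. */
int close_stub(int fd) {
    return got_close(fd);
}
```

After two calls through close_stub, the resolver has run once and the "GOT entry" points straight at real_close. If something overwrote got_close with another function's address, every later call through close_stub would go there instead, which is the effect we want with fclose's real GOT entry.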

You can see the GOT entry used, and observe how it works, by looking at the execution of the relevant PLT stub in GDB. Here's a transcript you can try. Note that the name of the PLT stub for fclose is 'fclose@plt' (the single quotes are not part of the name proper, but GDB needs them because normal C function names can't contain @).

% echo 'Hello, world!' >hello.txt
% gdb --args ./bclpr hello.txt
(gdb) disass 'fclose@plt'
Dump of assembler code for function fclose@plt:
   0x00401380 <+0>:	endbr64 
   0x00401384 <+4>:	bnd jmpq *0x3cd5(%rip)   # 0x405060 <fclose@got.plt>
   0x0040138b <+11>:	nopl   0x0(%rax,%rax,1)
(gdb) p *(void **)0x405060
$1 = (void *) 0x4010c0
(gdb) watch *(void **)0x405060
Hardware watchpoint 2: *(void **)0x405060
(gdb) run
Starting program: bclpr hello.txt

Hardware watchpoint 1: *(void **)0x405060

Old value = (void *) 0x4010c0
New value = (void *) 0x7ffff7e29e00 <_IO_new_fclose>
0x00007ffff7fe01d0 in _dl_fixup (l=<optimized out>, reloc_arg=<optimized out>)
    at ../sysdeps/x86_64/dl-machine.h:242
(gdb) c
Continuing.
Deleting spoolfile from /tmp/bclpr-goldy007/spool/lp0/hello.txt
[Inferior 1 (process 590385) exited normally]
(gdb) quit

The key information needed for the attack is the address of the relevant GOT entry, which you can see from the disassembly is 0x405060. The initial value of this pointer is a placeholder that triggers the dynamic linker to determine the real address the first time the function is executed. That code (specifically the function _dl_fixup where the watchpoint was triggered) is responsible for updating the GOT entry to point to the location of the implementation of fclose in the C library, which is 0x7ffff7e29e00 on this execution.
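
GDB already annotates the jump with the GOT address, but it may help to see where 0x405060 comes from. A %rip-relative operand is relative to the address of the next instruction, which here is 0x40138b, since the bnd jmpq at 0x401384 is 7 bytes long. A small C helper for the arithmetic (the name rip_relative_target is hypothetical):

```c
#include <assert.h>
#include <stdint.h>

/* A %rip-relative operand is measured from the NEXT instruction:
   target = (instruction address + instruction length) + displacement. */
uint64_t rip_relative_target(uint64_t insn_addr, unsigned insn_len, int32_t disp) {
    assert(insn_len > 0 && insn_len <= 15);  /* x86 instructions are 1-15 bytes */
    return insn_addr + insn_len + (int64_t)disp;
}
```

With the values from the disassembly, 0x401384 + 7 + 0x3cd5 = 0x40138b + 0x3cd5 = 0x405060.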

You may recall from looking at this example before that the attacker-controllable printf-family call is the fprintf on line 491, whose format string argument, message, comes from the string supplied with the -m command-line option. Review how you can supply strings containing format specifiers through this option, and observe the corresponding results in the BCLPR log file. In particular, try a format string that contains many copies of the format specifier %016lx, which prints successive would-be arguments (on x86-64 the first few come from registers, the rest from the stack) as 64-bit hex values.
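
To predict what the log will contain, the following sketch renders a list of values the way a format string of space-separated %016lx specifiers would. The helper name dump_slots and the sample values are hypothetical, chosen only for illustration:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Render values the way a format string of space-separated %016lx
   specifiers would. Returns the total number of characters, or 0 on error. */
size_t dump_slots(const unsigned long *vals, size_t n, char *out, size_t cap) {
    size_t len = 0;
    if (cap == 0)
        return 0;
    out[0] = '\0';
    for (size_t i = 0; i < n; i++) {
        /* Every value after the first is preceded by one space. */
        int w = snprintf(out + len, cap - len, i ? " %016lx" : "%016lx", vals[i]);
        if (w < 0 || (size_t)w >= cap - len)
            return 0;  /* formatting error, or the output would not fit */
        len += (size_t)w;
    }
    assert(strlen(out) == len);
    return len;
}
```

Each value always takes exactly 16 characters, so three values with single-space separators take 3 * 16 + 2 = 50 characters. That fixed width is what makes the later character counting predictable.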

The "where" in a format-string write-what-where attack needs to be an attacker-controlled value on the stack, in a location where fprintf will look for the argument corresponding to some format specifier. You'll recall from our previous encounters with this program that the variable header in the main function fits this bill. It contains the first four bytes of the input file being printed, and since it is part of main's stack frame, it is not too far above where fprintf would normally expect its arguments. If you add enough %016lx specifiers, you should see that one of them prints a 64-bit value whose high four bytes are 0 and whose low four bytes are the first four bytes of the input file. (Because x86-64 is little-endian, the bytes from the file are the ones at the lower addresses in memory, so when printed as a number they appear in the reverse of their order in the file.) Keep track of which format specifier this is, because it's the one you'll need to replace with %ln.
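
A sketch of how the four header bytes turn into the number you see printed, assuming (as described above) that the slot's high four bytes are zero. The helper name slot_from_header is hypothetical:

```c
#include <assert.h>
#include <stdint.h>

/* Value of a 64-bit little-endian stack slot whose low four bytes are
   bytes[0..3] (bytes[0] at the lowest address) and whose high four bytes
   are zero. */
uint64_t slot_from_header(const unsigned char bytes[4]) {
    uint64_t v = 0;
    for (int i = 3; i >= 0; i--)
        v = (v << 8) | bytes[i];  /* bytes[3] ends up most significant */
    return v;
}
```

With hello.txt from the GDB example, the first four bytes are "Hell" (0x48 0x65 0x6c 0x6c), so that slot should print as 000000006c6c6548.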

To control this value, the "where" of our write-what-where attack, we need to create a file whose contents are the address of the thing we want to overwrite: the GOT entry for fclose at 0x405060. You can use any of your favorite ways of creating a file with binary data:

% perl -e 'print "\x60\x50\x40\x00"' >got-addr.in
% hd got-addr.in
00000000  60 50 40 00                                       |`P@.|
00000004

(In fact, because the values 0x60, 0x50, and 0x40 are all printable characters, in this case you could also make a working file with just echo -n.) Using this file as the input together with your stack-dumping format string, you should be able to confirm that the value we're trying to control changes to 0000000000405060.
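
If you prefer C to perl, a minimal equivalent is sketched below. The helper names encode_le32 and write_addr_file are hypothetical:

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Encode a 32-bit address in little-endian byte order (x86-64's order). */
void encode_le32(uint32_t addr, unsigned char out[4]) {
    for (int i = 0; i < 4; i++)
        out[i] = (unsigned char)(addr >> (8 * i));  /* lowest byte first */
}

/* Write the encoded address as the entire contents of a file.
   Returns 0 on success, -1 on failure. */
int write_addr_file(const char *path, uint32_t addr) {
    unsigned char bytes[4];
    encode_le32(addr, bytes);
    FILE *f = fopen(path, "wb");
    if (f == NULL)
        return -1;
    size_t written = fwrite(bytes, 1, sizeof bytes, f);
    if (fclose(f) != 0 || written != sizeof bytes)
        return -1;
    return 0;
}
```

Calling write_addr_file("got-addr.in", 0x405060) from a small main should produce the same four bytes, 60 50 40 00, that hd showed above.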

Next, the "what" of our format-string attack needs to be the address of attack_function, which means that this address must equal the number of bytes fprintf writes before it reaches the %ln format specifier we've added. You can get the address of the function inside GDB or with the program nm. It will also be useful to know this value in decimal.

% nm bclpr | fgrep attack_function
0000000000402803 T attack_function
% printf '%d\n' 0x402803
4204547
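
If you want to script this step, here is a small sketch of parsing one line of nm output in C. The helper name nm_address is hypothetical, and it assumes nm's default "address type name" layout shown above:

```c
#include <assert.h>
#include <stdio.h>

/* Parse one line of nm output ("<hex address> <type> <name>") and return
   the address, or 0 if the line does not have that shape. */
unsigned long nm_address(const char *line) {
    unsigned long addr;
    char type;
    char name[128];
    if (sscanf(line, "%lx %c %127s", &addr, &type, name) != 3)
        return 0;
    return addr;
}
```

Applied to the line above, it returns 0x402803, which printf("%lu\n", ...) prints as 4204547. Lines for undefined symbols, which nm prints without an address, make it return 0.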

How can we make fprintf output more than four million characters before getting to the crux of our attack? A format string that was itself that long might be too long to pass on the command line. More conveniently, we can use the same printf feature we already used in %016lx: a leading zero followed by a number in the format specifier pads the output with leading 0s up to that many characters, if it is not already that long. 16 is a logical width for a 64-bit hex value, but there is no upper limit on the padding width, so it can easily be in the millions.
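
A quick way to convince yourself that the width can be huge is to ask snprintf how long the output would be without storing it. This sketch uses the * width form, which takes the width from an argument and is equivalent to writing the number directly (as in %04204547lx); the name padded_length is hypothetical:

```c
#include <assert.h>
#include <stdio.h>

/* Number of characters "%0<width>lx" produces for a given value.
   With a NULL buffer and size 0, snprintf only counts; it writes nothing. */
int padded_length(int width, unsigned long value) {
    int n = snprintf(NULL, 0, "%0*lx", width, value);
    assert(n >= 0);
    return n;
}
```

padded_length(4204547, 1) returns 4204547, while padded_length(4, 0x405060) returns 6, because the width is a minimum, not a maximum.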

The final slightly tricky aspect of this attack is making the amount of data output by fprintf come out to exactly the right amount, since it depends on everything in the format string up to the %ln. If you followed our suggestion of using %016lx to print the earlier stack values, then each of them will be 16 bytes long, but don't forget to also count any spaces, newlines, or other separating characters you print. You could also try copying the data out of the log file to count its size. If you get the size wrong, the attack won't work, but you can use the same GDB watchpoint we illustrated earlier to see what value the GOT entry is being overwritten with, and adjust accordingly.
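
The counting itself is simple arithmetic. This sketch assumes a hypothetical format-string layout: literal_len literal characters, then n_fields copies of %016lx each followed by sep_len separator characters, then one final zero-padded %0Wlx field immediately before the %ln. Adjust it to match the string you actually use; last_field_width is a hypothetical name:

```c
#include <assert.h>

/* Width W for the last "%0Wlx" field so that, when printf reaches the %ln
   right after it, exactly `target` characters have been output. */
long last_field_width(long target, int literal_len, int n_fields, int sep_len) {
    long before = literal_len + (long)n_fields * (16 + sep_len);
    long width = target - before;
    /* A 64-bit value has at most 16 hex digits, so a width of at least 16
       guarantees the field is exactly `width` characters long. */
    assert(width >= 16);
    return width;
}
```

For example, with no literal prefix and five %016lx fields each followed by one space, 5 * 17 = 85 characters come before the last field, so a target of 4204547 needs a width of 4204462.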

Another small note: the basic printf format specifier for writing the number of bytes output is %n, but for this attack it is important that you use the variant %ln instead. The l causes printf to interpret the pointer as a long * instead of an int *, which is important because we want to overwrite all 8 bytes of the pointer. We saw that the resolved value of the fclose GOT entry (0x7ffff7e29e00 in our run) has nonzero high bits; %n would write only the low 4 bytes, so any nonzero high bits already there would survive.
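
To see the difference concretely, this sketch preloads a slot with the resolved fclose address from the GDB transcript and then overwrites it using a literal format string, once with %n aimed at its low half and once with %ln. The names are hypothetical, and it assumes a little-endian machine with an 8-byte long, such as x86-64 Linux:

```c
#include <assert.h>
#include <stdio.h>

/* Lets us aim %n at just the low half of an 8-byte slot. */
union slot {
    long as_long;
    int as_int[2];
};

/* %n stores an int: only 4 of the slot's 8 bytes change. On a
   little-endian machine as_int[0] is the low-order half. */
long overwrite_with_n(long initial) {
    assert(sizeof(long) == 2 * sizeof(int));
    union slot s = { .as_long = initial };
    char buf[32];
    snprintf(buf, sizeof buf, "%016lx%n", 0x402803UL, &s.as_int[0]);
    return s.as_long;
}

/* %ln stores a long: all 8 bytes of the slot are replaced. */
long overwrite_with_ln(long initial) {
    long slot = initial;
    char buf[32];
    snprintf(buf, sizeof buf, "%016lx%ln", 0x402803UL, &slot);
    return slot;
}
```

Starting from 0x7ffff7e29e00, %ln leaves the slot holding 16 (0x10), while %n leaves 0x7fff00000010, because the upper four bytes 0x00007fff survive.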

There is one more printf feature you may find useful to experiment with in building your attack, because it can make the attacking format string shorter. The printf functions on Unix systems allow format specifiers to take their arguments out of order. If you put a decimal number and then a dollar sign right after the percent sign of a format specifier, that specifier takes its value from the argument at the position given by the number (counting from 1), rather than from whichever argument would normally come next. For instance, while %d prints the next argument as a decimal int, %10$d prints as a decimal int the value that would normally have corresponded to the 10th format specifier if the dollar sign feature had not been used.
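
A minimal demonstration of positional conversions with a literal format string (positional_demo is a hypothetical name):

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Positional (POSIX "%N$") conversions: each specifier names the argument
   it uses, counting from 1 after the format string. Returns the length. */
int positional_demo(char *out, size_t cap) {
    int n = snprintf(out, cap, "%2$s, %1$s! [%3$016lx]", "world", "Hello", 0x405060UL);
    assert(n >= 0 && (size_t)n < cap);  /* no error, no truncation */
    assert(strlen(out) == (size_t)n);
    return n;
}
```

The buffer ends up holding "Hello, world! [0000000000405060]": %2$s and %1$s swap the two strings, and %3$016lx picks the third argument. The N$ prefix combines with other conversions in the same way, including %ln.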