University of Minnesota
Development of Secure Software Systems
index.php

CSci 4271 Lab 1

Today's lab follows up on the recent lecture topics related to memory safety vulnerabilities and understanding low-level program behavior.

For today's lab we'll randomly split you into breakout groups of 2-3 students: please work together, discuss, and learn from the other student(s) in you group. Use the "Ask for Help" button to ask questions or show off what you've done.

A useful tool for the first question and the major focus of the second question is the debugger GDB. This is a good time to review features of it you might already have been exposed to in 2021. If you'd like a refresher, the slides I used to introduce its key features in 2021 are here. You can also find the whole GDB manual on the web, or use the help command while it's running.

  1. (Reverse overflow.)

    It would be hard to compile a program where the entire stack was used backwards, so for this question we've simulated the same reversal of direction we talked about in class by making a buffer overflow in which the direction of the overflow is reversed to go from higher to lower addresses. The program reverse-overflow.c has the same type of buffer overflow problem in a function named func that we had discussed in class, but the function revcpy that does the copying writes into the destination buffer in the backwards direction. You may find it easier if you copy the source code and the binary we've already compiled for you into your working directory:

    cp /web/classes/Spring-2021/csci4271/labs/01/reverse-overflow{.c,} .
    

    (Notice the final dot at the end of that command, which represents the current directory is the destination of the copy.) The other thing we've added to this program is explicit checks to see whether the return addresses are getting overwritten (these are similar to a kind of defense the compiler can also add automatically). This means that rather than just segfaulting, you'll see an explanation of exactly what your attack achieved. As we did for the palindrome attack in class, your goal is to figure out how long of a string you need to provide (as a command-line input to the program) to overwrite a return address with a value of your own choosing. Since it's mostly the length that's important, you can use just normal printable characters on the command line. To be clear that you're seeing which part of your input is overwriting the return address, use the fact that 0x4271 in hex is the same as Bq in ASCII to get the program to print a message that ends with:

    return address corrupted to 0x4271427142714271
    

    We have two recommendations about how you might figure this out:

    • Trial and error. Gradually make the program input longer until interesting things start happening. Use characters in your input that have different ASCII codes so you can see which ones (if any) are participating in the overwrite.
    • In the debugger. You can use GDB to print the addresses of things inside the program, and the subtract the addresses to figure out how far apart they are.

    Extra complexity: when using the trial and error approach, you might have noticed sometimes seeing messages about the return address of func getting corrupted, but not with the value from your input. How is that happening, given what we discussed about the direction of the overwrite? It turns out this is related to something else about how the code manages the stack. You may be able to see what's happening if you step through the code instruction by instruction in the debugger. This kind of effect is also a useful capability for an attacker.

  2. (Binary-level debugging.)

    GDB can be used as either a source-code level or a binary-code level debugger. Debugging at the source-code level is usually easier when it's possible, but if you're seeing security problems in someone else's binary that you don't have source code for, or if you're understanding the details of attacks, looking at execution at the binary-code level is sometimes also needed. For this question, we've written a program that computes Fibonacci numbers, but it does so in an unusual way. Instead of just computing the number directly, it creates and executes a machine-code program to do the computation. This approach is called just-in-time compilation, and a more complex version of it is how Java and JavaScript are usually implemented. For Fibonacci it's a bit overkill, but we've chosen it as an example because source-level debugging won't work for the generated code. You can copy and compile the code with commands like:

    cp /web/classes/Spring-2021/csci4271/labs/01/jit-fib.c .
    gcc -Wall -g jit-fib.c -o jit-fib
    

    This program mostly runs, but it has a bug that causes it to sometimes compute the wrong results. For instance fib(6) should be 8, but this program prints 7 instead. You task for this question is to use GDB to look at the generated code and what it's doing to find where the bug is in the program. You'll want to have GDB disassemble the machine code using the disassemble command (disas for short), but you'll need to supply the right range of addresses and do it at the right time after the code has been written. If you can't see what's wrong with the code, you can also step through how it's working instruction by instruction with stepi (si) and print the contents of the registers.

    Extra complexity 1: this program has to change the permissions on the memory region it uses for the generated code, because the memory permissions on Linux are usually set up with what's called a W xor X policy: some memory regions are writeable, and others are executable, but no region is writable and executable at the same time. W xor X is a security measure because regions that are simultaneously writable and executable can make it easier for an attacker to inject their own code. If we were more concerned about this security aspect for this program, how could we manage the permissions differently?

    Extra complexity 2: would it be possible to simplify the algorithm used by the generated code so it uses only two registers instead of three?