University of Minnesota
Development of Secure Software Systems

CSci 4271 Lab 3

Today's lab follows up on more recent lecture topics related to memory safety vulnerabilities and understanding low-level program behavior.

For the in-person lab, we recommend that you work on this lab in groups of 2-3 students sitting near you in the lab, though this is not required.

GDB is likely to be useful again this week. As before, the slides I used to introduce its key features in 2021 are here. You can also find the whole GDB manual on the web, or use the help command while it's running.

We don't expect that every student will be able to finish every part of the lab within the 50-minute lab section: we try to include enough material to keep you busy even if you're pretty fast. Today's lab is probably a bit shorter than last week's, though, so once you finish these questions you might consider going back to any later questions from last week that you didn't get to. We'd also recommend you look into the left-over parts of labs on your own afterwards, and bring remaining questions to Piazza or office hours.

  1. (Reverse overflow.)

    It would be hard to compile a program where the entire stack was used backwards, so for this question we've simulated the reversal of direction we talked about in class with a buffer overflow that runs from higher to lower addresses. The program reverse-overflow.c has the same type of buffer overflow problem that we discussed in class, in a function named func, but the function revcpy that does the copying writes into the destination buffer in the backwards direction. We recommend you copy the source code and the binary we've already compiled for you into your working directory:

    cp /web/classes/Fall-2023/csci4271/labs/03/reverse-overflow{.c,} .
    

    This program should also look familiar compared to last week's forward overflow. Again your goal is to figure out how long a string you need to provide (as a command-line input to the program) to overwrite a return address with a value of your own choosing. Since it's mostly the length that's important, you can use just normal printable characters on the command line. To make it clear which part of your input is overwriting the return address, use the fact that 0x4271 in hex is the same as Bq in ASCII to get the program to print a message that looks like this (the question marks will be the name of a function):

    ?????? return address corrupted to 0x4271427142714271
    

    As with last week, there are several ways you can figure this out:

    • In the debugger. You can use GDB to print the addresses of things inside the program, and then subtract the addresses to figure out how far apart they are. You'll notice you have to do the subtraction in a slightly different way than last week, though.
    • Trial and error. Gradually make the program input longer until interesting things start happening. Use characters in your input that have different ASCII codes so you can see which ones (if any) are participating in the overwrite. As with last week, this is probably the fastest way to get to a working solution, but it probably gives you less insight as to what's really happening.
    • By looking at the disassembled code directly. This is also different from last week because you have to think about different stack contents, though in some ways there are fewer objects to worry about.
  2. (Integer overflow to buffer overflow.)

    In lecture we introduced integer overflow, which is a kind of sometimes-surprising behavior of C code that isn't directly a memory safety problem. Integer overflow comes from the limited number of bits used to store each of C's integer types. These types can only represent a limited range of values, so if the result of a computation would be too large or too negative to be represented, you get a different value instead. You might recall from CSci 2021 that the result of an overflow is typically just the low bits of the correct result, similar to the result of taking a remainder.

    Integer overflow can, however, sometimes be the first step leading to a memory safety problem later. For instance, an integer overflow can lead to a buffer overflow if a program is confused into allocating too little space for the amount of data to be written. For this question we will look at an example of one such vulnerability. As usual you can copy the program source and binary to your working directory:

    cp /web/classes/Fall-2023/csci4271/labs/03/int-to-buf-oflow{.c,} .
    

    This program is similar to one we looked at in class but has some small differences you'll need to take account of. The program takes both a command-line argument and input via the standard input. The command-line argument is supposed to represent a number of objects to read. The program will allocate memory with malloc and then read that many objects' worth of data from the standard input into the allocated object. In a more realistic program it would probably go on to do something else useful with the objects, but for this example we stop after the reading because that's where the problem is.

    First take a look at the program's source code to see whether you can understand why there is a problem from that perspective. Trace through what the program does with the command-line argument as it is converted to binary, passed to the read_objs function, and then used to control both the size of the object allocated by malloc and the number of objects' worth of data the program reads. What are the possible ranges of values for the different types of integer variables used for these operations? Which operations can overflow? Then more specifically, what scenario would lead to the memory region pointed to by objs being too small for the data written to it?

    Then, confirm or modify your theories based on experimenting with running the program. To supply an unlimited amount of input to the standard input of the program, we recommend that you start with the Unix program yes, which produces an infinite stream of characters to its standard output. (If you're curious about why this program has the name it does and what it was originally intended to be used for, you might start by reading the manual page and follow up with Wikipedia. But that's not important for our use of it here.) Here's an example of how a normal usage of the program might go, where we ask to read 10 objects (% represents the prompt, and the rest is the program's output):

    % yes | ./int-to-buf-oflow 10
    "10" read as 10 (0x000000000000000a) converted to 10 (0x0000000a) success
    Size (after mult.) is 400 (0x00000190)
    Read 10 objects
    

    Based on what you figured out earlier, you should be able to find a different command-line argument to the program that causes it to crash with a segmentation fault, which is a sign of writing beyond the end of the allocated object. You may find it useful to think about and to supply the command-line argument in hexadecimal, with a 0x prefix. In fact, there should be a whole range of command-line arguments, some of which cause the program to crash almost immediately, and some of which might make it run for a second or two before crashing. Can you explain the exact range of arguments that lead to an overflow? For instance, what's the smallest number n such that the program will run successfully with n as the input, but crash with n + 1?

    Bonus question. You'll notice on lines 32-34 of the program that a certain value in the object being read will cause the program to stop reading. This doesn't affect the crashing experiments we did earlier. But suppose you were an attacker who wanted to use this buffer overflow as part of an attack; why would the presence of this check be important to you? To make this a concrete challenge, try and see if you can find inputs to the program that make it print:

    Last object canary corrupted to 0x42714271
    

    For this you'll have to control both the number on the command line and the contents of the standard input. So rather than using yes, you'll need to provide specific values for the standard input, either with a different program like echo or printf, or by putting the contents in a file.