Development of Secure Software Systems

CSci 4271 Lab 2

Today's lab follows up on more recent lecture topics related to memory safety vulnerabilities and understanding low-level program behavior.

For the in-person lab, we recommend that you work on this lab in groups of 2-3 students sitting near you in the lab, though this is not required. (Depending on attendance, there may not be enough computers for every student, but there should be enough chairs for everyone.)

For the online version of the lab, we'll randomly split you into breakout groups of 2-4 students: please work together, discuss, and learn from the other student(s) in your group. Use the "Ask for Help" button to ask questions or show off what you've done. It would be possible to do this lab on your own Linux machine, but to make things most predictable, we recommend that you do it by accessing a CSE Labs Linux machine over SSH. There are some more suggestions about remote access at the end of this page.

GDB is likely to be useful again this week. As before, the slides I used to introduce its key features in 2021 are here. You can also find the whole GDB manual on the web, or use the help command while GDB is running.

(Reverse overflow.)
It would be hard to compile a program that used the entire stack backwards, so for this question we've simulated the same reversal of direction we talked about in class with a buffer overflow that runs from higher to lower addresses. The program reverse-overflow.c has the same type of buffer overflow problem in a function named func that we discussed in class, but the function revcpy that does the copying writes into the destination buffer in the backwards direction. You may find it easiest to copy the source code and the binary we've already compiled for you into your working directory:
```
cp /web/classes/Spring-2022/csci4271/labs/02/reverse-overflow{.c,} .
```
This program should also look familiar from last week's forward overflow. Again, your goal is to figure out how long a string you need to provide (as a command-line argument to the program) to overwrite a return address with a value of your own choosing. Since it's mostly the length that's important, you can use just normal printable characters on the command line. To make it clear which part of your input is overwriting the return address, use the fact that the bytes 0x42 and 0x71 are the ASCII codes for B and q to get the program to print a message that ends with:
```
return address corrupted to 0x4271427142714271
```
We have two recommendations about how you might figure this out:
- Trial and error. Gradually make the program input longer until interesting things start happening. Use characters in your input that have different ASCII codes so you can see which ones (if any) are participating in the overwrite.
- In the debugger. You can use GDB to print the addresses of things inside the program, and then subtract the addresses to figure out how far apart they are. You'll notice you have to do the subtraction in a slightly different way than last week, though.
(Integer overflow to buffer overflow.)
In lecture we briefly introduced integer overflow, a sometimes-surprising behavior of C code that isn't directly a memory safety problem. Integer overflow comes from the limited number of bits used to store each of C's integer types. These types can only represent a limited range of values, so if the result of a computation would be too large or too negative to be represented, you get a different value instead. You might recall from CSci 2021 that the result of an overflow is typically just the low bits of the correct result, similar to taking a remainder.

Integer overflow can, however, sometimes be the first step toward a memory safety problem later. For instance, an integer overflow can lead to a buffer overflow if a program is confused into allocating too little space for the amount of data to be written. For this question we'll look at an example of one such vulnerability. As usual, you can copy the program source and binary to your working directory:
```
cp /web/classes/Spring-2022/csci4271/labs/02/overflow-eg{.c,} .
```
This program takes both a command-line argument and input via the standard input. The command-line argument is supposed to represent a number of objects to read. The program allocates memory with malloc and then reads that many objects' worth of data from the standard input into the allocated object. A more realistic program would probably go on to do something else useful with the objects, but for this example we stop after the reading because that's where the problem is.

First take a look at the program's source code to see whether you can understand why there is a problem from that perspective. Trace through what the program does with the command-line argument as it is converted to binary, passed to the read_objs function, and then used to control both the size of the object allocated by malloc and the number of objects' worth of data the program reads. What are the possible ranges of values for the different types of integer variables used for these operations? Which operations can overflow? Then more specifically, what scenario would lead to the memory region pointed to by objs being too small for the data written to it?

Then confirm or modify your theories by experimenting with the program. To supply an unlimited amount of input to the standard input of the program, we recommend that you use the Unix program yes, which writes an infinite stream of characters to its standard output. (If you're curious about why this program has the name it does and what it was originally intended for, you might start by reading the manual page and follow up with Wikipedia. But that's not important for our use of it here.) Here's an example of how a normal usage of the program might go, where we ask to read 10 objects (% represents the prompt, and the rest is the program's output):
```
% yes | ./overflow-eg 10
"10" read as 10 (0x000000000000000a) converted to 10 (0x0000000a) success
Size (after mult.) is 240 (0x000000f0)
Read 10 objects
```
Based on what you figured out earlier, you should be able to find a different command-line argument that causes the program to crash with a segmentation fault, which is a sign of writing beyond the end of the allocated object. You may find it useful to think about, and to supply, the command-line argument in hexadecimal with a 0x prefix. In fact, there should be a whole range of command-line arguments, some of which cause the program to crash almost immediately, and some of which might make it run for a second or two before crashing. Can you explain the exact range of arguments that lead to an overflow?

Bonus question: you'll notice on lines 24-25 of the program that a certain value of the object being read will cause the program to stop reading. This doesn't affect the crashing experiments we did earlier. But suppose you were an attacker who wanted to use this buffer overflow as part of an attack: why would the presence of this check be important to you?