Development of Secure Software Systems

CSci 4271 Lab 1

Today's lab follows up on the recent lecture topics related to memory safety vulnerabilities and understanding low-level program behavior.

For the in-person lab, we recommend that you work on this lab in groups of 2-3 students sitting near you in the lab, though this is not required. (Depending on attendance, there may not be enough computers for every student, but there should be enough chairs for everyone.)

For the online version of the lab, we'll randomly split you into breakout groups of 2-4 students: please work together, discuss, and learn from the other student(s) in your group. Use the "Ask for Help" button to ask questions or show off what you've done. It would be possible to do this lab on your own Linux machine, but to make things most predictable, we recommend that you do it by accessing a CSE Labs Linux machine over SSH. There are some more suggestions about remote access at the end of this page.

A useful tool for the first question and the major focus of the other questions is the debugger GDB. This is a good time to review features of it you might already have been exposed to in 2021. If you'd like a refresher, the slides I used to introduce its key features in 2021 are here. You can also find the whole GDB manual on the web, or use the help command while it's running.

(Standard buffer overflow.)
For this question you'll experiment with a program with the same kind of classic stack buffer overflow problem we talked about in class on Thursday. For things to work out right we ask that you use a version that we have compiled, but you can also look at the source code if you'd like. The program forward-overflow.c has the same type of buffer overflow problem in a function named func that we had discussed in class. We suggest you copy the source code and the binary we've already compiled for you into your working directory:
```
cp /web/classes/Spring-2022/csci4271/labs/01/forward-overflow{.c,} .
```
(Notice the space and final dot at the end of that command, which represents that the current directory is the destination of the copy.) One thing we've added to this program is explicit checks to see whether the return addresses are getting overwritten (these are similar to a kind of defense the compiler can also add automatically, but which we've disabled). This means that rather than just segfaulting, you'll see an explanation of exactly what your attack achieved. As with the palindrome attack in class, your goal is to figure out how long of a string you need to provide (as a command-line input to the program) to overwrite a return address with a value of your own choosing. Since it's mostly the length that's important, you can use just normal printable characters on the command line. To be clear that you're seeing which part of your input is overwriting the return address, use the fact that 0x42 and 0x71 in hex are the same as B and q in ASCII to get the program to print a message that ends with:
```
return address corrupted to 0x4271427142714271
```
We have two recommendations about how you might figure this out:
- Trial and error. Gradually make the program input longer until interesting things start happening. Use characters in your input that have different ASCII codes so you can see which ones (if any) are participating in the overwrite.
- In the debugger. You can use GDB to print the addresses of things inside the program, and then subtract the addresses to figure out how far apart they are.
Extra complexity: when using the trial and error approach, you might have noticed sometimes seeing messages about the return address of func getting corrupted, but not with the value from your input. How is that happening? It turns out this is related to something else about how the code manages the stack. You may be able to see what's happening if you step through the code instruction by instruction in the debugger. This kind of effect is also a useful capability for an attacker.
(Debugging program internals.)
One basic thing that a debugger is good for is examining data internal to a program. That's the way we'll ask you to use GDB for this question, which might remind you of the CSci 2021 "Bomb Lab" if your offering of 2021 included it.

The program int-seq reads a sequence of integers from the standard input and stores them in a buffer. (The numbers are written as 8 hex digits each, and separated by spaces.) There is no limit to the length of the sequence, but the buffer that holds it is only declared to hold 4 integers, so there is clearly a buffer overflow problem. However, the program is expecting a particular sequence of numbers, and it stops reading if one of the numbers is not the one it is expecting. Your goal is to trigger the buffer overflow, but to do this you need to supply the right sequence of integers.
We have given you the source code for the main program and a compiled binary, which you can copy like this:
```
cp /web/classes/Spring-2022/csci4271/labs/01/int-seq{.c,} .
```
However, the key to generating the sequence is a function named weird_func. We haven't given you the source code for this function, and it would be complicated to understand even if you had the source code. So rather than understanding this function, you should just let the program itself run the function, and look at the value that it produces. The program's error message doesn't print the value, but you can get at it by running the program under the debugger.

For this compiled program we have left in the stack buffer overflow detection code that GCC produces by default, so you can know you have succeeded in your buffer overflow if you get the program to print the following message:
```
*** stack smashing detected ***: terminated
```
Even though the buffer only holds 4 integers, you might be surprised how many integers you need to read before overwriting the return address (or you might not be surprised if you look at the stack frame layout). Here are a couple of hints that can make the process more efficient. GDB's command display will tell it to automatically print something every time the program stops. By using the debugger to modify the contents of variables, you can make the program continue running instead of stopping, so you don't have to rerun it as many times for experiments. (But of course you still want to find any input that will make the program crash without modifying it.)
(Binary-level debugging.)
GDB can be used as either a source-code level or a binary-code level debugger. Debugging at the source-code level is usually easier when it's possible, but if you're seeing security problems in someone else's binary that you don't have source code for, or if you're understanding the details of attacks, looking at execution at the binary-code level is sometimes also needed. For this question, we've written a program that computes Fibonacci numbers, but it does so in an unusual way. Instead of just computing the number directly, it creates and executes a machine-code program to do the computation. This approach is called just-in-time compilation, and a more complex version of it is how Java and JavaScript are usually implemented. For Fibonacci it's a bit overkill, but we've chosen it as an example because source-level debugging won't work for the generated code. You can copy and compile the code with commands like:
```
cp /web/classes/Spring-2022/csci4271/labs/01/jit-fib.c .
gcc -Wall -g jit-fib.c -o jit-fib
```
This program mostly runs, but it has a bug that causes it to sometimes compute the wrong results. For instance fib(6) should be 8, but this program prints 7 instead. Your task for this question is to use GDB to look at the generated code and what it's doing to find where the bug is in the program. You'll want to have GDB disassemble the machine code using the disassemble command (disas for short), but you'll need to supply the right range of addresses and do it at the right time after the code has been written. If you can't see what's wrong with the code, you can also step through how it's working instruction by instruction with stepi (si) and print the contents of the registers.

Extra complexity 1: this program has to change the permissions on the memory region it uses for the generated code, because the memory permissions on Linux are usually set up with what's called a W xor X policy: some memory regions are writable, and others are executable, but no region is writable and executable at the same time. W xor X is a security measure because regions that are simultaneously writable and executable can make it easier for an attacker to inject their own code. If we were more concerned about this security aspect for this program, how could we manage the permissions differently?

Extra complexity 2: would it be possible to simplify the algorithm used by the generated code so it uses only two registers instead of three?

Appendix: remote access

(Vole and SSH access.)
You've probably already used Vole in previous CS classes, since it's usually the first recommended way to access CSE Labs computing resources remotely. But just in case you haven't, it's a remote-desktop login system which displays a Linux-based graphical interface inside a window in your web browser. The starting page with basic information and links about Vole is https://vole.cse.umn.edu/

The graphical desktop features of Vole are going to be less important for a lot of the work we do, so it's also good to know about how you can make a terminal-only connection using SSH. Unix and Mac computers likely already have a command-line "ssh" program; if you're using Windows you may need to install a separate program. CSE-IT's suggestions are here.

With SSH there's also a choice of what CSE Labs computer to connect to. There isn't a single large entry point maintained for SSH in the same way the Vole servers work. So our recommendation instead is that you randomly pick one of the Linux workstations in the lab where our in-person labs are held. There's a predictable naming pattern; since the lab is 1-262 Keller Hall, the names of the machines all look like:
```
csel-kh1262-XX.cselabs.umn.edu
```
where the XX is replaced by a two digit number between 02 and 28. You should choose a computer number randomly to spread the load out across all the machines, like with the command:
```
perl -e 'printf "%02d\n", 2 + int(rand(27))'
```
As a combination of the two ideas mentioned above, at least while the campus is running most classes remotely, CSE-IT has also arranged that you can make graphical connections to the lab machines in the same way you connect to Vole. Just put the host name mentioned in the previous paragraph into your web browser.
(Screen sharing on Zoom.)
Since you're using Zoom anyway to participate in the lab, one basic kind of collaboration you can do is to share the view of your screen or a single window in a Zoom meeting. This is the same feature we use to present slides. If you're using the desktop Zoom application, there's a green "share screen" icon at the bottom of the window.
(Terminal sharing with tmate.)
Zoom screen sharing is view-only: you can see what someone else is doing but not do things for them. Sometimes that can be a good way of working; in pair programming terminology you'll hear people talk about one person "driving" at a time. But in other cases it can be convenient for multiple people to all be able to interact with programs at once. For command-line/terminal programs, the easiest solution we've found for doing this is a program called tmate, which works using a combination of tmux and SSH and public rendezvous servers. The web site is at https://tmate.io/ and we've also already installed a binary on the CSE Labs machines at:
```
/web/classes/Spring-2022/csci4271/bin/tmate
```
Set up a shared terminal session on one of the CSE Labs machines with the other people in your group and run some commands together.
(Accessing library materials.)
Some of the reading materials in this course will be ones that you'd have to pay to get access to as an independent person, but which are available without charge to the university community because the U Libraries have already paid for that access. If you're browsing from an on-campus computer (e.g., Vole), the free access will usually just work automatically, but not if you're off the campus network.

A general purpose tool you can use to make your network connection from off campus look like an on-campus one is a VPN. Information about the campus VPN is available here. But the VPN can be a bit complicated to install and use. So we recommend a more specialized service instead for this purpose.

This service is a proxy that the libraries run specifically just for accessing library resources over the web. You switch to using the proxy by putting the following prefix in front of the URL you want to access:
```
http://login.ezproxy.lib.umn.edu/login?url=
```
That's a bit of a pain to type every time, though. In many web browsers you can set this up as what's called a bookmarklet, so that transforming the URL is as easy as choosing a menu entry. The libraries' information about how to do this is here.

Try out one of the methods for accessing library content by downloading a PDF of this paper. (This paper isn't an assigned reading, but it is relevant to security.)