Development of Secure Software Systems

CSci 4271 Lab 3

Today's lab follows up on the recent lecture topics related to memory safety attacks, with continued emphasis on understanding low-level program behavior.

In the online lab we'll randomly split you into breakout groups of 2-3 students: please work together, discuss, and learn from the other student(s) in your group. Use the "Ask for Help" button to ask questions or show off what you've done.

GDB will again be a useful tool. The slides I used to introduce its key features are here. You can also find the whole GDB manual on the web, or use the help command while it's running.

(Buffer overflow to shellcode.)
Now that you've seen several examples of buffer overflows, it's time to put on your black hat and explore some techniques that attackers use to exploit them. To focus on the basic principles, today's lab is going to simplify some of the issues, by using a very simple buffer overflow and disabling defenses. You can copy the vulnerable program for this question into your working directory with this command:
```
cp /web/classes/Spring-2022/csci4271/labs/03/overflow-from-file{.c,} .
```
Doing your attacks against the binary we've already compiled for you helps keep things working predictably, but for your background knowledge here's the command we used to compile this binary:
```
gcc -no-pie -z execstack -g -Wall -fno-stack-protector overflow-from-file.c -o overflow-from-file
```
The GCC options -no-pie, -z execstack, and -fno-stack-protector are all disabling defensive mechanisms. We'll talk more about these mechanisms in future classes, but disabling them allows simpler versions of attacks to work. If you look at the source code for this program you'll see that the other simple thing about it is the vulnerability. The function read_and_print has a fixed-size buffer buf, and it copies the entire contents of the file specified on the command line into that buffer with the read system call, with no checking of the size at all.

There's one other defensive feature you should disable for the purposes of this lab, called ASLR or address-space layout randomization. Linux systems usually have this turned on by default, but you can disable it for certain programs if you want. The most convenient way to do that is to disable it for a shell, and then it will also be disabled for other programs started in that shell. For instance if you like the bash shell (it's the default for CSE Labs accounts), you can say:
```
setarch -R bash
```
Watch out that it can be confusing what level of shell you're in, between the new shell we're creating and the ones the shellcode might run. You can check whether ASLR has been successfully disabled for the stack of newly-executed programs with the following command:
```
for i in `seq 1 10`; do fgrep stack /proc/self/maps; done
```
If ASLR is enabled, each program executed in the loop will have a different stack location. If ASLR is disabled, the stack locations will all be the same.
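If comparing ten lines by eye seems error-prone, the same check can be reduced to a single number. This is a minimal sketch, assuming a Linux /proc filesystem; the helper name count_stack_ranges is made up for illustration and is not part of the lab files.

```shell
# Hypothetical helper: read /proc/<pid>/maps lines on stdin and print how many
# distinct [stack] address ranges appear among them.
count_stack_ranges() {
    awk '/\[stack\]/ { print $1 }' | sort -u | wc -l | tr -d ' '
}

# Each grep is a fresh process, so /proc/self/maps describes that grep's stack.
if [ -r /proc/self/maps ]; then
    for i in $(seq 1 10); do grep -F stack /proc/self/maps; done | count_stack_ranges
fi
```

A result of 1 means every process got the same stack range, so ASLR is off in this shell. A larger number means it is still on.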

Your goal for this problem is to carry out a complete attack including getting the vulnerable program to run shellcode and from that a shell or other program of your choosing. Since the program doesn't have any special privilege as you're testing it, this won't let you run any commands you couldn't otherwise, but you can imagine you're supplying an attack that an unwitting victim user would run. The program reads its input from a file, so the first part of the attack to understand is creating the contents of that file. You'll need some binary data for the attack, so this will also require working with a binary file, which is a little different than editing a file that contains only regular text.

There are interactive programs named hex editors that you can use to edit binary files; they're called hex editors because they usually default to printing each byte in hexadecimal. For instance the CSE Labs machines have a program named ghex installed that you could use for this purpose if you'd like. However, these instructions will instead give more detailed suggestions for a command-line approach. You can create the contents of binary files using shell commands, and then dump the contents of the files with the program hd to confirm that you made the file you wanted. There are a variety of shell commands you can use to write file contents; for instance the shell command printf formats data using most of the same conventions as the C function of the same name. But what these instructions will walk through is Perl, a scripting language that mixes together some shell and C-like features, and works well for short command-line programs.

Before we start with an actual attack, let's try creating a benign (non-attack) file with a controlled length. The x86-64 stack mostly works in units of 8 bytes, so it's convenient to make input files with that size unit too. The vulnerable buffer in this program is 20 bytes long, so a 16-byte file won't be a security problem. Here's an example Perl command you can use to print 16 bytes of text:
```
perl -e 'print "A" x 8, "B" x 8'
```
The Perl code inside the single quotes is the argument to the -e option that gives a short script on the command line. You can probably guess what Perl's print operation does. The x operator is used to repeat a string, so "A" x 8 is equivalent to "AAAAAAAA". Obviously this will be a more important abbreviation if you want to use a larger repeat count. Perl's print, like C's printf, doesn't automatically put a newline at the end of the output. That may make the output of this command overlap with your prompt when you print it on the screen, but it's what you want for binary data because a newline doesn't have a special meaning in a binary file: it's just a byte with the value 0x0a. To pass this data to the program, we'll need to put it in a file instead, which you can do with the shell redirection operator like this:
```
perl -e 'print "A" x 8, "B" x 8' >16.txt
```
Try running hd on the newly created 16.txt file to see that its contents look right. Remember that the ASCII codes for uppercase letters start at hex 0x41. You can also try giving it to the ./overflow-from-file program, but because the data fits in the buffer, nothing very exciting should happen. Now, repeat the process with some longer inputs, following the same pattern. For some longer inputs you should see that the program crashes, but crashing isn't proof of a return address getting replaced, because overwriting other parts of the program could also cause it to crash for other reasons. You can figure out how long the string needs to be to overwrite the return address in a similar way as in last week's lab, either by trial and error or by comparing addresses in GDB. If you have gotten the program to crash, you can double-check that the problem is an overwritten return address by running the crashing program under GDB. GDB will stop at the point where the program would have crashed. You should be able to see with x/i $rip that it is at a return instruction, and info frame or x/g $rsp should show the overwritten value it's about to return to.

The next step for the attacker to take control is to write the shellcode that we want the victim program to execute. You can't just use a normal compiler because you only want the bytes for a few instructions, not a complete executable. But for many kinds of shellcode you can test them by wrapping them in a complete executable, and then just extract the bytes you want. For today's lab we've given you a sample of shellcode in assembly that you can copy and compile like this:
```
cp /web/classes/Spring-2022/csci4271/labs/03/shellcode.S .
gcc -nostdlib shellcode.S -o shellcode-test
```
We've put comments in the assembly code to walk you through what the shellcode is doing in constructing the data needed by the execve system call. But you won't need to go too deep into that for today's lab. One other thing you might notice about the shellcode is that though it uses the number zero for various purposes, none of the instructions has an immediate zero value as an operand, because that would lead to an undesirable zero byte in the instruction encoding. You can run the shellcode-test program to see that it starts a shell as expected. But to get the instruction bytes we want, what you should do is to disassemble the binary:
```
objdump -d shellcode-test
```
The instruction bytes are the middle column of the output, starting with 31 c0. x86 has variable-length instructions, and this code was optimized for length to use short instructions.
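Copying the bytes out of the disassembly by hand invites typos, so a small filter can help. This is a sketch that assumes the usual GNU objdump layout, where each instruction line has the address, the hex bytes, and the mnemonic separated by tab characters; objdump_bytes is a hypothetical helper name.

```shell
# Hypothetical helper: read `objdump -d` output on stdin and print only the
# instruction bytes, as hex pairs separated by single spaces.
objdump_bytes() {
    awk -F'\t' '
        # Instruction lines look like "  addr:<TAB>bytes<TAB>asm".
        NF >= 2 && $1 ~ /:$/ {
            n = split($2, b, " ")
            for (i = 1; i <= n; i++) out = out (out == "" ? "" : " ") b[i]
        }
        END { print out }'
}

# Usage, once shellcode-test exists:
#   objdump -d shellcode-test | objdump_bytes
```

Check the result against the disassembly before using it, since the filter prints the bytes of every disassembled instruction in the file, not just the ones you want.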

One way to put the instruction bytes into a Perl program is to use the \x escape sequence inside a double-quoted string, which works the same way as the same escape in C. But if you have a lot of hex bytes, it's a bit more convenient to use Perl's pack function, which converts data into a binary format using a format string a little bit like printf. The pack format "C*" processes any number of inputs into unsigned 8-bit characters. So the following two commands give the same output, as you can check with hd:
```
perl -e 'print "\x31\xc0"' | hd
perl -e 'print pack("C*", map(hex($_), qw(31 c0)))' | hd
```
Another kind of Perl packing that can save you a little bit of work is "Q", which makes a 64-bit integer or pointer value. Though if you already have it in hex, the only work it's really saving is reversing the order of the bytes to be little-endian:
```
perl -e 'print "\x90\xef\xcd\xab\x78\x56\x34\x12"' | hd
perl -e 'print pack("Q", 0x12345678abcdef90)' | hd
```
Our recommendation for where to put the shellcode in memory is in an environment variable, since that lets you also include a NOP sled without worrying about the size of data inside the program. For instance you can easily make your shellcode 1000 bytes long if you'd like. The bash shell command to set an environment variable to the output of a program looks like:
```
export SHELLCODE=$(...)
```
Where SHELLCODE is the name of the environment variable, and the ... should be replaced with a command, like one of the perl commands we've been demonstrating above. (But without redirecting to a file or piping to hd, of course.)

Once you're supplying the shellcode in an environment variable, the value you overwrite the return address with needs to be based on the location of the shellcode. The shellcode and other environment variables will be at slightly higher addresses than the local variables on the stack you may have already looked at in GDB. If you've got a big NOP sled, you may be able to just use a trial and error process, but you can also use the command p getenv("SHELLCODE") when running under GDB to tell you the location of an environment variable on that execution. The layout of the stack differs a little bit inside versus outside of GDB, but it's usually a small variation that can be handled by the NOP sled. By the way, to handle small variations in both directions, you should target your jump at the middle of the NOP sled.

The paragraphs above should lay out the main pieces you need to put together to make your attack work. However there are a lot of little details that have to work correctly together, so you shouldn't be surprised if your attack doesn't work as expected the first time you try it. This is where GDB comes into the process again: use it to look at what is happening in various stages of execution, from the normal execution of the code, to overwriting the return address, jumping to the shellcode, and executing through the shellcode.

Extra complexity 1: try changing the shellcode to execute a different program. /bin/ls is a good first thing to try since it's the same number of characters. A program with a longer path (xcalc would also be traditional) may need a slightly larger change. Because of the way the program path is encoded, you'll also need to do some arithmetic, which can be done with GDB or Perl, among other possibilities.

Extra complexity 2: change your attack so that instead of putting the shellcode in an environment variable, the shellcode goes in the same file that overwrites the buffer. You'll need to be more exact about the address of that buffer to make this work.

(Side effects of a buffer overflow.)
The second part of the lab extends the same basic setup as the first part, but we've made the vulnerable program more complex in a way that makes an attack more difficult. You can get a copy in the same way as before:
```
cp /web/classes/Spring-2022/csci4271/labs/03/overflow-from-file-2{.c,} .
```
If you look at what has been added to the read_and_print function, there is some additional checking code between the read that causes the overflow and the end of the function. If you're not clear why this is a problem, try extending your attack from the previous part to work for this program. The shellcode can be exactly the same, but the stack layout of the read_and_print function is a bit different. Go ahead and do that now before reading the next paragraph.

The problem is that the local variable b is stored in between the buffer and the return address in the stack frame. If you overwrite b with an arbitrary value, the checks that the program does on the value will fail, and the program will exit without ever getting to your overwritten return address. Adjust your attack so that it works again. The easiest way to do this is to "overwrite" b with exactly the same value it was supposed to have in the first place. You may find it useful to use the program nm to find the address of a symbol in a binary. For instance this command:
```
nm overflow-from-file-2 | fgrep read_and_print
```
will find the address of the read_and_print function.

Extra complexity: in a more complicated scenario, it might be hard for an attacker to predict the old value of b. Since b is a pointer, another choice would be to make it point to another area of your choosing, and then put data there that would pass the program's checks.