University of Minnesota
Development of Secure Software Systems

CSci 4271 Lab 7

Today's lab will mix together two topics. We'll give you hands-on experience with Unix file permissions, continuing the abstract discussion from lecture. And we'll also walk through an example of attacking a program with a format string vulnerability. We've interleaved the instructions for these two parts of the lab because we think a good way to make use of your time in the lab is to do some exploration of both areas, but you can feel free to prioritize in whatever way you think would be best for you.

(Permissions) Basic tools

For the first part of the lab, we'd like you to spend some time trying out Unix commands related to checking and changing the permissions on files. In the description below we'll first introduce some permissions-related commands; then we'll give some suggestions of things you can try out.

Since you're doing the lab with other students, you should take advantage of being able to ask another student to test an operation for you. One of the limitations of trying out file permissions yourself is that you can only simulate one user's accesses, but someone else can check how permissions work for a non-owner.

Your CSE Labs home directories are stored on a networked file system where some rarer aspects of permissions are unsupported or work differently. You also might not want to change the permissions on your home directory to let another student access it, even temporarily. So we recommend that you instead create files and directories inside the directory named /tmp, which is explicitly for temporary files. Note however that the /tmp directory is not shared between machines, so multiple people doing testing should all SSH into the same machine to run experiments in its /tmp. Also, the top-level directory /tmp has some special rules, so create a subdirectory of /tmp (for instance, named after your user name) for testing.

There are three commands we recommend you try out for looking at the permissions on Unix files (all of these commands also have more detailed documentation in their man pages):

  • The most commonly used program is ls with the -l (lowercase ell for long) option that causes it to print a line of information about each file or directory. The permissions information is in the 2nd through 10th columns, after the very first column which gives the file type. These columns mostly correspond to the nine basic permissions bits, where they are either a hyphen to represent a bit being 0, or an r, w, or x to represent a one bit based on its position. There is an exception that an x bit will turn into an s, S, t, or T in some special circumstances called "setuid", "setgid", or "sticky". Later in the line you'll see the names of the user and group owners.

  • The stat command-line program is like a more detailed version of ls -l that prints all of the metadata returned by the stat system call, across several lines. The line of its output that has the headings Access:, Uid:, and Gid: has the permissions information. The most useful extra feature for us is that it prints the octal version of the permissions bits.

  • The command getfacl provides the most expanded version of the permissions information; as the letters ACL in its name indicate, the multiple lines of its output have the structure of a general access-control list. You may find its output the easiest to read; it also becomes important once you try out more general ACLs (described later).

Then there are two commands you can try out to change permissions on files:

  • The command chgrp has a limited purpose: it changes the group of a file. Only the owner of a file can change its group, and they can only change it to another group they are a member of. You can use the command groups to see what groups you're a member of. There is also an analogous program named chown to change the owner of a file, but it is useless to you on CSE Labs machines because only root can change the ownership of files. (Actually, chown can change just the group; then it's equivalent to chgrp.)

  • The command chmod changes permissions. The most basic way to use it, which is convenient if you know exactly the permissions you want in octal, is to use the new permissions in octal as the first argument, and any remaining arguments are the names of files or directories you want to have that permission set. If you want to change some of the permissions and leave others the same, which is most important if you're changing multiple files at once, the first argument also has a letter-based form that you can read about in the manual page. And if you want to change the permissions on a whole tree of files, the option -R causes chmod to work recursively on a directory and all of the files and directories inside it.

Take some time to try out these programs on different files, and then try to do different operations on the files, to see whether the permissions work the way you would expect. For checking whether you can execute a file, you may want to work with a file that's a copy of a simple executable from a system directory, for instance /bin/uname. Once the basics seem to make sense, here are some further things you can try:

  • Try creating a directory where you have execute but not read permissions. For instance, check whether you can access a file in that directory if you know its name, even though ls and tab completion won't work.
  • Conversely, what happens if you try to run ls or ls -l on a directory for which you have read but not execute permission? Why?
  • If you have write permission but not execute permission on a directory, are there any operations you can do to it?
  • Suppose there is a file that you want to modify, and you don't have write permissions on the file, but you do have write permissions on the directory where the file is stored. What can you do?
  • Similar to the previous question, but what if the file is inside several levels of directories you don't have write access to, but high above it there is a directory you can modify?

(Format) Background and setup

The other side of today's lab will go in depth on another kind of vulnerability that has come up in lecture and the attack techniques for it, a format string injection. As we've done to simplify other control-flow hijacking examples, you won't need shellcode: instead your goal is to transfer control to the attack_function function. We also recommend that you spend some time (maybe 10-15 minutes) at the beginning of the lab practicing auditing the code to find the information you need to attack the vulnerability. But we'll give out suggestions along these lines on a separate linked page. You could also spend longer on looking for the vulnerability if you feel that's what you need the most practice with, but we'd recommend coming back to the attack techniques later.

A format string vulnerability that allows an attacker to use the %n format specifier works by providing what we call a write-what-where attack primitive. In other words, the vulnerability gives the attacker the capability to write a value of the attacker's choosing, to a location of the attacker's choosing, sometimes with some other restrictions. You may recall that the intended benign purpose of the %n feature of printf family functions is to allow a program to keep track of how many characters printf has written up to a given point, returning the value via a pointer. When this feature is used for an attack, the value that printf interprets as the argument corresponding to the %n format specifier is the "where" of the primitive (the location that will be written to), and the "what" of the primitive, the value that will be written, is the number of characters printf has written so far.

For this vulnerability, the where and the what both have pretty flexible control for an attacker, but they are limited in the size of their values. Because of this limitation, it will not work to try to overwrite a return address on the stack, which might otherwise be your first thought of a location it would be useful for an attacker to replace to take control of the program. Instead we need to rewrite some other data value that the program will use as a control-flow target. If there were a function pointer in the program, it might be a candidate. But what we recommend you use today is instead a value that the program uses for getting to a function in a shared library.

Start by copying the program source code to your working directory, with a command like:

cp /web/classes/Fall-2023/csci4271/labs/07/bclpr.c .

Because this is a simulation of a program that would be installed in a system-wide location if you controlled the full computer, we need to do a bit more to simulate "installing" it so that it will run correctly. As a location that you will have access to but will be unique even on a shared computer, we recommend that you compile the program to use a location similar to /tmp/bclpr-goldy007, but where goldy007 is replaced with your UMN ID. /tmp is a directory for temporary files that everyone has access to, while using your UMN ID makes it unique. You'll need to make this change both on line 24 of the source that defines the INSTALL_PREFIX macro and in the commands below that create directories the program will use. Note though that our attack is only going to use addresses in the main program, so you don't have to do anything to disable ASLR.

gcc -no-pie -z execstack -g -Wall -Wno-format-security -fno-stack-protector bclpr.c -o bclpr
mkdir -p /tmp/bclpr-goldy007/spool/lp0
mkdir -p /tmp/bclpr-goldy007/printouts/lp0

(Permissions) More general ACLs

The restriction of file ACLs having only one specified user and one specified group, which is traditional in Unix, has been lifted in many derived systems. The extension that's most reliably available for local filesystems on Linux is referred to as POSIX ACLs (though it never made it to an official part of the POSIX standard). You can set these with the command setfacl which is the modifying counterpart of the reading program getfacl mentioned above. The full mechanism has a number of wrinkles, but as something simple you might try giving read access to just one other person, something that in the traditional Unix model would require creating a new group.

(Format) Auditing suggestions

There are two ingredients that are needed to constitute a format-string vulnerability, and then one other choice you need to make based on the program to set up your attack. Here we'll talk about what you're looking for; our proposed answers are on a separate page, which you can use to check your answers or to jump ahead.

  1. A printf-family function with a non-constant format string. The most common way to use printf is for the first argument to be a constant string containing format specifiers starting with a percent sign. But if this string is constant, it can't be affected by an attacker. Only if the format-string argument is a non-constant (usually a variable) can the call be vulnerable.
  2. A way for the attacker to control the contents of the format string. A non-constant format string is potentially dangerous, but it will generally only be of use to an attacker if the attacker can supply their own value for the format string. So for instance if the format string argument is a variable, look back at how that value can be computed to see if it could be under an attacker's control. To make a working attack you'll need to exercise that control.
  3. Another attacker-controlled value that will be interpreted as a later argument to the printf-family function. To match up with a %n the attacker puts in a format string, they also need to control another value to be interpreted as the argument to the printf-family function. If the program is already passing other arguments to printf, they might be controllable. Failing that, having a lot of format specifiers in the format string will cause printf to read beyond its internal array on the stack to interpret other values on the stack as arguments. So look for attacker-controllable values in the stack frame of the calling function, its caller, and so on. On x86-64, the argument corresponding to %n needs to be an 8-byte value that contains several null bytes, so it usually won't work for this value to be a null-terminated string.

Our proposed answers to the above three questions are outlined on a separate page.

(Permissions) Giving a program permissions with setuid

The main way Unix permissions work is that all programs run by a user run with the user's permissions: anything you could do with one program, you could also do with a different program, so the exact set of operations that any given program allows doesn't affect security. But sometimes you want to allow users to do an operation that requires privilege, but with some extra checks. One way Unix makes this possible is by making a program "setuid", which means that when the program runs, it uses the UID of the owner of the executable file rather than the user who executed the program. You can think of this as embedding some of the owner's privilege in the program, so that it can be run by users to do operations they wouldn't otherwise be allowed to do. But you have to be more careful with the design of a setuid program from a security standpoint, because if the program is subverted, it can allow an attacker to misuse the privileges.

One classic example of a service that needs to be provided by a privileged program is (local) email. To deliver a message, we would like to append it to the end of a user's mailbox file, which requires write permission on the mailbox file. But if everyone who could send email had write permissions on the recipient's mailbox, they could do undesirable things like modify previously-delivered messages. So one way of implementing this is to have mail delivery performed by a setuid process, which carries the permission to write to the mailbox file but will only do so in a safe way. For this lab, you can try out this idea in a simplified way by setting up a setuid program that delivers short messages (more like chat messages than emails) just for a single user.

To see the power of setuid in action, one student will need to set up the delivery program and another student will need to run it. To get you started we've written a short C program that has the basic needed functionality, which you can copy with a command like:

cp /web/classes/Fall-2023/csci4271/labs/07/append-to-messages.c .

The location of the messages file is hardcoded into the program as the constant variable messages_file. You'll want to update the line in the code that defines the variable to point to a file controlled by you if you're going to be receiving messages: this file doesn't need to have read or write permissions for anyone else, but it should have write permissions for you. Then you can compile the program in the usual way, such as:

gcc -Wall -g append-to-messages.c -o append-to-messages

Then put the compiled program in a location where the other user who is going to send you a message can access it, make sure they have execute permissions on it, and make the program setuid to you with the flag u+s to chmod. If everything is working correctly, the sending user shouldn't be able to access the messages file directly, but they should be able to run the setuid program with a message as a command-line argument, and the recipient should see the message appended to their messages file.

(Format) More attack techniques

You can confirm that you are successfully injecting a format string by putting format specifiers in the string and seeing what output results. In particular, try a format string that contains many copies of the format specifier %016lx to print various parameters from the stack as fixed-size 64-bit hex values.

Before getting to the attack proper, let's take a look at the function-pointer-like shared library mechanism that we will be modifying in our attack. For each function from a shared library that is called by the main program, the linker creates a small intermediate function called a "PLT stub" (PLT stands for Procedure Linkage Table). The value that this stub function uses to find the actual implementation of the function in the shared library is an entry in the GOT (Global Offset Table). The entries in the GOT used by PLT stubs are sometimes more specifically called the .got.plt section. For the case of this attack, we need to change the entry that corresponds to the location of a function that will be called after the execution of the printf-family function that we control. The specific library function we suggest you use for this purpose is fclose, which you can see is called on line 503, soon after the vulnerable fprintf on line 500.

You can see the GOT entry used, and observe how it works, by looking at the execution of the relevant PLT stub in GDB. Here's a transcript you can try, where you can see that the name of the PLT stub for fclose is 'fclose@plt' (the single quotes are not part of the name proper, but are needed for GDB because normal C function names can't contain @).

% echo 'Hello, world!' >hello.txt
% gdb --args ./bclpr hello.txt
(gdb) disass 'fclose@plt'
Dump of assembler code for function fclose@plt:
   0x00401380 <+0>:     endbr64 
   0x00401384 <+4>:     bnd jmpq *0x3cd5(%rip)   # 0x405060 
   0x0040138b <+11>:    nopl   0x0(%rax,%rax,1)
(gdb) p *(void **)0x405060
$1 = (void *) 0x4010c0
(gdb) watch *(void **)0x405060
Hardware watchpoint 1: *(void **)0x405060
(gdb) run
Starting program: bclpr hello.txt

Hardware watchpoint 1: *(void **)0x405060

Old value = (void *) 0x4010c0
New value = (void *) 0x7ffff7e29e00 <_IO_new_fclose>
0x00007ffff7fe01d0 in _dl_fixup (l=<optimized out>, reloc_arg=<optimized out>)
    at ../sysdeps/x86_64/dl-machine.h:242
(gdb) c
Continuing.
Deleting spoolfile from /tmp/bclpr-goldy007/spool/lp0/hello.txt
[Inferior 1 (process 590385) exited normally]
(gdb) quit

The key information needed for this aspect of the attack is the address of the relevant GOT entry, which you can see from the disassembly is 0x405060. The initial value of this pointer is a placeholder that triggers the dynamic linker to determine the real address the first time the function is executed. That code (specifically the function _dl_fixup where the watchpoint was triggered) is responsible for updating the GOT entry to point to the location of the implementation of fclose in the C library, which is 0x7ffff7e29e00 on this execution.

The "where" in a format-string write-what-where attack needs to be a value on the stack, in a location where fprintf would look for an argument corresponding to a format specifier, which is under the control of the attacker. This is the thing that we will want to set to the address of the GOT entry.

Next, the "what" of our format-string attack needs to be the address of attack_function, and that needs to be the number of bytes that fprintf writes before it gets to the %ln format specifier we've added. You can get the address of the function inside GDB or with the program nm. Note that it will also be useful to know this value in decimal.

% nm bclpr | fgrep attack_function
0000000000402803 T attack_function
% printf '%d\n' 0x402803
4204547

How can we make fprintf output more than four million characters before getting to the crux of our attack? A format string that was itself that long might be too long to pass on the command line. More conveniently we can use the same printf feature we already used in %016lx: a leading zero and then a number in the format specifier causes the output to be padded with leading 0s up to that many characters if it is not already that long. 16 is a logical number to use for a 64-bit hex value, but there is no upper limit on this padding size, so it can easily be in the millions.

The final slightly tricky aspect of this attack is making the amount of data output by printf come out to exactly the right amount, since it depends on all the things in the format string up to the point of the %ln. If you followed our suggestion of using %016lx for printing the earlier stack values, then each of them will be 16 bytes long, but don't forget to also count any spaces, newlines, or other separating characters you print. You could also try copying the data out of the log file to count its size. If you get the size wrong, the attack won't work, but you can use the same GDB watchpoint we illustrated earlier to see what value the GOT entry is being overwritten with, and adjust it accordingly.

Another note: the basic printf format specifier for writing the number of bytes output is %n, but for this attack it is important that you use the variant %ln instead. The l causes printf to interpret the pointer as a long * instead of an int *, which is important because we want to overwrite all 8 bytes of the pointer. We saw that the normal value of the location of fclose has the high bits non-zero, so a 4-byte write would leave those stale high bits in place and the result would not be the address we want.

There is also one more printf feature you may find it useful to experiment with in building your attack, which can make the attacking format string shorter. The version of the printf functions on Unix systems allows you to give format specifiers out of order. If you put a decimal number and then a dollar sign right after the percent sign of the format specifier, that format specifier will take its value from the position corresponding to the number (counting from 1), rather than what would normally come next. For instance while %d will print the next argument as a decimal int, %10$d will print as a decimal int the value that would normally have corresponded to the 10th format specifier had the dollar sign feature not been used.

(Permissions) Combinatorics of Unix permissions

Here's one question you might think about as a bit of a brainteaser; it's related to Unix permissions bits but is more of a math (or CSci 2011) question. If we put no restrictions on the different bits in a permissions set, there are a total of 512 different possible permissions sets, but most of these are not very useful. One restriction you might imagine imposing to get a more limited but useful set of permissions (and which is true of most permissions in practice) is to require that the permissions never go down as you go to a more specific level. In other words, require that every permission that other users have, group users also have, and every permission that group users have, the owner also has. For instance the octal permissions sets 000, 700, 777, 755, and 664 all obey this restriction, but 070 and 321 do not. How many permission sets remain after imposing this restriction? The best answer would be a formula for counting the permissions that generalizes to different numbers of permission bits (say b=3 for Unix's r, w, and x) or different numbers of levels (say l=3 for Unix's user, group, and other), and an explanation of where the formula comes from. If you're not feeling mathematically inspired, you could also try writing a program to compute the result by brute force for different parameters and looking for patterns in the results.