University of Minnesota
Machine Architecture and Organization (sec 010)

CSci 2021 Lab 0xb: Profiling Tools


In this lab we will be experimenting with profiling tools. Profiling tools supply information on what parts of a program are taking the most time, which gives us important insight into where we might need to optimize our code. In this lab we will use two profiling tools: gprof and valgrind/kcachegrind. We have written some examples of inefficient code for you to examine, but it will be your job to run these programs through gprof and kcachegrind to detect where things are getting hung up. To start, copy all of the relevant lab files with this command:

cp /web/classes/Spring-2020/csci2021/labs/0xb/{escape-unescape.c,kcachegrind} .


The program escape-unescape implements the transformation of a string to use C-style backslash escape sequences, and then the inverse transformation of converting the string back to the bytes those escape sequences represent. It reads its input either from the standard input or from a single file, escapes the contents of the file, and then unescapes the escaped string it created. If everything is working correctly, the second operation should undo the first operation. By default it doesn't print anything, but if you give it the -e option it will print the input after escaping, and with the option -u it will print the string after unescaping. Here are commands to compile it and run it on a short string:

gcc -Wall -g escape-unescape.c -o escape-unescape
echo 'hello world' | ./escape-unescape -e -u

You can see that the echo command puts a newline at the end of the string it prints, which is converted to \n by escaping and then back to a newline by unescaping. Here's another example that includes a tab character:

echo -e 'hello\tworld\n' | ./escape-unescape -e -u

And here's an example that escapes some random byte values:

dd if=/dev/urandom bs=100 count=1 status=none | ./escape-unescape -e

You might expect that both the escaping and unescaping processes would be pretty fast, and that they would be about the same speed. But if you look at the code you might see a potential performance problem, and you'll also see a problem if you test the program with larger inputs. The following command runs the program on the first 2000 lines of a documentation file (you can obviously change the number if you want to experiment), and measures how long it takes:

zcat /usr/share/doc/cvs/cvs.txt.gz | head -2000 | time ./escape-unescape

Depending on your computer, you'll see it takes several seconds, which seems slow. Also if you add the -e and -u options, it looks like the escaping process is pretty fast and it's the unescaping process that's slow, based on when the escaped and unescaped text is printed.

Let's see what kind of concrete information we can get by profiling with gprof. To use gprof, you need to compile a version of the program with the -pg option to include profiling code:

gcc -pg escape-unescape.c -o escape-unescape-prof

If you rerun the 2000-line example with the escape-unescape-prof version of the program, it won't take much longer because gprof's profiling is relatively inexpensive. But it will produce an extra output file named gmon.out containing the profiling result data. You can't look at this file directly, but if you run it through the gprof program, together with the original binary, it will produce a text report:

gprof escape-unescape-prof gmon.out >report.txt

The report includes two tables of data, and captions underneath explain what each table means. The one on the top is called a flat profile: it just measures the number of calls and the time spent in each function separately. The counts are exact, while the times come from random sampling, so a time of zero doesn't mean that a function literally took no time, just that the time spent was too small to notice in the sample. On this run you can see that almost all the time was spent in nearly 90,000 calls to my_strcat, about one per character in the file.

The second table in the gprof output shows the relationship between function callers and callees. In this example, it confirms that all the calls to my_strcat were coming from unescape_str.


Another profiling tool that you can use is based on the same Valgrind system that you may have also used to tell you about memory bugs. (Valgrind supports a number of different "tools": the default one that tells you about memory bugs is called Memcheck, while the profiling ones are called Callgrind and Cachegrind.) Valgrind's profilers don't do any random sampling, so they can give you even more detailed information about where your program's time is spent, though they slow down the program more when you run them. You also don't need to compile your program in a special way for these tools.

Here's a command to run the Callgrind profiler on our escape/unescape program on 200 lines of input:

zcat /usr/share/doc/cvs/cvs.txt.gz | head -200 | valgrind --tool=callgrind --dump-instr=yes ./escape-unescape

The flag --dump-instr=yes tells it to keep information about individual instructions. This run creates another file named callgrind.out.[some number] (the number is the process ID of the program). This file is in a text format but it's still not designed to be easy to understand directly. Instead we'll use a GUI program named kcachegrind to look at it. Run kcachegrind on that output file:

./kcachegrind callgrind.out.[some number]

This GUI can show a bunch of different information in the different panes; we recommend you just explore it for a while. The pane on the left is a flat profile, similar to the first output from gprof. Initially it's sorted by the column Incl., short for "inclusive", which means the time spent in a function together with all the functions it called, so main will always be close to 100% in this measure. But you can click on columns to change the sorting, for instance clicking on Self will show the functions that are most expensive on their own.

The function selected in the left pane controls what is displayed in the two right panes. A sometimes useful tab to look at in the lower pane is the one named Call Graph, which shows a picture of the call relationships between functions, showing only the functions that took a significant amount of time. For this program that structure is pretty simple. Also the top pane can show information about the callers of a function and the bottom pane information about its callees. But most useful in this example is the tab on the top pane labeled Source Code, which shows where time was spent relative to lines of code. If you look at this view for the my_strcat function, it should show you which of the two loops in that function was expensive in this program.