University of Minnesota
Machine Architecture and Organization (sec 010)

CSci 2021 Lab 0x4

Decompiling Assembly

In this lab we will practice our understanding of assembly by manually "decompiling" assembly code back into C code. You will be given a variety of functions written in x86-64 assembly, and your task is to write a C equivalent for each. You can copy all of the files you will need with this command:

cp /web/classes/Fall-2018/csci2021-010/labs/0x4/{funcs_asm.s,funcs.c,test.c,funcs.h} .

Do your work in funcs.c. It contains a declaration for each function, with a snippet of the relevant assembly pasted above it. The first function is laid out in much more detail to get you started. You can also look at the full assembly file, funcs_asm.s, if you like. Tests are provided to check your progress. Compile them using:

gcc funcs.c test.c -o test

and then run the resulting executable.
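
To illustrate the kind of translation involved, here is a small worked sketch. The function sum_to_n and its assembly are made up for this illustration and do not appear in funcs_asm.s. The sketch assumes the usual x86-64 Linux calling convention, in which the first integer argument arrives in %edi and the return value is left in %eax.

```c
/*
 * Hypothetical example (not one of the lab functions).
 *
 * sum_to_n:
 *     movl  $0, %eax        # result = 0
 *     jmp   .L2             # jump to the loop test
 * .L3:
 *     addl  %edi, %eax      # result += n
 *     subl  $1, %edi        # n--
 * .L2:
 *     testl %edi, %edi      # set flags from n
 *     jg    .L3             # repeat while n > 0 (signed)
 *     ret                   # return result in %eax
 */
int sum_to_n(int n) {
    int result = 0;          /* lives in %eax */
    while (n > 0) {          /* testl / jg pair */
        result += n;         /* addl %edi, %eax */
        n--;                 /* subl $1, %edi */
    }
    return result;
}
```

Notice that the jump to the test at .L2 followed by a backward conditional jump is a common shape for a while loop, and that the signed comparison (jg rather than ja) suggests that n is a signed int.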

Decompilation in a larger example

In addition to being an exercise in understanding how machine code and compilers work, decompiling machine code into C is also used in real-life situations (including for interoperability and security purposes) where you need to figure out how a program works when no source code or documentation is available. (This is also called reverse engineering.) There are tools that try to automate the process, but none of them is nearly as good as a human expert.

To give you a bit of a taste for how the problem becomes more difficult at a larger scale, suppose you work in the industrial espionage division and have been assigned to try to glean the secrets of the high performance of a competitor's spell-checking program. In particular, your boss wants to know more about how its hash function works. Normally it would cost $29.95, but you can get a copy of the competing program for the purposes of this lab at the usual place:

cp /web/classes/Fall-2018/csci2021-010/labs/0x4/spellcheck_lab4 .

You have two reverse engineering tools available. One is a disassembler, which only converts a program binary back into assembly-like text. You can run it like this:

objdump -d spellcheck_lab4 >spellcheck-disasm.txt

The other is a decompiler, which produces a representation that looks more like C code. Specifically, we have compiled a recent version of FCD, an open-source decompiler, for you to use. For instance, you can try:

/web/classes/Fall-2018/csci2021-010/labs/0x4/fcd spellcheck_lab4 >spellcheck-decomp.txt

Like most decompilers, this one isn't very good at choosing variable names, much less producing comments. However, both the disassembly and the decompilation are easier to follow than they might otherwise be, because they still include the names of functions.

Looking at the disassembly, the decompilation, or both, try to figure out how the hash function works. Can you write C code to do the same thing?
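
As a point of reference for the shape of code you are trying to recover, here is a sketch of a well-known style of string hash (often called djb2). It is chosen purely for illustration: the name example_hash, the constants 5381 and 33, and the return type are assumptions of this sketch, and nothing here is claimed to match the hash function in spellcheck_lab4.

```c
/*
 * Illustrative only: a djb2-style string hash.
 * This is NOT the hash function used by spellcheck_lab4.
 */
unsigned long example_hash(const char *s) {
    unsigned long h = 5381;                  /* starting value */
    while (*s != '\0') {                     /* walk to the NUL terminator */
        h = h * 33 + (unsigned char)*s;      /* mix in one byte */
        s++;
    }
    return h;
}
```

When reading the assembly for a function like this, keep in mind that a compiler may implement a multiplication by a small constant without a multiply instruction. For example, h * 33 can appear as a left shift by 5 followed by an add, so look at what a sequence of instructions computes rather than expecting a one-to-one match with C operators.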