University of Minnesota
Machine Architecture and Organization (sec 010)
index.php

CSCI 2021 lab0x8


Malloc Lab and Architecture Simulators

This week's lab looks to get you familiarized with making, running, and debugging the code from the upcoming malloc lab. You will also be writing, running, and compiling Y86 code.


  1. (A Malloc Lab Introduction)

    To get the tar file for this part of the lab:

    unix> cp /web/classes/Fall-2018/csci2021-010/labs/0x8/lab0x8.tar .

    To extract the files:

    unix> tar xvf lab0x8.tar

    For this part of the lab, you will need to be in the lab0x8/malloclab-handout directory. There are a lot of files in this folder, but you only need to edit mm-implicit-bug.c; the other files are there to help with testing and measuring performance.

    To run the files we've given you, you will first need to make the files, and then you will have to run mdriver. This is desribed in the README, but the instructions to do so are also here:

    make
    ./mdriver -V

    The -V flag is optional but allows you see more information related to your run of mdriver.

    If you look at the results from running mdriver you'll notice that the program is not correct! Your goal for this part of the lab is to debug mm-implicit to see why the program is not running correctly. (If you have time after you make the program correct, you'll notice the performance is very low. What part of the code caused this and how can you fix it?)

    When looking at the code for mm-implicit you'll notice that it is very different form code you've probably seen before. This is because the program relies heavily on macros, or predefined values or computations. For example WSIZE is set to 8. This means instead of having many 8s throughout the code, WSIZE will be used. There is also a predefined MAX function this means you can use MAX(val1, val2) instead of the same few lines of code each time. This kind of structure for a file is especially useful for not having a bunch of magic numbers floating through your code while also encapsulating a lot of the code so it is easier to change in the future. It also uses up a lot less space and time when the function runs than having an actual function call MAX.

    However, if you are unused to seeing macros, it's your Halloween SPOOK for the day! (Just kidding!) But understanding what the code is doing to implement mm_malloc, etc, will be a bit more difficult. Therefore please take your time looking through and understanding the code, and please ask questions. Being familiar with this solution will be very helpful for when you start Hands-on Assignment 4.

    HINTS:
    -A useful function included in mm_implicit is the function mm_checkheap(verbose). This function can be run to check what the heap looks like while you are running your code to find where you are getting errors. To use it just put the line mm_checkheap(1); wherever you would like to check the heap. You can also use a 0, and you will just get the error messages.
    - If you are having trouble finding a starting place to look at first, we suggest you try looking at mm_malloc and related procedures.

  2. (Writing Y86-64 Code)

    You will be working in the lab0x8/y86-64 directory for the part of the lab.

    First, let's get some practice reading and executing a Y86 program: example.ys. Start by opening the file example.ys in your text editor of choice. Below is the corresponding C code to example.ys

    long example (long x, long y) {
      if(x < y) {
        x = x - 8;
      }
      else {
        y = y + 8;
      }
      return(2 * x + y);
    }

    There are comments given to help you understand which portions of the Y86 corresponds to the C. Notice that Y86 is a bit different from the X86 assembly you have worked with over that last month. First, there is no comparision instruction, so comparisions must be replaced by subtractions to get the same effect. Also, in Y86, there is not an instruction that can add a constant value to a register. So instead, we first have to store the constant in an unused register, then add the two registers together.

    Next lets run example.ys and see what happens. To 'compile' the program, use the command:

    unix> ./yas example.ys

    This will create the object file example.yo that we can 'execute'. To 'execute' the program:

    unix> ./yis example.yo

    The output here might seem pretty strange, but it will make sense after some careful examination. Instead of actually seeing the program run, we are given a summary of what happend. The main sections of this summary are 'Changes to registers:' and 'Changes to memory:'. Only focus on the registers for now (since the program didn't need to manipulate the stack).

    • %rax: 0xff...f4 = -12. This is the result or return value of example.c
    • %rsi: 0x00...02 = 2. y
    • %rdi: 0xff...f2 = -14. (x - 8) * 2
    • %r8: 0x00...08 = 8. Stored constant 8
    • %r9: 0xff...ff = -1. Result of x - y

    After examining all of the registers, it seems that our program is behaving just like the C version. Feel free to test passing different arguments into example() if you would like to confirm this further.

    Now that you have some experience with Y86, it is time to write your first program. Your task is to write a Y86-64 program sum.ys that iteratively sums the elements of a linked list. Your program should consist of some code that sets up the stack sturcture, invokes a function, and then halts (Most of this is already given in sum.ys). You need to fill in the function sum_list. Test your program using the three-element list defined at the top of sum.ys. You will know if your program is working correctly when %rax = 0x00...0abc.

    If you are unsure of where to start. First try writing a C function that sums the elements of a linked list. Once you have a C version, do your best to translate it into a Y86 program.

  3. (Back to Malloc Lab)

    After getting mm-implicit-bug.c to work in part 1, you may have noticed that the performance doesn't seem very good. This is because there is another bug in the file that isn't affecting the correctness of this allocator, but it is dramatically impacting its performance. You should expect >90% memory usage and an execution time of well under 1 second for this type of allocator and the given traces.

    Your next goal is to find and fix this bug. What might be causing the allocator to use so much extra space? Is there something that it is, or is not doing when allocating or freeing memory? Why might the time be affected so much?