CSCI 2041

ADVANCED PROGRAMMING PRINCIPLES

Garbage Collection

Heap Garbage

What happens when we execute code like this:

let v1 = (2,17) in
let v2 = [1;2;3] in
let v3 = 4::v2 in
let v4 = " astring" in
let v5 x = if x=1 then v2 else [] in
  "all that work wasted"

Each let-binding creates a data structure using space allocated on the heap.

At the end, these are all “garbage” values that need to be deallocated.

Deallocation

In C/C++ and some other languages, deallocation is explicitly managed by the programmer. This can result in several kinds of mistakes:

  • False negative: memory that is not reachable is never collected. (Memory Leak…)

  • False positive: reachable memory is deallocated. (“Use after free” is a security bug)

  • Double free: memory that has already been deallocated is deallocated a second time.

Most modern PLs implement automatic deallocation…

Reference Counting

Each heap object h stores a count of how many values point to h. When h.count reaches 0, deallocate.

Used by: Python, Swift, Perl, PHP, MS COM…

Problems with this approach?
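A minimal sketch of the counting rule, simulated with ordinary OCaml records rather than real heap memory. The names obj, retain, release and point are hypothetical, and the freed flag stands in for actual deallocation. The second half illustrates one well-known weakness: objects that point to each other in a cycle never see their counts reach 0.

```ocaml
(* A toy heap object with an explicit reference count. *)
type obj = {
  mutable count : int;        (* how many references point here *)
  mutable fields : obj list;  (* objects this one points to *)
  mutable freed : bool;       (* stands in for real deallocation *)
}

let make () = { count = 0; fields = []; freed = false }

(* Record one new reference to [o]. *)
let retain o = o.count <- o.count + 1

(* Drop one reference; at zero, "deallocate" and release the children. *)
let rec release o =
  o.count <- o.count - 1;
  if o.count = 0 then begin
    o.freed <- true;
    List.iter release o.fields;
    o.fields <- []
  end

(* Store a pointer from [src] to [dst]. *)
let point src dst =
  retain dst;
  src.fields <- dst :: src.fields

let () =
  (* Chain: root -> a -> b. Dropping the root reference frees both. *)
  let a = make () and b = make () in
  retain a;
  point a b;
  release a;
  assert (a.freed && b.freed);
  (* Cycle: c <-> d. After the root reference is dropped, each object
     still holds a reference to the other, so neither is freed. *)
  let c = make () and d = make () in
  retain c;
  point c d;
  point d c;
  release c;
  assert (not c.freed && not d.freed);
  assert (c.count = 1 && d.count = 1)
```

Note the extra work on every pointer store (point updates a count), which is a cost the tracing collectors on the following slides avoid.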

Mark / Sweep

Periodically traverse the heap pointer graph:

  • Mark any reachable objects
  • Sweep up the garbage

Problems:

  • Is a word a pointer?
  • How do I mark the reachable objects?
  • “stop the world and collect garbage”
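One common answer to the first question is to reserve a tag bit in every word. The sketch below simulates that scheme on plain ints: the helper names tag_int, untag_int and is_pointer are hypothetical, and the "address" 0x1000 is made up for illustration. The last two assertions use the real (unsafe) Obj module to show that the OCaml runtime can make the same distinction.

```ocaml
(* Integers are shifted left and given a low bit of 1; word-aligned
   addresses are even, so their low bit is 0. *)
let tag_int (n : int) : int = (n lsl 1) lor 1
let untag_int (w : int) : int = w asr 1
let is_pointer (w : int) : bool = w land 1 = 0

let () =
  assert (untag_int (tag_int 21) = 21);
  assert (untag_int (tag_int (-5)) = -5);
  assert (not (is_pointer (tag_int 21)));
  assert (is_pointer 0x1000);
  (* The runtime's own view: an int is immediate, a non-empty list
     is a heap block. *)
  assert (Obj.is_int (Obj.repr 42));
  assert (Obj.is_block (Obj.repr [1; 2; 3]))
```

The price of this design is one bit of integer range per word.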

Mark / Sweep

type value = V of int | B of block
and block = { len : int ;
  mutable marked : bool ;
  (* not quite: *)
  words : value list }
(* Mark phase: set the mark on every block reachable from roots. *)
let rec traverse_heap (roots : value list) : unit =
  match roots with
  | [] -> ()
  | (V _)::vs -> traverse_heap vs
  | (B bl)::vs ->
    if bl.marked then traverse_heap vs
    else begin
      bl.marked <- true ;
      (* also visit the words stored in this block *)
      traverse_heap (bl.words @ vs)
    end
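The code above covers only the mark phase. A sweep phase could be sketched as follows, assuming the collector also keeps a list of every allocated block (the heap parameter, which is not part of the slide's model). The functions mark, sweep and collect are illustrative names; dropping a block from the list stands in for freeing it.

```ocaml
type value = V of int | B of block
and block = {
  len : int;
  mutable marked : bool;
  words : value list;
}

(* Mark phase: flag every block reachable from [roots]. *)
let rec mark (roots : value list) : unit =
  match roots with
  | [] -> ()
  | V _ :: vs -> mark vs
  | B bl :: vs ->
    if bl.marked then mark vs
    else begin
      bl.marked <- true;
      mark (bl.words @ vs)
    end

(* Sweep phase: keep the marked blocks and clear their marks for the
   next cycle; unmarked blocks are dropped from the heap list. *)
let sweep (heap : block list) : block list =
  let live = List.filter (fun bl -> bl.marked) heap in
  List.iter (fun bl -> bl.marked <- false) live;
  live

(* One full collection: mark from the roots, then sweep. *)
let collect (roots : value list) (heap : block list) : block list =
  mark roots;
  sweep heap

let () =
  let leaf = { len = 1; marked = false; words = [V 7] } in
  let parent = { len = 2; marked = false; words = [V 1; B leaf] } in
  let orphan = { len = 1; marked = false; words = [V 9] } in
  let live = collect [B parent] [leaf; parent; orphan] in
  assert (List.length live = 2);
  assert (List.memq leaf live && List.memq parent live);
  assert (not (List.memq orphan live))
```

Both phases run while the program is paused, which is the "stop the world" cost listed earlier.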

Copying

Copy the reachable objects into a second heap region, then discard the old region. Frees up bigger contiguous blocks but uses 2x memory.
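A rough sketch of the idea, in the style of Cheney's algorithm. It assumes a simplified heap in which each object is an array of words and a pointer is an index into the current space. The types word and obj and the function copy_collect are hypothetical names for this illustration.

```ocaml
(* A word is an immediate integer or a pointer (an index into the
   current space). *)
type word = Int of int | Ptr of int
type obj = word array

(* Copy everything reachable from [roots] out of [from_space] into a
   fresh to-space. Returns the compacted space and the relocated roots. *)
let copy_collect (from_space : obj array) (roots : word list) =
  let n = Array.length from_space in
  let to_space = Array.make n [||] in   (* the second space: the 2x cost *)
  let forward = Array.make n (-1) in    (* old index -> new index *)
  let next = ref 0 in                   (* allocation pointer *)
  (* Move an object unless already moved; return its new address. *)
  let relocate w =
    match w with
    | Int _ -> w
    | Ptr i ->
      if forward.(i) < 0 then begin
        forward.(i) <- !next;
        to_space.(!next) <- Array.copy from_space.(i);
        incr next
      end;
      Ptr forward.(i)
  in
  let new_roots = List.map relocate roots in
  (* Scan the copied objects in order, relocating what they point to. *)
  let scan = ref 0 in
  while !scan < !next do
    let o = to_space.(!scan) in
    Array.iteri (fun k w -> o.(k) <- relocate w) o;
    incr scan
  done;
  (Array.sub to_space 0 !next, new_roots)

let () =
  (* Objects 0 and 2 point to each other; object 1 is garbage. *)
  let from_space =
    [| [| Int 1; Ptr 2 |]; [| Int 99 |]; [| Int 2; Ptr 0 |] |] in
  let live, roots = copy_collect from_space [Ptr 0] in
  assert (Array.length live = 2);
  assert (roots = [Ptr 0]);
  assert (live.(0) = [| Int 1; Ptr 1 |]);
  assert (live.(1) = [| Int 2; Ptr 0 |])
```

The survivors end up packed at the start of the new space, so all remaining free memory is one contiguous block. The forwarding table also handles the cycle that defeats reference counting.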

Generations

  • Mark/sweep the “young heap”, increment gen. count of marked objects
  • If young heap is full, copy “old” objects to old heap
  • Repeat with older heaps.
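The promotion policy can be sketched as below, under several simplifying assumptions: the is_live argument stands in for the mark phase, survivors are promoted on every minor collection rather than only when the young heap fills up, and there are just two generations. The names obj, heap, promote_age and minor_collect are hypothetical.

```ocaml
(* [age] plays the role of the generation count: the number of
   collections an object has survived. *)
type obj = { id : int; mutable age : int }

type heap = { mutable young : obj list; mutable old : obj list }

let promote_age = 2   (* hypothetical promotion threshold *)

(* Minor collection: dead young objects are dropped, survivors age by
   one, and survivors that reach [promote_age] move to the old heap. *)
let minor_collect (h : heap) (is_live : obj -> bool) : unit =
  let survivors = List.filter is_live h.young in
  List.iter (fun o -> o.age <- o.age + 1) survivors;
  let promoted, still_young =
    List.partition (fun o -> o.age >= promote_age) survivors in
  h.young <- still_young;
  h.old <- promoted @ h.old

let ids = List.map (fun o -> o.id)

let () =
  let h = { young = List.map (fun id -> { id; age = 0 }) [1; 2; 3];
            old = [] } in
  (* First collection: object 2 is dead, objects 1 and 3 survive once. *)
  minor_collect h (fun o -> o.id <> 2);
  assert (ids h.young = [1; 3] && h.old = []);
  (* Allocate object 4, then collect again with object 3 now dead. *)
  h.young <- { id = 4; age = 0 } :: h.young;
  minor_collect h (fun o -> o.id <> 3);
  assert (ids h.young = [4]);
  assert (ids h.old = [1])
```

The old heap is never scanned by minor_collect, which is where the savings come from when most objects die young.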

cs2041.org
