CSCI 2041

ADVANCED PROGRAMMING PRINCIPLES

Garbage Collection

Heap Garbage

What happens when we execute code like this:

let v1 = (2,17) in
let v2 = [1;2;3] in
let v3 = 4::v2 in
let v4 = " astring" in
let v5 x = if x=1 then v2 else [] in
  "all that work wasted"

Each let-binding creates a data structure using space allocated on the heap.

At the end, these are all “garbage” values that need to be deallocated.
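Heap allocation can be observed from inside a program. The following is a minimal sketch using the standard library's `Gc.minor_words`, which reports how many words have been allocated in the minor heap so far. The list is built at run time with `List.init` so that its cells have to be allocated while the program runs; the names `before`, `after`, and `v` are only for illustration, and the exact count printed may vary between compilers and versions.

```ocaml
(* Words allocated in the minor heap so far, as reported by the runtime. *)
let before = Gc.minor_words ()

(* Build a list at run time so that its cons cells must be allocated. *)
let v = List.init 1000 (fun i -> i)

let after = Gc.minor_words ()

(* Each cons cell occupies a header word plus two fields. *)
let () =
  Printf.printf "allocated about %.0f words for %d elements\n"
    (after -. before) (List.length v)
```

Once nothing refers to `v` any more, all of that space is garbage.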

Deallocation

In C/C++ and some other languages, deallocation is explicitly managed by the programmer. This can result in several kinds of mistakes:

False negative: memory that is not reachable is never collected. (Memory Leak…)
False positive: reachable memory is deallocated. (“Use after free” is a security bug)
Double free: memory that has already been deallocated is deallocated again.
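The three mistakes can be illustrated with a toy model of manual memory management. This is only a sketch: OCaml itself has no `malloc` or `free`, so the heap here is simulated with a hash table, and every name (`malloc`, `free`, `read`, `live_cells`, and the two exceptions) is hypothetical.

```ocaml
(* Toy model of a manually managed heap; all names are hypothetical. *)
exception Use_after_free of int
exception Double_free of int

(* A cell is either live (holding an int) or already freed. *)
type cell = Live of int | Freed

(* The simulated heap maps addresses to cells. *)
let heap : (int, cell) Hashtbl.t = Hashtbl.create 16
let next_addr = ref 0

(* Allocate a fresh cell and return its address (a "pointer"). *)
let malloc (v : int) : int =
  let a = !next_addr in
  incr next_addr;
  Hashtbl.replace heap a (Live v);
  a

(* Free a cell; freeing it a second time is a double free. *)
let free (a : int) : unit =
  match Hashtbl.find_opt heap a with
  | Some (Live _) -> Hashtbl.replace heap a Freed
  | Some Freed -> raise (Double_free a)
  | None -> invalid_arg "free: address was never allocated"

(* Read through a pointer; reading a freed cell is a use after free. *)
let read (a : int) : int =
  match Hashtbl.find_opt heap a with
  | Some (Live v) -> v
  | Some Freed -> raise (Use_after_free a)
  | None -> invalid_arg "read: address was never allocated"

(* Number of cells that were allocated and never freed. *)
let live_cells () : int =
  Hashtbl.fold (fun _ c n -> match c with Live _ -> n + 1 | Freed -> n) heap 0

let p = malloc 17
let q = malloc 42     (* if q is never freed, this cell leaks *)
let () = free p
(* From here on, read p raises Use_after_free and free p raises Double_free. *)
```

A real allocator does not check for these errors, which is why they show up as crashes or security bugs rather than exceptions.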

Most modern PLs implement automatic deallocation…

Reference Counting

Each heap object h stores a count of how many values point to h. When h.count reaches 0, deallocate.

Used by: Python, Swift, Perl, PHP, MS COM…

Problems with this approach?
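A minimal sketch of reference counting, assuming a toy object type with an explicit count. The names `obj`, `make`, `retain`, `release`, and `add_field` are hypothetical and not taken from any of the systems listed above. The second half of the example shows one well-known problem: objects that point to each other in a cycle never reach a count of 0.

```ocaml
(* Toy reference-counted objects; all names are hypothetical. *)
type obj = {
  mutable count : int;         (* number of incoming references *)
  mutable fields : obj list;   (* outgoing references *)
  mutable freed : bool;        (* stands in for actual deallocation *)
}

let make () = { count = 0; fields = []; freed = false }

(* Add one reference to o, e.g. from a variable on the stack. *)
let retain (o : obj) : unit = o.count <- o.count + 1

(* Drop one reference; at zero, free o and release everything it points to. *)
let rec release (o : obj) : unit =
  o.count <- o.count - 1;
  if o.count = 0 && not o.freed then begin
    o.freed <- true;
    List.iter release o.fields;
    o.fields <- []
  end

(* Store a reference to target inside src. *)
let add_field (src : obj) (target : obj) : unit =
  retain target;
  src.fields <- target :: src.fields

(* Acyclic case: a -> b. Dropping the only reference to a frees both. *)
let a = make ()
let b = make ()
let () = retain a; add_field a b; release a
(* now a.freed and b.freed are both true *)

(* Cyclic case: c -> d -> c. Dropping the outside reference to c
   leaves both counts at 1, so neither object is ever freed. *)
let c = make ()
let d = make ()
let () = retain c; add_field c d; add_field d c; release c
(* now c.freed and d.freed are both false: the cycle leaks *)
```

Every pointer update also has to adjust a count, which adds a cost to ordinary assignments.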

Mark / Sweep

Periodically traverse the heap pointer graph:

Mark any reachable objects
Sweep up the garbage

Problems:

Is a word a pointer?
How do I mark the reachable objects?
The program pauses during collection: “stop the world and collect garbage”

Mark / Sweep

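type value = V of int | B of block
and block = { len : int ;
  mutable marked : bool ;
  (* not quite: a real block holds an array of machine words *)
  words : value list }

(* Mark phase: set the mark bit on every block reachable from roots. *)
let rec traverse_heap (roots : value list) : unit =
  match roots with
  | [] -> ()
  | (V _)::vs -> traverse_heap vs
  | (B bl)::vs ->
    if bl.marked then traverse_heap vs
    else begin
      bl.marked <- true ;
      (* the words of a newly marked block become roots as well *)
      traverse_heap (bl.words @ vs)
    end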
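The slide covers the mark phase only. A sweep phase could look like the sketch below, which repeats the `value` and `block` types so that it runs on its own. It assumes the collector has a list of every allocated block; `mark`, `sweep`, `collect`, and `all_blocks` are hypothetical names. A real collector would walk raw heap memory instead of an OCaml list.

```ocaml
(* The toy representation from the slide, repeated so this runs on its own. *)
type value = V of int | B of block
and block = { len : int ; mutable marked : bool ; words : value list }

(* Mark: flag every block reachable from the roots. *)
let rec mark (roots : value list) : unit =
  match roots with
  | [] -> ()
  | V _ :: vs -> mark vs
  | B bl :: vs ->
    if bl.marked then mark vs
    else begin bl.marked <- true; mark (bl.words @ vs) end

(* Sweep: keep the marked blocks, clearing their marks for the next
   cycle, and drop the rest. all_blocks is assumed to list every
   block that has been allocated. *)
let sweep (all_blocks : block list) : block list =
  let live = List.filter (fun bl -> bl.marked) all_blocks in
  List.iter (fun bl -> bl.marked <- false) live;
  live

(* One full collection: mark from the roots, then sweep. *)
let collect (roots : value list) (all_blocks : block list) : block list =
  mark roots;
  sweep all_blocks

let leaf = { len = 1; marked = false; words = [V 7] }
let parent = { len = 2; marked = false; words = [V 1; B leaf] }
let orphan = { len = 1; marked = false; words = [V 9] }
let survivors = collect [B parent] [leaf; parent; orphan]
(* survivors contains leaf and parent, but not orphan *)
```

Clearing the mark bits during the sweep means the next collection starts from a clean state.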

Copying

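Copy the reachable objects into a second heap area, packed together, and discard the old area.

Frees up bigger contiguous blocks but uses 2x memory.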
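A copying collector can be sketched on a toy heap where each space is an array of objects and a pointer is an index into that array. This follows the general shape of a Cheney-style scan but is not a definitive implementation; `word`, `obj`, `copy_collect`, and `relocate` are hypothetical names. The to-space is allocated at the same size as the from-space, which is where the 2x memory cost comes from.

```ocaml
(* Toy semispace copying collector; all names are hypothetical. *)
type word = Int of int | Ptr of int      (* Ptr is an index into a space *)
type obj = { fields : word array }

(* Copy everything reachable from roots out of from_space into a fresh
   to-space. Returns the compacted space and the updated roots. *)
let copy_collect (from_space : obj array) (roots : word list)
  : obj array * word list =
  let n = Array.length from_space in
  let forward = Array.make n (-1) in     (* old index -> new index *)
  let to_space = Array.make n { fields = [||] } in
  let free = ref 0 in                    (* next free slot in to_space *)
  (* Copy one object unless already copied; return its new address. *)
  let relocate w =
    match w with
    | Int _ -> w
    | Ptr i ->
      if forward.(i) < 0 then begin
        forward.(i) <- !free;
        to_space.(!free) <- { fields = Array.copy from_space.(i).fields };
        incr free
      end;
      Ptr forward.(i)
  in
  let new_roots = List.map relocate roots in
  (* Scan copied objects in order, relocating the pointers they hold. *)
  let scan = ref 0 in
  while !scan < !free do
    let o = to_space.(!scan) in
    Array.iteri (fun k w -> o.fields.(k) <- relocate w) o.fields;
    incr scan
  done;
  (Array.sub to_space 0 !free, new_roots)

(* Object 0 points to object 2; object 1 is garbage. *)
let from_space = [|
  { fields = [| Int 1; Ptr 2 |] };
  { fields = [| Int 99 |] };
  { fields = [| Int 3 |] }
|]
let (to_space, new_roots) = copy_collect from_space [Ptr 0]
(* to_space holds the two live objects at indices 0 and 1,
   and the pointer inside the first one now reads Ptr 1 *)
```

Because survivors are packed at the start of the to-space, all remaining free memory is one contiguous block.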

Generations

Mark/sweep the “young heap” and increment the generation count of marked objects.
If the young heap is full, copy “old” objects to the old heap.
Repeat with older heaps.
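The steps above can be sketched with a toy model in which each object carries an age. Several simplifications are assumptions of this sketch rather than part of the slide: promotion happens on every minor collection instead of only when the young heap is full, the threshold `promote_age` is arbitrary, and reachability is traced through all objects instead of tracking old-to-young pointers separately. All names are hypothetical.

```ocaml
(* Toy generational scheme; names and the threshold are hypothetical. *)
type obj = { id : int; mutable age : int; mutable refs : obj list }

let promote_age = 2   (* survive this many minor collections -> old heap *)

(* Collect the ids of all objects reachable from roots. *)
let reachable (roots : obj list) : int list =
  let rec go seen = function
    | [] -> seen
    | o :: rest ->
      if List.mem o.id seen then go seen rest
      else go (o.id :: seen) (o.refs @ rest)
  in
  go [] roots

(* One minor collection: reachable young objects age by one, those
   reaching promote_age move to the old heap, the rest are dropped. *)
let minor_collect ~roots ~young ~old =
  let live = reachable roots in
  let survivors = List.filter (fun o -> List.mem o.id live) young in
  List.iter (fun o -> o.age <- o.age + 1) survivors;
  let promoted, still_young =
    List.partition (fun o -> o.age >= promote_age) survivors in
  (still_young, old @ promoted)

let a = { id = 1; age = 0; refs = [] }
let b = { id = 2; age = 0; refs = [] }
let (young1, old1) = minor_collect ~roots:[a] ~young:[a; b] ~old:[]
(* b was unreachable and is gone; a stays young with age 1 *)
let (young2, old2) = minor_collect ~roots:[a] ~young:young1 ~old:old1
(* a has now survived twice and is promoted to the old heap *)
```

The design bet is that most objects die young, so most collections only need to look at the small young heap.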

cs2041.org
