UP | HOME

Functions
Compilers
COP-3402

Code generation overview

  • Recall: compiler takes source language and produces target language
    • Translate, not execute
  • Generates equivalent program in assembly
    • Each language construct has corresponding assembly code patterns

Assembly file layout

  • data
    • Fixed size, global data section (bss section is zeroed out)
  • rodata
    • Immutable data, e.g., for string constants
  • text
    • Executable part

https://wiki.osdev.org/ELF

  • Assembly file converted to an exectuable
  • Loader copies executable segments into memory

Memory layout

What are functions in programming languages?

Function abstraction: encapsulate a computation

  • Defined by name (usually) and its input/output
  • User-defined extension of the language
  • Lots of uses: abstraction, reuse, organization, interfaces, and more
  • Higher-order functions can take functions as inputs
  • Lambda functions allow runtime creation of anonymous functions

Function behavior

(Diagram)

  • Caller transfers control to callee function
  • Caller provides input values
  • Callee provides output value(s)
  • Execution resumes in caller once callee is finished
  • One box for main
  • One box for f
  • Start running main (exec calls this)
  • Stop running main and start running f
  • Finish f, then resume main where we left off
  • Parameters and return values

Function calls in assembly

A call to function f:

retval := call f

A simple definition of a function f:

function f
return 5
end function

x86-64 assembly

The assembly call is a branch to label f. The function definition:

f:
  mov $5, %rax
  ret

This is the call to function f:

call f
mov %rax, -16(%rbp)

The return value is saved to register rax by convention, which is saved to a variable via a mov.

-16(%rbp) represents the retval variable. We'll see how this works below.

How call and ret work

  • call f
    • First saves the return address
    • Then branches to f
  • ret
    • Retrieves the return address
    • Then branches back to the return address

Example

   f:
1:    mov $5, %rax
2:    ret
   _start:
3:    call f
4:    mov %rax, -16(%rbp)

_start is the entry point to the program (think main).

Diagram

  • Instruction pointer (address of the next instruction to execute)
  • Start at instruction pointer 4 (~_start_ label)
  • Execute call
    • Save return address (instruction pointer + 1 = 4)
    • Branch to f (instruction pointer = 1)
  • Execute mov
  • Execute ret
    • Retrieve return address (instruciton pointer = 4)
  • Execute mov

What if we have multiple nested function calls?

Use a stack

We need to save multiple return addresses while we wait for returns

Multiple calls

   h:
1:    mov $7, %rax
2:    ret
   g:
3:    call h
4:    ret
   _start:
5:    call g
6:    mov %rax, -16(%rbp)

Diagram

  • Start at instruction pointer 5 (~_start_ label)
  • Execute call
    • Push return address on to stack (instruction pointer + 1 = 6)
    • Branch to g (instruction pointer = 3)
  • Execute call
    • Push return address on to stack (instruction pointer + 1 = 4)
    • Branch to h (instruction pointer = 1)
  • Execute mov
  • Execute ret
    • Retrieve return address (instruciton pointer = 4)
  • Execute ret
    • Retrieve return address (instruciton pointer = 6)
  • Execute mov

Function-local variables

int saved_n;  // one variable for all calls to fac

int fac(int n) {
  int fac_n;

  if (n <= 1) {
    return 1;
  } else {
    saved_n = n;
    n = n - 1;
    fac_n = fac(n);
    return fac_n * saved_n;
  }
}

int main() {
  fac(3);
}

Diagram

  • fac(3)
    • saved_n = 3
    • fac(2)
      • saved_n = 2
      • fac(1)
        • return 1;
      • fac_n = 1
      • fac_n * saved_n = 1 * 2
      • return 2
    • fac_n = 2
    • fac_n * saved_n = 2 * 2 (want this to be 2 * 3, but saved_n was overwritten)

We need saved_n to be function-local

How do we store function-local variables?

Use a stack

We need a fresh memory location for each call to a function, i.e., each call to fac.

If instead we have one memory location for each definition, recursion will not work as expected.

Factorial with function-local variables

int fac(int n) {
  int saved_n;  /* one variable for each call to fac */
  int fac_n;

  if (n <= 1) {
    return 1;
  } else {
    saved_n = n;
    n = n - 1;
    fac_n = fac(n);
    return fac_n * saved_n;
  }
}

int main() {
  fac(3);
}

Diagram

  • fac(3) (push locals to the stack)
    • saved_n = 3
    • fac(2) (push locals to the stack)
      • saved_n = 2
      • fac(1) (push locals to the stack)
        • return 1;
      • fac_n = 1
      • fac_n * saved_n = 1 * 2
      • return 2
    • fac_n = 2
    • fac_n * saved_n = 2 * (now this is 2 * 3, because saved_n for fac(2) was saved)

Local variable allocation

function example_variables
localVariables x y
x := 1
y := x
return y
end function

x86-64 stack operations

  • Register rsp always points to top element of the stack
  • push pushes value to stack
  • pop pops value from stack

The stack grows downwards

  • Bottom of stack is a higher memory address
  • Top of stack is a lower address
  • Push
    • Subtracts from stack pointer
    • Copy data into stack pointer address
  • Push
    • Copy data from stack pointer address
    • Add to stack pointer

Implementation of push and pop

  • push %rax is equivalent to
    • sub $8, %rsp
    • mov %rax, (%rsp)
  • pop %rax
    • mov (%rsp), %rax
    • add $8, %rsp

Technically there are multiple push operations depending on the type and size of the operand

Allocating space for local variables

SimpleIR

function example_variables
localVariables x y
// ...
return y
end function

Assembly:

example_variables:
  # allocate stack space for locals
  sub $16, %rsp
  # ...
  # remove local variable stack space
  add $16, %rsp
  ret

Diagram

  • Show stack space in memory, label stack entries with addresses
  • Start with the stack and stack pointer (show its value, not just as an arrow)
  • Step through each instruction
  • Show how the stack pointer and memory is updated

Using stack space for local variables

  • Assign each variable to a stack entry
  • Effectively an array of local variables

Diagram

  • Draw stack for example_variables
  • Name each stack entry according to the variable name

Allocating local variables

  • Save the address of the local variables
  • The base pointer rbp is a special register to hold the local variable address
  • mov $rsp, $rbp

Why not use the stack pointer, %rsp, instead?

It may be used within the function to store data, making it hard to compute the offset

Setting the base pointer

example_variables:

  # set the base pointer first
  mov %rsp, %rbp

  # allocate stack space for locals
  sub $16, %rsp
  # ...
  # remove local variable stack space
  add $16, %rsp
  ret

set the base pointer before allocating stack space so that we always have the same offset from rbp for each variable.

The symbol table

Record the offset from the rbp (base) pointer

localVariables x y
Variable Offset
x -8
y -16

Why negative offsets? Because the stack grows downwards addresses, so we store the beginning of the locals and push space for all of them.

Why start from -8? Don't want to overwrite what's already at %rbp (which is actually the return address since we haven't saved the old rbp yet).

Diagram

  • Use rbp to point to the start of the locals
  • Show the offsets from rbp

Accessing local variables

Use memory load and store to access stack-allocated variables

x := 1

x86-64 mov operations

Move immediate

Store integer 1 in a register, doesn't access memory

mov $1, %rax

Move to a register

Overwrites %rbp and loses location of locals. Doesn't access memory.

mov %rax, %rbp

Move to an address given by a register

Overwrites value at address held in %rbp.

mov %rax, (%rbp)

What part of the stack frame is at address %rbp?

Move to an offset from an address given by a register

Overwrites value at address held in %rbp + -16.

mov %rax, -16(%rbp)

What part of the stack is at %rbp + -16 for myfunc?

Base pointer offsets

localVariables x y
Variable Offset
x -8
y -16

Complete assignment from a constant

SimpleIR

x := 1

becomes assembly

mov $1, %rax
mov %rax, -16(%rbp)

Using a variable

SimpleIR

y := x

What are the offsets from %rbp of x and y?

How do we copy the value of x's memory location?

Load the value of x's memory location to a temporary register

mov -8(%rbp), %rax

Load and store are separate operations; we can't do both with a single mov.

How do we store something to retval's memory location?

Store rax into y's memory location

mov %rax, -16(%rbp)

Complete assignment from a variable

SimpleIR

y := x

becomes assembly

mov -16(%rbp), %rax
mov %rax, -24(%rbp)

Complete local variable example

Caller

retval := call example_variables

Callee

function example_variables
localVariables x y
x := 1
y := x
return y
end function

Recall

example_variables

  • pushes/subtracts from rbp to make space for locals
  • saves the rbp pointer first to track location of locals in memory
  • accesses variables via offsets from rbp

Symbol table for example_variables

Variable Offset Assembly operand
x -8 -8(%rbp)
y -16 -16(%rbp)

Assembly

example_variables:
  # set the base pointer
  mov %rsp, %rbp

  # allocate stack space for locals
  sub $16, %rsp

  # x := 1
  mov $1, %rax
  mov %rax, -8(%rbp)

  # y := x
  mov -8(%rbp), %rax
  mov %rax, -16(%rbp)

  # return value
  mov -16(%rbp), %rax

  # remove local variable stack space
  add $16, %rsp
  ret

Diagram

  • Stack with addresses
  • rsp and rbp with address values and arrows
  • rax

Saving the caller's base pointer

Where does rbp point to after a function returns?

It still points to the callee's stack frame

Save and update rbp

Save the caller's rbp before setting it

push %rbp       # save old base ponter
mov %rsp, %rbp  # set new base pointer

Restore rbp

Pop the caller's rbp before returning

pop %rbp         # restore the caller's base pointer

Complete stack frame setup

example_variables:
  push %rbp        # save the caller's base pointer

  mov %rsp, %rbp   # set the base pointer

  sub $16, %rsp    # allocate stack space for locals

  # ...

  add $16, %rsp    # remove local variable stack space

  pop %rbp         # restore the caller's base pointer

  ret

Function prologue

push %rbp        # save the caller's base pointer
mov %rsp, %rbp   # set the base pointer
sub $16, %rsp    # allocate stack space for locals

Function epilogue

add $16, %rsp    # remove local variable stack space
pop %rbp         # restore the caller's base pointer
ret

Complete example

Caller

function main
localVariables retval
retval := call example_variables
return retval
end function

Callee

function example_variables
localVariables x y
x := 1
y := x
return y
end function

Assembly

example_variables:
  # prologue
  push %rbp        # save the caller's base pointer
  mov %rsp, %rbp   # set the base pointer
  sub $16, %rsp    # allocate stack space for locals

  # x := 1
  mov $1, %rax
  mov %rax, -8(%rbp)

  # y := x
  mov -8(%rbp), %rax
  mov %rax, -16(%rbp)

  # return y
  mov -16(%rbp), %rax

  # epilogue
  add $16, %rsp    # remove local variable stack space
  pop %rbp         # restore the caller's base pointer
  ret

Diagram

  • Stack with address labels
  • rbp, rbp with addresses and arrows
  • Start with stack frame setup for main
    • Caller is C runtime, rbp and return value can be any addresses
    • One entry for retval
  • Step through the call to example variables

Calling convention

Layout

  • Caller-managed
    • Stack-allocated parameters
    • Return address
  • Callee-managed
    • Old base pointer
    • Local variables

Caller has access to the parameters and its own return address.

The callee also has to save certain registers if it uses them, e.g., rbx

Passing parameters

Since each function has its own local state, how do we communicate values from one function to another?

Registers and stack parameters

  • Use registers for some
  • Use the stack for the rest
    • Callee can now access them without caller's stack frame

Complete stack frame

Stack contents Managed by
Parameter N Caller
Parameter N-1 Caller
Caller
Parameter 8 Caller
Parameter 7 Caller
Return address Caller
Old base pointer Callee
Local variable 1 Callee
Local variable 2 Callee
 
Local variable N Callee

Where are parmeters 1-6? These are the parameters passed via registers in the x86-64 System V ABI

x64_frame_nonleaf.png

eli.thegreenplace.net/2011/09/06/stack-frame-layout-on-x86-64/

  • 8 parameters
  • 6 on registers
  • 2 on stack
  • caller sets up registers and stack
  • caller sets return address (call)
  • caller branches to the callee function
  • callee saves the old base pointer
  • callee allocates space for its local variables
  • stack teardown happens in reverse order

Parameter example

function main
localVariables retval x
x := 3
retval := call func x
return retval
end function
function func
localVariables a
parameters a
return a
end function

Assembly

main:
  # prologue, update stack pointer
  pushq %rbp # save old base ponter
  movq %rsp, %rbp # set new base pointer
  push %rbx # %rbx is callee-saved
  # allocate stack space for locals
  sub $24, %rsp
  # assign 3 to x
  mov $3, %rax
  mov %rax, -24(%rbp)
  # function call
  mov -24(%rbp), %rdi
  call func
  add $0, %rsp
  mov %rax, -16(%rbp)
  # set return value
  mov -16(%rbp), %rax
  # epilogue
  add $24, %rsp
  pop %rbx # restore old base pointer
  pop %rbp # restore rbp
  ret

func:
  # prologue, update stack pointer
  pushq %rbp # save old base ponter
  movq %rsp, %rbp # set new base pointer
  push %rbx # %rbx is callee-saved
  # allocate stack space for locals
  sub $8, %rsp
  # move register parameter a to local variable
  mov %rdi, -16(%rbp)
  # set return value
  mov -16(%rbp), %rax
  # epilogue
  add $8, %rsp
  pop %rbx # restore old base pointer
  pop %rbp # restore old base pointer
  ret

Author: Paul Gazzillo

Created: 2025-04-01 Tue 15:43

Validate