Functions
Compilers
COP-3402

Code generation overview

Recall: compiler takes source language and produces target language
- Translate, not execute
Generates equivalent program in assembly
- Each language construct has corresponding assembly code patterns

Assembly file layout

data
- Fixed size, global data section (bss section is zeroed out)
rodata
- Immutable data, e.g., for string constants
text
- Executable part

https://wiki.osdev.org/ELF

Assembly file converted to an exectuable
Loader copies executable segments into memory

Memory layout

wiki.osdev.org/File:Elfdiagram.png

Note that the heap and stack and created at runtime by the OS

https://wiki.osdev.org/ELF#Loading_ELF_Binaries https://stackoverflow.com/questions/9226088/how-are-the-different-segments-like-heap-stack-text-related-to-the-physical-me

What are functions in programming languages?

Function abstraction: encapsulate a computation

Defined by name (usually) and its input/output

User-defined extension of the language
Lots of uses: abstraction, reuse, organization, interfaces, and more
Higher-order functions can take functions as inputs
Lambda functions allow runtime creation of anonymous functions

Function behavior

(Diagram)

Caller transfers control to callee function
Caller provides input values
Callee provides output value(s)
Execution resumes in caller once callee is finished

One box for main
One box for f
Start running main (exec calls this)
Stop running main and start running f
Finish f, then resume main where we left off
Parameters and return values

Function calls in assembly

A call to function f:

retval := call f

A simple definition of a function f:

function f
return 5
end function

x86-64 assembly

The assembly call is a branch to label f. The function definition:

f:
  mov $5, %rax
  ret

This is the call to function f:

call f
mov %rax, -16(%rbp)

The return value is saved to register rax by convention, which is saved to a variable via a mov.

-16(%rbp) represents the retval variable. We'll see how this works below.

How `call` and `ret` work

call f
- First saves the return address
- Then branches to f
ret
- Retrieves the return address
- Then branches back to the return address

Example

   f:
1:    mov $5, %rax
2:    ret
   _start:
3:    call f
4:    mov %rax, -16(%rbp)

_start is the entry point to the program (think main).

Diagram

Instruction pointer (address of the next instruction to execute)
Start at instruction pointer 4 (~_start_ label)
Execute call
- Save return address (instruction pointer + 1 = 4)
- Branch to f (instruction pointer = 1)
Execute mov
Execute ret
- Retrieve return address (instruciton pointer = 4)
Execute mov

What if we have multiple nested function calls?

Use a stack

We need to save multiple return addresses while we wait for returns

Multiple calls

   h:
1:    mov $7, %rax
2:    ret
   g:
3:    call h
4:    ret
   _start:
5:    call g
6:    mov %rax, -16(%rbp)

Diagram

Start at instruction pointer 5 (~_start_ label)
Execute call
- Push return address on to stack (instruction pointer + 1 = 6)
- Branch to g (instruction pointer = 3)
Execute call
- Push return address on to stack (instruction pointer + 1 = 4)
- Branch to h (instruction pointer = 1)
Execute mov
Execute ret
- Retrieve return address (instruciton pointer = 4)
Execute ret
- Retrieve return address (instruciton pointer = 6)
Execute mov

Function-local variables

int saved_n;  // one variable for all calls to fac

int fac(int n) {
  int fac_n;

  if (n <= 1) {
    return 1;
  } else {
    saved_n = n;
    n = n - 1;
    fac_n = fac(n);
    return fac_n * saved_n;
  }
}

int main() {
  fac(3);
}

Diagram

fac(3)
- saved_n = 3
- fac(2)
  - saved_n = 2
  - fac(1)
    - return 1;
  - fac_n = 1
  - fac_n * saved_n = 1 * 2
  - return 2
- fac_n = 2
- fac_n * saved_n = 2 * 2 (want this to be 2 * 3, but saved_n was overwritten)

We need saved_n to be function-local

How do we store function-local variables?

Use a stack

We need a fresh memory location for each call to a function, i.e., each call to fac.

If instead we have one memory location for each definition, recursion will not work as expected.

Factorial with function-local variables

int fac(int n) {
  int saved_n;  /* one variable for each call to fac */
  int fac_n;

  if (n <= 1) {
    return 1;
  } else {
    saved_n = n;
    n = n - 1;
    fac_n = fac(n);
    return fac_n * saved_n;
  }
}

int main() {
  fac(3);
}

Diagram

fac(3) (push locals to the stack)
- saved_n = 3
- fac(2) (push locals to the stack)
  - saved_n = 2
  - fac(1) (push locals to the stack)
    - return 1;
  - fac_n = 1
  - fac_n * saved_n = 1 * 2
  - return 2
- fac_n = 2
- fac_n * saved_n = 2 * (now this is 2 * 3, because saved_n for fac(2) was saved)

Local variable allocation

function example_variables
localVariables x y
x := 1
y := x
return y
end function

x86-64 stack operations

Register rsp always points to top element of the stack
push pushes value to stack
pop pops value from stack

The stack grows downwards

Bottom of stack is a higher memory address
Top of stack is a lower address
Push
- Subtracts from stack pointer
- Copy data into stack pointer address
Push
- Copy data from stack pointer address
- Add to stack pointer

Implementation of push and pop

push %rax is equivalent to
- sub $8, %rsp
- mov %rax, (%rsp)
pop %rax
- mov (%rsp), %rax
- add $8, %rsp

Technically there are multiple push operations depending on the type and size of the operand

Allocating space for local variables

SimpleIR

function example_variables
localVariables x y
// ...
return y
end function

Assembly:

example_variables:
  # allocate stack space for locals
  sub $16, %rsp
  # ...
  # remove local variable stack space
  add $16, %rsp
  ret

Diagram

Show stack space in memory, label stack entries with addresses
Start with the stack and stack pointer (show its value, not just as an arrow)
Step through each instruction
Show how the stack pointer and memory is updated

Using stack space for local variables

Assign each variable to a stack entry
Effectively an array of local variables

Diagram

Draw stack for example_variables
Name each stack entry according to the variable name

Allocating local variables

Save the address of the local variables
The base pointer rbp is a special register to hold the local variable address
mov $rsp, $rbp

Why not use the stack pointer, %rsp, instead?

It may be used within the function to store data, making it hard to compute the offset

Setting the base pointer

example_variables:

  # set the base pointer first
  mov %rsp, %rbp

  # allocate stack space for locals
  sub $16, %rsp
  # ...
  # remove local variable stack space
  add $16, %rsp
  ret

set the base pointer before allocating stack space so that we always have the same offset from rbp for each variable.

The symbol table

Record the offset from the rbp (base) pointer

localVariables x y

Variable	Offset
x	-8
y	-16

Why negative offsets? Because the stack grows downwards addresses, so we store the beginning of the locals and push space for all of them.

Why start from -8? Don't want to overwrite what's already at %rbp (which is actually the return address since we haven't saved the old rbp yet).

Diagram

Use rbp to point to the start of the locals
Show the offsets from rbp

Accessing local variables

Use memory load and store to access stack-allocated variables

x := 1

x86-64 `mov` operations

Move immediate

Store integer 1 in a register, doesn't access memory

mov $1, %rax

Move to a register

Overwrites %rbp and loses location of locals. Doesn't access memory.

mov %rax, %rbp

Move to an address given by a register

Overwrites value at address held in %rbp.

mov %rax, (%rbp)

What part of the stack frame is at address %rbp?

Move to an offset from an address given by a register

Overwrites value at address held in %rbp + -16.

mov %rax, -16(%rbp)

What part of the stack is at %rbp + -16 for myfunc?

Base pointer offsets

localVariables x y

Variable	Offset
x	-8
y	-16

Complete assignment from a constant

SimpleIR

x := 1

becomes assembly

mov $1, %rax
mov %rax, -16(%rbp)

Using a variable

SimpleIR

y := x

What are the offsets from %rbp of x and y?

How do we copy the value of `x`'s memory location?

Load the value of x's memory location to a temporary register

mov -8(%rbp), %rax

Load and store are separate operations; we can't do both with a single mov.

How do we store something to `retval`'s memory location?

Store rax into y's memory location

mov %rax, -16(%rbp)

Complete assignment from a variable

SimpleIR

y := x

becomes assembly

mov -16(%rbp), %rax
mov %rax, -24(%rbp)

Complete local variable example

Caller

retval := call example_variables

Callee

function example_variables
localVariables x y
x := 1
y := x
return y
end function

Recall

example_variables

pushes/subtracts from rbp to make space for locals
saves the rbp pointer first to track location of locals in memory
accesses variables via offsets from rbp

Symbol table for `example_variables`

Variable	Offset	Assembly operand
x	-8	-8(%rbp)
y	-16	-16(%rbp)

Assembly

example_variables:
  # set the base pointer
  mov %rsp, %rbp

  # allocate stack space for locals
  sub $16, %rsp

  # x := 1
  mov $1, %rax
  mov %rax, -8(%rbp)

  # y := x
  mov -8(%rbp), %rax
  mov %rax, -16(%rbp)

  # return value
  mov -16(%rbp), %rax

  # remove local variable stack space
  add $16, %rsp
  ret

Diagram

Stack with addresses
rsp and rbp with address values and arrows
rax

Saving the caller's base pointer

Where does rbp point to after a function returns?

It still points to the callee's stack frame

Save and update `rbp`

Save the caller's rbp before setting it

push %rbp       # save old base ponter
mov %rsp, %rbp  # set new base pointer

Restore `rbp`

Pop the caller's rbp before returning

pop %rbp         # restore the caller's base pointer

Complete stack frame setup

example_variables:
  push %rbp        # save the caller's base pointer

  mov %rsp, %rbp   # set the base pointer

  sub $16, %rsp    # allocate stack space for locals

  # ...

  add $16, %rsp    # remove local variable stack space

  pop %rbp         # restore the caller's base pointer

  ret

Function prologue

push %rbp        # save the caller's base pointer
mov %rsp, %rbp   # set the base pointer
sub $16, %rsp    # allocate stack space for locals

Function epilogue

add $16, %rsp    # remove local variable stack space
pop %rbp         # restore the caller's base pointer
ret

Complete example

Caller

function main
localVariables retval
retval := call example_variables
return retval
end function

Callee

function example_variables
localVariables x y
x := 1
y := x
return y
end function

Assembly

example_variables:
  # prologue
  push %rbp        # save the caller's base pointer
  mov %rsp, %rbp   # set the base pointer
  sub $16, %rsp    # allocate stack space for locals

  # x := 1
  mov $1, %rax
  mov %rax, -8(%rbp)

  # y := x
  mov -8(%rbp), %rax
  mov %rax, -16(%rbp)

  # return y
  mov -16(%rbp), %rax

  # epilogue
  add $16, %rsp    # remove local variable stack space
  pop %rbp         # restore the caller's base pointer
  ret

Diagram

Stack with address labels
rbp, rbp with addresses and arrows
Start with stack frame setup for main
- Caller is C runtime, rbp and return value can be any addresses
- One entry for retval
Step through the call to example variables

Calling convention

Stack frame layout
System V x86-64 Application Binary Interface (ABI) defines it

More resources on the ABI and calling conventions

https://sourceware.org/git/?p=glibc.git;a=blob;f=stdio-common/vfprintf-internal.c;h=547a3a868b4668bf615cf3f39a92e3c11cbb98ad;hb=HEAD#l1288

https://wiki.osdev.org/Calling_Conventions

https://eli.thegreenplace.net/2011/09/06/stack-frame-layout-on-x86-64/

https://en.wikipedia.org/wiki/X86_calling_conventions#Register_preservation

https://stackoverflow.com/questions/1658294/whats-the-purpose-of-the-lea-instruction

https://www.fireeye.com/blog/threat-research/2008/03/instruction-poi.html

Layout

Caller-managed
- Stack-allocated parameters
- Return address
Callee-managed
- Old base pointer
- Local variables

Caller has access to the parameters and its own return address.

The callee also has to save certain registers if it uses them, e.g., rbx

Passing parameters

Since each function has its own local state, how do we communicate values from one function to another?

Registers and stack parameters

Use registers for some
Use the stack for the rest
- Callee can now access them without caller's stack frame

Complete stack frame

Stack contents	Managed by
Parameter N	Caller
Parameter N-1	Caller
…	Caller
Parameter 8	Caller
Parameter 7	Caller
Return address	Caller
Old base pointer	Callee
Local variable 1	Callee
Local variable 2	Callee
…
Local variable N	Callee

Where are parmeters 1-6? These are the parameters passed via registers in the x86-64 System V ABI

eli.thegreenplace.net/2011/09/06/stack-frame-layout-on-x86-64/

8 parameters
6 on registers
2 on stack
caller sets up registers and stack
caller sets return address (call)
caller branches to the callee function
callee saves the old base pointer
callee allocates space for its local variables
stack teardown happens in reverse order

Parameter example

function main
localVariables retval x
x := 3
retval := call func x
return retval
end function

function func
localVariables a
parameters a
return a
end function

Assembly

main:
  # prologue, update stack pointer
  pushq %rbp # save old base ponter
  movq %rsp, %rbp # set new base pointer
  push %rbx # %rbx is callee-saved
  # allocate stack space for locals
  sub $24, %rsp
  # assign 3 to x
  mov $3, %rax
  mov %rax, -24(%rbp)
  # function call
  mov -24(%rbp), %rdi
  call func
  add $0, %rsp
  mov %rax, -16(%rbp)
  # set return value
  mov -16(%rbp), %rax
  # epilogue
  add $24, %rsp
  pop %rbx # restore old base pointer
  pop %rbp # restore rbp
  ret

func:
  # prologue, update stack pointer
  pushq %rbp # save old base ponter
  movq %rsp, %rbp # set new base pointer
  push %rbx # %rbx is callee-saved
  # allocate stack space for locals
  sub $8, %rsp
  # move register parameter a to local variable
  mov %rdi, -16(%rbp)
  # set return value
  mov -16(%rbp), %rax
  # epilogue
  add $8, %rsp
  pop %rbx # restore old base pointer
  pop %rbp # restore old base pointer
  ret

Functions Compilers COP-3402

Code generation overview

Assembly file layout

Memory layout

What are functions in programming languages?

Function abstraction: encapsulate a computation

Function behavior

Function calls in assembly

x86-64 assembly

How call and ret work

Example

Multiple calls

Function-local variables

Factorial with function-local variables

Local variable allocation

x86-64 stack operations

The stack grows downwards

Allocating space for local variables

Using stack space for local variables

Allocating local variables

Setting the base pointer

The symbol table

Accessing local variables

x86-64 mov operations

Move immediate

Move to a register

Move to an address given by a register

Move to an offset from an address given by a register

Base pointer offsets

Complete assignment from a constant

Using a variable

How do we copy the value of x's memory location?

How do we store something to retval's memory location?

Complete assignment from a variable

Complete local variable example

Recall

Symbol table for example_variables

Assembly

Saving the caller's base pointer

Save and update rbp

Restore rbp

Complete stack frame setup

Function prologue

Function epilogue

Complete example

Assembly

Calling convention

Layout

Passing parameters

Registers and stack parameters

Complete stack frame

Parameter example

Assembly

Functions
Compilers
COP-3402

How `call` and `ret` work

x86-64 `mov` operations

How do we copy the value of `x`'s memory location?

How do we store something to `retval`'s memory location?

Symbol table for `example_variables`

Save and update `rbp`

Restore `rbp`