Functions
Compilers
COP-3402
Code generation overview
- Recall: compiler takes source language and produces target language
- Translate, not execute
- Generates equivalent program in assembly
- Each language construct has corresponding assembly code patterns
Assembly file layout
- data
- Fixed size, global data section (bss section is zeroed out)
- rodata
- Immutable data, e.g., for string constants
- text
- Executable part
- Assembly file converted to an exectuable
- Loader copies executable segments into memory
Memory layout
wiki.osdev.org/File:Elfdiagram.png
Note that the heap and stack and created at runtime by the OS
https://wiki.osdev.org/ELF#Loading_ELF_Binaries https://stackoverflow.com/questions/9226088/how-are-the-different-segments-like-heap-stack-text-related-to-the-physical-me
What are functions in programming languages?
Function abstraction: encapsulate a computation
- Defined by name (usually) and its input/output
- User-defined extension of the language
- Lots of uses: abstraction, reuse, organization, interfaces, and more
- Higher-order functions can take functions as inputs
- Lambda functions allow runtime creation of anonymous functions
Function behavior
(Diagram)
- Caller transfers control to callee function
- Caller provides input values
- Callee provides output value(s)
- Execution resumes in caller once callee is finished
- One box for main
- One box for f
- Start running main (exec calls this)
- Stop running main and start running f
- Finish f, then resume main where we left off
- Parameters and return values
Function calls in assembly
A call to function f
:
retval := call f
A simple definition of a function f
:
function f return 5 end function
x86-64 assembly
The assembly call
is a branch to label f
. The function definition:
f: mov $5, %rax ret
This is the call to function f
:
call f mov %rax, -16(%rbp)
The return value is saved to register rax
by convention, which is saved to a variable via a mov
.
-16(%rbp)
represents the retval
variable. We'll see how this works below.
How call
and ret
work
call f
- First saves the return address
- Then branches to
f
ret
- Retrieves the return address
- Then branches back to the return address
Example
f: 1: mov $5, %rax 2: ret _start: 3: call f 4: mov %rax, -16(%rbp)
_start
is the entry point to the program (think main
).
Diagram
- Instruction pointer (address of the next instruction to execute)
- Start at instruction pointer 4 (~_start_ label)
- Execute call
- Save return address (instruction pointer + 1 = 4)
- Branch to f (instruction pointer = 1)
- Execute mov
- Execute ret
- Retrieve return address (instruciton pointer = 4)
- Execute mov
What if we have multiple nested function calls?
Use a stack
We need to save multiple return addresses while we wait for returns
Multiple calls
h: 1: mov $7, %rax 2: ret g: 3: call h 4: ret _start: 5: call g 6: mov %rax, -16(%rbp)
Diagram
- Start at instruction pointer 5 (~_start_ label)
- Execute call
- Push return address on to stack (instruction pointer + 1 = 6)
- Branch to g (instruction pointer = 3)
- Execute call
- Push return address on to stack (instruction pointer + 1 = 4)
- Branch to h (instruction pointer = 1)
- Execute mov
- Execute ret
- Retrieve return address (instruciton pointer = 4)
- Execute ret
- Retrieve return address (instruciton pointer = 6)
- Execute mov
Function-local variables
int saved_n; // one variable for all calls to fac int fac(int n) { int fac_n; if (n <= 1) { return 1; } else { saved_n = n; n = n - 1; fac_n = fac(n); return fac_n * saved_n; } } int main() { fac(3); }
Diagram
- fac(3)
- saved_n = 3
- fac(2)
- saved_n = 2
- fac(1)
- return 1;
- fac_n = 1
- fac_n * saved_n = 1 * 2
- return 2
- fac_n = 2
- fac_n * saved_n = 2 * 2 (want this to be 2 * 3, but saved_n was overwritten)
We need saved_n to be function-local
How do we store function-local variables?
Use a stack
We need a fresh memory location for each call to a function, i.e., each call to fac
.
If instead we have one memory location for each definition, recursion will not work as expected.
Factorial with function-local variables
int fac(int n) { int saved_n; /* one variable for each call to fac */ int fac_n; if (n <= 1) { return 1; } else { saved_n = n; n = n - 1; fac_n = fac(n); return fac_n * saved_n; } } int main() { fac(3); }
Diagram
- fac(3) (push locals to the stack)
- saved_n = 3
- fac(2) (push locals to the stack)
- saved_n = 2
- fac(1) (push locals to the stack)
- return 1;
- fac_n = 1
- fac_n * saved_n = 1 * 2
- return 2
- fac_n = 2
- fac_n * saved_n = 2 * (now this is 2 * 3, because saved_n for fac(2) was saved)
Local variable allocation
function example_variables localVariables x y x := 1 y := x return y end function
x86-64 stack operations
- Register
rsp
always points to top element of the stack push
pushes value to stackpop
pops value from stack
The stack grows downwards
- Bottom of stack is a higher memory address
- Top of stack is a lower address
- Push
- Subtracts from stack pointer
- Copy data into stack pointer address
- Push
- Copy data from stack pointer address
- Add to stack pointer
Implementation of push
and pop
push %rax
is equivalent tosub $8, %rsp
mov %rax, (%rsp)
pop %rax
mov (%rsp), %rax
add $8, %rsp
Technically there are multiple push operations depending on the type and size of the operand
Allocating space for local variables
SimpleIR
function example_variables localVariables x y // ... return y end function
Assembly:
example_variables: # allocate stack space for locals sub $16, %rsp # ... # remove local variable stack space add $16, %rsp ret
Diagram
- Show stack space in memory, label stack entries with addresses
- Start with the stack and stack pointer (show its value, not just as an arrow)
- Step through each instruction
- Show how the stack pointer and memory is updated
Using stack space for local variables
- Assign each variable to a stack entry
- Effectively an array of local variables
Diagram
- Draw stack for
example_variables
- Name each stack entry according to the variable name
Allocating local variables
- Save the address of the local variables
- The base pointer
rbp
is a special register to hold the local variable address mov $rsp, $rbp
Why not use the stack pointer, %rsp, instead?
It may be used within the function to store data, making it hard to compute the offset
Setting the base pointer
example_variables: # set the base pointer first mov %rsp, %rbp # allocate stack space for locals sub $16, %rsp # ... # remove local variable stack space add $16, %rsp ret
set the base pointer before allocating stack space so that we always have the same offset from rbp for each variable.
The symbol table
Record the offset from the rbp
(base) pointer
localVariables x y
Variable | Offset |
---|---|
x | -8 |
y | -16 |
Why negative offsets? Because the stack grows downwards addresses, so we store the beginning of the locals and push space for all of them.
Why start from -8? Don't want to overwrite what's already at %rbp (which is actually the return address since we haven't saved the old rbp yet).
Diagram
- Use rbp to point to the start of the locals
- Show the offsets from rbp
Accessing local variables
Use memory load and store to access stack-allocated variables
x := 1
x86-64 mov
operations
Move immediate
Store integer 1
in a register, doesn't access memory
mov $1, %rax
Move to a register
Overwrites %rbp and loses location of locals. Doesn't access memory.
mov %rax, %rbp
Move to an address given by a register
Overwrites value at address held in %rbp.
mov %rax, (%rbp)
What part of the stack frame is at address %rbp?
Move to an offset from an address given by a register
Overwrites value at address held in %rbp + -16.
mov %rax, -16(%rbp)
What part of the stack is at %rbp + -16 for myfunc
?
Base pointer offsets
localVariables x y
Variable | Offset |
---|---|
x | -8 |
y | -16 |
Complete assignment from a constant
SimpleIR
x := 1
becomes assembly
mov $1, %rax mov %rax, -16(%rbp)
Using a variable
SimpleIR
y := x
What are the offsets from %rbp of x
and y
?
How do we copy the value of x
's memory location?
Load the value of x's memory location to a temporary register
mov -8(%rbp), %rax
Load and store are separate operations; we can't do both with a single mov.
How do we store something to retval
's memory location?
Store rax
into y's memory location
mov %rax, -16(%rbp)
Complete assignment from a variable
SimpleIR
y := x
becomes assembly
mov -16(%rbp), %rax mov %rax, -24(%rbp)
Complete local variable example
Caller
retval := call example_variables
Callee
function example_variables localVariables x y x := 1 y := x return y end function
Recall
example_variables
- pushes/subtracts from
rbp
to make space for locals - saves the
rbp
pointer first to track location of locals in memory - accesses variables via offsets from
rbp
Symbol table for example_variables
Variable | Offset | Assembly operand |
---|---|---|
x | -8 | -8(%rbp) |
y | -16 | -16(%rbp) |
Assembly
example_variables: # set the base pointer mov %rsp, %rbp # allocate stack space for locals sub $16, %rsp # x := 1 mov $1, %rax mov %rax, -8(%rbp) # y := x mov -8(%rbp), %rax mov %rax, -16(%rbp) # return value mov -16(%rbp), %rax # remove local variable stack space add $16, %rsp ret
Diagram
- Stack with addresses
- rsp and rbp with address values and arrows
- rax
Saving the caller's base pointer
Where does rbp
point to after a function returns?
It still points to the callee's stack frame
Save and update rbp
Save the caller's rbp
before setting it
push %rbp # save old base ponter mov %rsp, %rbp # set new base pointer
Restore rbp
Pop the caller's rbp
before returning
pop %rbp # restore the caller's base pointer
Complete stack frame setup
example_variables: push %rbp # save the caller's base pointer mov %rsp, %rbp # set the base pointer sub $16, %rsp # allocate stack space for locals # ... add $16, %rsp # remove local variable stack space pop %rbp # restore the caller's base pointer ret
Function prologue
push %rbp # save the caller's base pointer mov %rsp, %rbp # set the base pointer sub $16, %rsp # allocate stack space for locals
Function epilogue
add $16, %rsp # remove local variable stack space pop %rbp # restore the caller's base pointer ret
Complete example
Caller
function main localVariables retval retval := call example_variables return retval end function
Callee
function example_variables localVariables x y x := 1 y := x return y end function
Assembly
example_variables: # prologue push %rbp # save the caller's base pointer mov %rsp, %rbp # set the base pointer sub $16, %rsp # allocate stack space for locals # x := 1 mov $1, %rax mov %rax, -8(%rbp) # y := x mov -8(%rbp), %rax mov %rax, -16(%rbp) # return y mov -16(%rbp), %rax # epilogue add $16, %rsp # remove local variable stack space pop %rbp # restore the caller's base pointer ret
Diagram
- Stack with address labels
- rbp, rbp with addresses and arrows
- Start with stack frame setup for
main
- Caller is C runtime, rbp and return value can be any addresses
- One entry for
retval
- Step through the call to example variables
Calling convention
- Stack frame layout
- System V x86-64 Application Binary Interface (ABI) defines it
More resources on the ABI and calling conventions
https://wiki.osdev.org/Calling_Conventions
https://eli.thegreenplace.net/2011/09/06/stack-frame-layout-on-x86-64/
https://en.wikipedia.org/wiki/X86_calling_conventions#Register_preservation
https://stackoverflow.com/questions/1658294/whats-the-purpose-of-the-lea-instruction
https://www.fireeye.com/blog/threat-research/2008/03/instruction-poi.html
Layout
- Caller-managed
- Stack-allocated parameters
- Return address
- Callee-managed
- Old base pointer
- Local variables
Caller has access to the parameters and its own return address.
The callee also has to save certain registers if it uses them, e.g., rbx
Passing parameters
Since each function has its own local state, how do we communicate values from one function to another?
Registers and stack parameters
- Use registers for some
- Use the stack for the rest
- Callee can now access them without caller's stack frame
Complete stack frame
Stack contents | Managed by |
---|---|
Parameter N | Caller |
Parameter N-1 | Caller |
… | Caller |
Parameter 8 | Caller |
Parameter 7 | Caller |
Return address | Caller |
Old base pointer | Callee |
Local variable 1 | Callee |
Local variable 2 | Callee |
… | |
Local variable N | Callee |
Where are parmeters 1-6? These are the parameters passed via registers in the x86-64 System V ABI
eli.thegreenplace.net/2011/09/06/stack-frame-layout-on-x86-64/
- 8 parameters
- 6 on registers
- 2 on stack
- caller sets up registers and stack
- caller sets return address (call)
- caller branches to the callee function
- callee saves the old base pointer
- callee allocates space for its local variables
- stack teardown happens in reverse order
Parameter example
function main localVariables retval x x := 3 retval := call func x return retval end function
function func localVariables a parameters a return a end function
Assembly
main: # prologue, update stack pointer pushq %rbp # save old base ponter movq %rsp, %rbp # set new base pointer push %rbx # %rbx is callee-saved # allocate stack space for locals sub $24, %rsp # assign 3 to x mov $3, %rax mov %rax, -24(%rbp) # function call mov -24(%rbp), %rdi call func add $0, %rsp mov %rax, -16(%rbp) # set return value mov -16(%rbp), %rax # epilogue add $24, %rsp pop %rbx # restore old base pointer pop %rbp # restore rbp ret func: # prologue, update stack pointer pushq %rbp # save old base ponter movq %rsp, %rbp # set new base pointer push %rbx # %rbx is callee-saved # allocate stack space for locals sub $8, %rsp # move register parameter a to local variable mov %rdi, -16(%rbp) # set return value mov -16(%rbp), %rax # epilogue add $8, %rsp pop %rbx # restore old base pointer pop %rbp # restore old base pointer ret