Functions
Compiler Implementation
COP-3402 COP-3402
Function semantics
What are functions in programming languages?
Function abstraction: encapsulate a computation
- Defined by name (usually) and its input/output
- User-defined extension of the language
- Lots of uses: abstraction, reuse, organization, interfaces, and more
- Higher-order functions can take functions as inputs
- Lambda functions allow runtime creation of anonymous functions
Function behavior
- Caller transfers control to callee function
- Caller provides input values
- Callee provides output value(s)
- Execution resumes in caller once callee is finished
Function calls "freeze" state of caller
- Function-local state
- Stack-allocated state
Example
int g(int b) {
return b * 2;
}
int f(int a) {
return g(a) + 1;
}
int main() {
printf("%d\n", f(3));
}
Diagram
- Save state on stack
- Unconditional branch
- Save return value
- Another branch to go back to where we left off
Local state
No function-local state
int x;
int f() {
x = 1;
y = g();
print(x);
print(y);
}
int g() {
x = 2;
return x;
}
With function-local state
int f() {
int x;
x = 1;
y = g();
print(x);
print(y);
}
int g() {
int x;
x = 2;
return x;
}
Stack allocation
Statically-allocated state (factorial)
int factorial(int x) {
static int old_x;
if (x <= 0) {
return 1;
} else {
old_x = x;
x = x - 1;
return factorial(x) * old_x;
}
}
Stack-allocated state (factorial)
- Change
statictoautoautois the default in C- i.e.,
int xis reallyauto int xin a function
int factorial(int x) {
auto int old_x;
if (x <= 0) {
return 1;
} else {
old_x = x;
x = x - 1;
return old_x * factorial(x);
}
}
Statically-allocated state (fib)
int fib(int n) {
static int n_minus_1;
static int n_minus_2;
if (n <= 0) {
// F_0 = 0
return 0;
} else if (n == 1) {
// F_1 = 1
return 1;
} else {
// F_n = F_n-1 + F_n-2
n_minus_1 = fib(n - 1);
n_minus_2 = fib(n - 2);
return n_minus_1 + n_minus_2;
}
}
Stack-allocated state (fib)
int fib(int n) {
auto int n_minus_1;
auto int n_minus_2;
if (n <= 0) {
// F_0 = 0
return 0;
} else if (n == 1) {
// F_1 = 1
return 1;
} else {
// F_n = F_n-1 + F_n-2
n_minus_1 = fib(n - 1);
n_minus_2 = fib(n - 2);
return n_minus_1 + n_minus_2;
}
}
Function implementation
Stack frame (or activation record)
- Holds all information needed to "freeze" state of function
- Parameters and local variables
- Return address
- Caller's stack frame (nested calls)
Parameter passing
- Registers and/or stack
- Registers are faster, but limited in number
- May need to save them before making call
Application binary interface (ABI)
- Calling conventions and stack frame layout
- How to pass parameters
- Layout of data in the stack frame
- How to return values
- Caller and callee responsbilities
- Architecture- and OS-dependent
Note that parameter passing is done through registers for the first 6 parameters, then on the stack as needed.
More resources on the ABI and calling conventions
https://wiki.osdev.org/Calling_Conventions
https://eli.thegreenplace.net/2011/09/06/stack-frame-layout-on-x86-64/
https://en.wikipedia.org/wiki/X86_calling_conventions#Register_preservation
https://stackoverflow.com/questions/1658294/whats-the-purpose-of-the-lea-instruction
https://www.fireeye.com/blog/threat-research/2008/03/instruction-poi.html
Intel x86-64 support for functions
%rbp- base pointer points to the current function's stack frame%rsp- stack pointer points to the top of the stackpush/pop- push to and pop from the stack (move data and update%rsp)call- saves next instruction address (%rip) onto stack and branches to function's addressret- pops the caller's next instruction address and branches to it
Recall that "points to" just means that the register holds an address
Writing and calling ABI-compatible functions
- Function definition
- Prologue
- Epilogue
- Function call
- Parameter passing
- Return value
- Stack frame layout on x86 64
- (Demo)
Recall that compiler is printing out instructions that will implement the stack at runtime, not actually creating the stack while compiling.
# https://github.com/longld/peda # for C compile with -g for debug symbols # run gdb gdb example b main # break at main si # step instruction, assembly instructions (intead of code) info file # get address of rodata x/8xb 0x0000555555556000 # print memory, 8 he(x) (b)ytes x/i addr # print as instruction # dump symtab objdump -s test
gdb resources:
https://sourceware.org/gdb/onlinedocs/gdb/Memory.html
https://sourceware.org/gdb/onlinedocs/gdb/Registers.html#Registers
https://visualgdb.com/gdbreference/commands/set_disassembly-flavor
http://dbp-consulting.com/tutorials/debugging/basicAsmDebuggingGDB.html
https://github.com/longld/peda
https://sourceware.org/gdb/onlinedocs/gdb/Auto-Display.html#Auto-Display https://sourceware.org/gdb/onlinedocs/gdb/Continuing-and-Stepping.html https://sourceware.org/binutils/docs-2.16/as/index.html https://sourceware.org/binutils/docs/as/i386_002dMemory.html
Code generation overview
- Recall: compiler takes source language and produces target language
- Translate, not execute
- Generates equivalent program in assembly
- Each language construct has corresponding assembly code patterns
Assembly file layout
- data
- Fixed size, global data section (bss section is zeroed out)
- rodata
- Immutable data, e.g., for string constants
- text
- Executable part
#+BEGIN_NOTES Note that the heap and stack and created at runtime by the OS
https://wiki.osdev.org/ELF#Loading_ELF_Binaries https://stackoverflow.com/questions/9226088/how-are-the-different-segments-like-heap-stack-text-related-to-the-physical-me
What about local variables, malloc'ed data?
- Local variables and heap-allocated variables are stored in memory allocated at runtime
- Running program (process) works with the OS to be allocated memory as needed at load and runtime
Implementing functions for SimpleIR
Recall: compiler is translating (not executing) the function
- Need to print out (emit) equivalent assembly
- Retrieve name, parameters from AST
- Type-checker provides function type guarantees
Generate while walking the tree
- (Diagram)
Implement by emitting assembly "templates" from the compiler
Assembly language resources
https://docs.oracle.com/cd/E19253-01/817-5477/817-5477.pdf
https://csiflabs.cs.ucdavis.edu/~ssdavis/50/att-syntax.htm
https://sourceware.org/binutils/docs-2.16/as/index.html
https://www.imperialviolet.org/2017/01/18/cfi.html
https://www.felixcloutier.com/x86/
https://www2.cs.sfu.ca/CourseCentral/295/alavergn/Resources/Table%20of%20x86-64%20Registers.html
https://www.cs.virginia.edu/~evans/cs216/guides/x86.html
https://docs.oracle.com/cd/E19455-01/806-3773/instructionset-44/index.html
https://en.wikibooks.org/wiki/X86_Assembly/Shift_and_Rotate
https://stackoverflow.com/questions/19853012/intel-based-assembly-language-idiv