Code generation: globals, functions
Lecture 09
Code generation overview
- Recall: compiler takes source language and produces target language
- Translate, not execute
 
 - Generates equivalent program in assembly
- Each language construct has corresponding assembly code patterns
 
 
Assembly file layout
- data
- Fixed size, global data section (bss section is zeroed out)
 
 - rodata
- Immutable data, e.g., for string constants
 
 - text
- Executable part
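As a rough illustration of the layout above, the following C sketch shows where a compiler such as gcc typically places different kinds of globals. The variable and function names are made up for this example, and the exact placement depends on the compiler, its flags, and the platform.

```c
#include <assert.h>

int initialized_counter = 42;       /* typically .data: initialized, writable global */
int zeroed_counter;                 /* typically .bss: zeroed when the program loads */
const char greeting[] = "hello";    /* typically .rodata: immutable constant data */

/* The machine code for this function is placed in .text. */
int bump_counters(void) {
  assert(greeting[0] == 'h');  /* reading read-only data is always allowed */
  initialized_counter += 1;
  zeroed_counter += 1;
  return initialized_counter + zeroed_counter;
}
```

Compiling this file with gcc -S and looking for the section directives in the output is one way to check the placement on a given system.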
 
 
Note that the heap and stack are created at runtime by the OS
https://wiki.osdev.org/ELF#Loading_ELF_Binaries
https://stackoverflow.com/questions/9226088/how-are-the-different-segments-like-heap-stack-text-related-to-the-physical-me
simple_function.simplec
f(a, b) : function(int, int) -> int {
  return a + b;
}
main {
  return f(1, 3);
}
simple_function.s
.file "stdin"
.text
.globl f
.type f, @function
f:
        # emit the function prologue
        push	%rbp
        mov	%rsp, %rbp
        sub	$32, %rsp
        push	%rbx
        # move parameters into the stack
        mov	%rdi, -8(%rbp)
        mov	%rsi, -16(%rbp)
        # generate code for the body
        # generate code for the return expression
        # generate code for the left operand
        mov	-8(%rbp), %rax
        push	%rax
        # generate code for the right operand
        mov	-16(%rbp), %rax
        push	%rax
        # pop the right operand
        pop	%rbx
        # pop the left operand
        pop	%rax
        # do the addition
        add	%rbx, %rax
        # push the expression result
        push	%rax
        # save the return expression into %rax per the abi
        pop	%rax
        pop	%rbx
        mov	%rbp, %rsp
        pop	%rbp
        ret
.text
.globl main
.type main, @function
main:
        # stack space for argc and argv
        # emit main's prologue
        push	%rbp
        mov	%rsp, %rbp
        sub	$32, %rsp
        push	%rbx
        # move argc and argv from parameter registers to the stack
        mov	%rdi, -8(%rbp)
        mov	%rsi, -16(%rbp)
        # generate code for the body
        # generate code for the return expression
        # pass parameters either in registers or in stack
        # evaluate a parameter
        mov	$1, %rax
        push	%rax
        # evaluate a parameter
        mov	$3, %rax
        push	%rax
        # move a parameter to a register
        pop	%rsi
        # move a parameter to a register
        pop	%rdi
        # call the function
        call	f
        # restore the stack afterwards
        # push the return value
        push	%rax
        # save the return expression into %rax per the abi
        pop	%rax
        # emit main's epilogue
        pop	%rbx
        mov	%rbp, %rsp
        pop	%rbp
        ret
Showing the difference between AT&T and Intel assembly syntax:
gcc -masm=intel -S abi.c
What about local variables, malloc'ed data?
- Local variables and heap-allocated variables are stored in memory allocated at runtime
 - Running program (process) works with the OS to be allocated memory as needed at load and runtime
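A minimal C sketch of the two kinds of runtime storage mentioned above. The function name make_squares is hypothetical; the point is only that the loop variable lives in the function's frame (or a register) while the array is requested from the heap at runtime.

```c
#include <assert.h>
#include <stdlib.h>

/* Returns a heap-allocated array of n squares; the caller must free() it. */
int *make_squares(int n) {
  assert(n > 0);
  int *squares = malloc((size_t)n * sizeof *squares);  /* heap: requested at runtime */
  if (squares == NULL) {
    return NULL;  /* allocation can fail, so the caller has to check */
  }
  for (int i = 0; i < n; i++) {  /* i is a local: held in a register or the stack frame */
    squares[i] = i * i;
  }
  return squares;  /* the array outlives this function's stack frame */
}
```

Neither the local nor the array appears in the data or rodata sections of the assembly file; the compiler only emits the instructions that set them up when the program runs.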
 
Function semantics
What are functions in programming languages?
Function abstraction: encapsulate a computation
- Defined by name (usually) and its input/output
 
- User-defined extension of the language
 - Lots of uses: abstraction, reuse, organization, interfaces, and more
 - Higher-order functions can take functions as inputs
 - Lambda functions allow runtime creation of anonymous functions
 
How do functions work?
As distinct from mere branching
caller/callee
Factorial illustrates the difference between branching and function calls with local variables.
If you merely branched back to the top and modified x, you would overwrite the caller's data.
int factorial(int x) {
  if (x <= 0) {
    return 1;
  } else {
    int y = x;
    x = x - 1;
    return factorial(x) * y;
  }
}
Conceptually, each call behaves like a separate function with its own state:
factorial_10() { return 10 * factorial_9(); }
factorial_9() { return 9 * factorial_8(); }
...
factorial_0() { return 1; }
 
Functions preserve the state of the caller's local variables
- Caller transfers control to callee function
 - Caller provides input values
 - Callee provides output value(s)
 - Execution resumes in caller once callee is finished
 
Function calls "freeze" state of caller
How would you implement this with just assembly?
- Save state on stack
 - Unconditional branch
 - Save return value
 - Another branch to go back to where we left off
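The four steps above can be sketched in C rather than assembly, using an explicit array as the stack and loops in place of branches. This is only an illustration under stated assumptions: the name factorial_explicit_stack and the MAX_DEPTH limit are invented here, and real call/ret also save a return address, which this sketch does not need because there is only one place to resume.

```c
#include <assert.h>

#define MAX_DEPTH 64

/* Computes x! without C-level recursion by saving each "caller's" value
   on an explicit stack, mimicking what a function call does implicitly. */
long factorial_explicit_stack(int x) {
  long saved[MAX_DEPTH];  /* saved caller state (the y = x of each call) */
  int top = 0;

  /* "Call" phase: save state, then branch back to the top with x - 1. */
  while (x > 0) {
    assert(top < MAX_DEPTH);
    saved[top++] = x;
    x = x - 1;
  }

  long result = 1;  /* return value of the base case */

  /* "Return" phase: restore each caller's state and resume its multiply. */
  while (top > 0) {
    result = result * saved[--top];
  }
  return result;
}
```

For example, factorial_explicit_stack(5) saves 5, 4, 3, 2, 1 and then multiplies them back in reverse order, which is the order in which the recursive calls would return.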
 
Alternatives:
- Allocate state statically
 - Use the heap
 
"Nested" function calls freeze state of many callees
- (Diagram)
 
f(a) {
  return g(a) + 1;
}
g(b) {
  return b * 2;
}
f(2)
stack layout:
- push parameter
 - push return address
 - push locals
 
stack frames
- callee points to previous stack frame (previous base pointer)
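To make the link between frames concrete, here is a small C model of the f and g example. The struct frame type and the extra parameters are hypothetical bookkeeping added for illustration; a real frame is laid out by the compiler and the link is the saved base pointer.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of a stack frame: one parameter plus a link to the
   caller's frame (the role the saved base pointer plays on x86-64). */
struct frame {
  const char *function_name;
  int parameter;
  const struct frame *caller;  /* NULL for the outermost call */
};

/* Counts frames by following caller links, like walking saved %rbp values. */
int frame_depth(const struct frame *current) {
  int depth = 0;
  while (current != NULL) {
    depth++;
    current = current->caller;
  }
  return depth;
}

int g(int b, const struct frame *caller, int *depth_seen) {
  assert(depth_seen != NULL);
  struct frame mine = {"g", b, caller};
  *depth_seen = frame_depth(&mine);  /* g's frame sits on top of f's frame */
  return b * 2;
}

int f(int a, int *depth_seen) {
  struct frame mine = {"f", a, NULL};
  return g(a, &mine, depth_seen) + 1;  /* f's state stays frozen while g runs */
}
```

Calling f(2, &depth) returns 5 and records a depth of 2, since g's frame points back to f's frame while g is running.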
 
You can think of a recursive call as invoking a fresh instance of the function, rather than the function calling itself.
Function implementation
Stack frame (or activation record)
- Holds all information needed to "freeze" state of function
- Parameters and local variables
 - Return address
 - Caller's stack frame (nested calls)
 
 
Parameter passing
- Registers and/or stack
 
- Registers are faster, but limited in number
 - May need to save them before making call
 
Application binary interface (ABI)
- Calling conventions and stack frame layout
- How to pass parameters
 - Layout of data in the stack frame
 - How to return values
 - Caller and callee responsibilities
 
 - Architecture- and OS-dependent
 
Note that under the System V AMD64 ABI (used on Linux), the first six integer or pointer parameters are passed in registers (%rdi, %rsi, %rdx, %rcx, %r8, %r9); any further parameters are passed on the stack.
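A short C example of a call that needs both registers and the stack. The function name sum8 is made up, and the register assignment in the comment assumes the System V AMD64 calling convention; other platforms (for example Windows x64) use different registers.

```c
#include <assert.h>

/* Assuming the System V AMD64 convention:
   a..f arrive in %rdi, %rsi, %rdx, %rcx, %r8, %r9;
   g and h do not fit in registers and arrive on the stack;
   the result is returned in %rax. */
long sum8(long a, long b, long c, long d, long e, long f, long g, long h) {
  assert(a <= h);  /* illustrative precondition for this example only */
  return a + b + c + d + e + f + g + h;
}
```

Compiling a call such as sum8(1, 2, 3, 4, 5, 6, 7, 8) with gcc -S is a quick way to see the seventh and eighth arguments being pushed or stored to the stack before the call instruction.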
More resources on the ABI and calling conventions
https://wiki.osdev.org/Calling_Conventions
https://eli.thegreenplace.net/2011/09/06/stack-frame-layout-on-x86-64/
https://en.wikipedia.org/wiki/X86_calling_conventions#Register_preservation
https://stackoverflow.com/questions/1658294/whats-the-purpose-of-the-lea-instruction
https://www.fireeye.com/blog/threat-research/2008/03/instruction-poi.html
Intel x86-64 support for functions
- %rbp: base pointer, points to the current function's stack frame
- %rsp: stack pointer, points to the top of the stack
- push/pop: push to and pop from the stack (move data and update %rsp)
- call: saves the next instruction's address (%rip) onto the stack and branches to the function's address
- ret: pops the caller's next instruction address and branches to it
Recall that "points to" just means that the register holds an address
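The effect of push and pop on the stack pointer can be modeled in a few lines of C. This toy machine is an assumption-laden sketch: the struct machine type and its functions are invented here, the stack is an array of 8-byte words, and rsp is an array index rather than a real address.

```c
#include <assert.h>
#include <stdint.h>

#define STACK_WORDS 16

/* Toy machine: memory for the stack plus a stack pointer, in 8-byte words. */
struct machine {
  int64_t stack[STACK_WORDS];
  int rsp;  /* index of the top of the stack; STACK_WORDS means empty */
};

void machine_init(struct machine *m) { m->rsp = STACK_WORDS; }

/* push: move the stack pointer down, then store (the stack grows downward). */
void machine_push(struct machine *m, int64_t value) {
  assert(m->rsp > 0);
  m->rsp -= 1;
  m->stack[m->rsp] = value;
}

/* pop: load from the top, then move the stack pointer back up. */
int64_t machine_pop(struct machine *m) {
  assert(m->rsp < STACK_WORDS);
  int64_t value = m->stack[m->rsp];
  m->rsp += 1;
  return value;
}
```

In the same model, call would be a push of the return address followed by a jump, and ret would be a pop of that address followed by a jump back to it.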
Writing and calling ABI-compatible functions
- Function definition
- Prologue
 - Epilogue
 
 - Function call
- Parameter passing
 - Return value
 
 - Stack frame layout on x86-64
 - (Demo)
 
Recall that the compiler prints out instructions that will implement the stack at runtime; it does not create the stack while compiling.
# https://github.com/longld/peda
# for C, compile with -g for debug symbols
# run gdb
gdb example
b main                     # break at main
si                         # step one assembly instruction (instead of one source line)
info file                  # get address of rodata
x/8xb 0x0000555555556000   # print memory, 8 he(x) (b)ytes
x/i addr                   # print as instruction
# dump section contents (objdump -t prints the symbol table)
objdump -s test
gdb resources:
https://sourceware.org/gdb/onlinedocs/gdb/Memory.html
https://sourceware.org/gdb/onlinedocs/gdb/Registers.html#Registers
https://visualgdb.com/gdbreference/commands/set_disassembly-flavor
http://dbp-consulting.com/tutorials/debugging/basicAsmDebuggingGDB.html
https://github.com/longld/peda
https://sourceware.org/gdb/onlinedocs/gdb/Auto-Display.html#Auto-Display
https://sourceware.org/gdb/onlinedocs/gdb/Continuing-and-Stepping.html
https://sourceware.org/binutils/docs-2.16/as/index.html
https://sourceware.org/binutils/docs/as/i386_002dMemory.html
Implementing functions in SimpleC
Recall: compiler is translating (not executing) the function
- Need to print out (emit) equivalent assembly
 - Retrieve name, parameters from AST
 - Type-checker provides function type guarantees
 
Generate while walking the tree
- (Diagram)
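A minimal sketch of generating code while walking the tree, assuming a made-up miniature AST with only integer constants and additions. The struct expr type and emit_expr are hypothetical and are not SimpleC's actual AST or code generator; the emitted pattern follows the push/pop sequence visible in simple_function.s.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical miniature AST: integer constants and additions only. */
enum expr_kind { EXPR_CONST, EXPR_ADD };

struct expr {
  enum expr_kind kind;
  long value;                /* used by EXPR_CONST */
  const struct expr *left;   /* used by EXPR_ADD */
  const struct expr *right;  /* used by EXPR_ADD */
};

/* Emits code that leaves the value of e on top of the runtime stack. */
void emit_expr(FILE *out, const struct expr *e) {
  assert(e != NULL);
  if (e->kind == EXPR_CONST) {
    fprintf(out, "\tmov\t$%ld, %%rax\n", e->value);
    fprintf(out, "\tpush\t%%rax\n");
    return;
  }
  emit_expr(out, e->left);   /* left operand's result is pushed first */
  emit_expr(out, e->right);  /* right operand's result ends up on top */
  fprintf(out, "\tpop\t%%rbx\n");         /* pop the right operand */
  fprintf(out, "\tpop\t%%rax\n");         /* pop the left operand */
  fprintf(out, "\tadd\t%%rbx, %%rax\n");  /* do the addition */
  fprintf(out, "\tpush\t%%rax\n");        /* push the expression result */
}
```

The recursion mirrors the tree: each node emits its children's code first and then its own template, so the compiler never evaluates 1 + 3 itself; it only prints instructions that will do so at runtime.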
 
Implement by emitting assembly "templates" from the compiler
- (Coding demo)
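One possible shape for the prologue and epilogue templates, written to match the instruction sequences in simple_function.s. The functions sketch_emit_prologue and sketch_emit_epilogue are hypothetical stand-ins; the course's own emit_prologue and emit_epilogue are not shown in these notes and may differ, for example by writing to a global output file instead of taking a FILE * parameter.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical template: set up a frame with stack_size bytes of locals. */
void sketch_emit_prologue(FILE *out, int stack_size) {
  assert(stack_size >= 0);
  fprintf(out, "\tpush\t%%rbp\n");                  /* save the caller's base pointer */
  fprintf(out, "\tmov\t%%rsp, %%rbp\n");            /* start this function's frame */
  fprintf(out, "\tsub\t$%d, %%rsp\n", stack_size);  /* reserve space for locals */
  fprintf(out, "\tpush\t%%rbx\n");                  /* %rbx is callee-saved */
}

/* Hypothetical template: undo the prologue in reverse order and return. */
void sketch_emit_epilogue(FILE *out) {
  fprintf(out, "\tpop\t%%rbx\n");         /* restore the callee-saved register */
  fprintf(out, "\tmov\t%%rbp, %%rsp\n");  /* discard the locals */
  fprintf(out, "\tpop\t%%rbp\n");         /* restore the caller's base pointer */
  fprintf(out, "\tret\n");                /* pop the return address and jump to it */
}
```

Because every function uses the same two templates, only the stack size changes from one function to the next.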
 
codegen_main
static void codegen_main(T_main main) {
  // create a new scope
  current_offset_scope = create_offset_scope(NULL);
  // emit the pseudo ops for the function definition
  fprintf(codegenout, ".text\n");
  fprintf(codegenout, ".globl %s\n", "main");
  fprintf(codegenout, ".type %s, @function\n", "main");
  // emit a label for the function
  fprintf(codegenout, "%s:\n", "main");
  // add local declarations to the scope
  codegen_decllist(main->decllist);
  COMMENT("stack space for argc and argv");
  insert_offset(current_offset_scope, "argc", 8);  // int argc
  insert_offset(current_offset_scope, "argv", 8);  // char **argv
  COMMENT("emit main's prologue");
  emit_prologue(current_offset_scope->stack_size);
  COMMENT("move argc and argv from parameter registers to the stack");
  int offset;
  offset = lookup_offset_in_scope(current_offset_scope, "argc");
  MOV_TO_OFFSET("%rdi", offset);
  offset = lookup_offset_in_scope(current_offset_scope, "argv");
  MOV_TO_OFFSET("%rsi", offset);
  COMMENT("generate code for the body");
  codegen_stmtlist(main->stmtlist);
  COMMENT("generate code for the return expression");
  codegen_expr(main->returnexpr);
  COMMENT("save the return expression into %rax per the abi");
  POP("%rax");
  COMMENT("emit main's epilogue");
  emit_epilogue();
  // exit the scope
  current_offset_scope = destroy_offset_scope(current_offset_scope);
}
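The offset bookkeeping used by codegen_main can be pictured with the following sketch. Everything here is an assumption: struct offset_scope_sketch, sketch_insert_offset, and sketch_lookup_offset are hypothetical stand-ins for the course's scope functions, whose implementation is not shown. The sketch reproduces the -8 and -16 offsets seen for argc and argv in simple_function.s, but it does not model whatever rounding leads to the sub $32 in the emitted prologue.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_SLOTS 32

/* Hypothetical offset scope: maps names to %rbp-relative offsets. */
struct offset_scope_sketch {
  const char *names[MAX_SLOTS];
  int offsets[MAX_SLOTS];
  int count;
  int stack_size;  /* bytes reserved so far */
};

/* Reserves size bytes for name and returns its offset from %rbp. */
int sketch_insert_offset(struct offset_scope_sketch *scope, const char *name, int size) {
  assert(scope->count < MAX_SLOTS);
  scope->stack_size += size;
  scope->names[scope->count] = name;
  scope->offsets[scope->count] = -scope->stack_size;  /* locals sit below %rbp */
  scope->count++;
  return -scope->stack_size;
}

/* Returns the offset recorded for name, or 0 if it is not in the scope. */
int sketch_lookup_offset(const struct offset_scope_sketch *scope, const char *name) {
  for (int i = 0; i < scope->count; i++) {
    if (strcmp(scope->names[i], name) == 0) {
      return scope->offsets[i];
    }
  }
  return 0;  /* 0 is never a valid local offset in this sketch */
}
```

With this model, inserting argc and then argv (8 bytes each) yields offsets -8 and -16, which is what the mov %rdi, -8(%rbp) and mov %rsi, -16(%rbp) instructions use.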
Use gcc -g example.s to turn on debugging symbols.
b main                     # break at main
si                         # step one assembly instruction (instead of one source line)
info file                  # get address of rodata
x/8xb 0x0000555555556000   # print memory, 8 he(x) (b)ytes
x/i addr                   # print as instruction
objdump -s test
Assembly language resources
https://docs.oracle.com/cd/E19253-01/817-5477/817-5477.pdf
https://csiflabs.cs.ucdavis.edu/~ssdavis/50/att-syntax.htm
https://sourceware.org/binutils/docs-2.16/as/index.html
https://www.imperialviolet.org/2017/01/18/cfi.html
https://www.felixcloutier.com/x86/
https://www2.cs.sfu.ca/CourseCentral/295/alavergn/Resources/Table%20of%20x86-64%20Registers.html
https://www.cs.virginia.edu/~evans/cs216/guides/x86.html
https://docs.oracle.com/cd/E19455-01/806-3773/instructionset-44/index.html
https://en.wikibooks.org/wiki/X86_Assembly/Shift_and_Rotate
https://stackoverflow.com/questions/19853012/intel-based-assembly-language-idiv