Code generation: globals, functions
Lecture 09

Code generation overview

  • Recall: compiler takes source language and produces target language
    • Translate, not execute
  • Generates equivalent program in assembly
    • Each language construct has corresponding assembly code patterns

Assembly file layout

  • data
    • Fixed-size global data section (the related bss section holds zero-initialized data)
  • rodata
    • Immutable data, e.g., for string constants
  • text
    • Executable part
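
As a rough illustration, the C fragment below notes where each object would usually end up when compiled with GCC for an ELF target. The placements in the comments describe typical behavior, not a guarantee, and the names (counter, table, greeting, bump) are made up for this sketch.

```c
int counter = 42;                /* initialized global: usually .data */
int table[256];                  /* zero-initialized global: usually .bss */
const char *greeting = "hello";  /* the pointer is writable data; the
                                    string literal usually lands in .rodata */

/* The machine code for this function goes in .text. */
int bump(void) {
    table[0] = counter;   /* remember the old value */
    counter += 1;         /* modify the global in place */
    return counter;
}
```

Running objdump -t on the resulting object file is one way to check which section each symbol was actually assigned to.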

https://wiki.osdev.org/ELF

Note that the heap and stack are created at runtime by the OS

https://wiki.osdev.org/ELF#Loading_ELF_Binaries

https://stackoverflow.com/questions/9226088/how-are-the-different-segments-like-heap-stack-text-related-to-the-physical-me

simple_function.simplec

f(a, b) : function(int, int) -> int {
  return a + b;
}

main {
  return f(1, 3);
}

simple_function.s

.file "stdin"
.text
.globl f
.type f, @function
f:
        # emit the function prologue
        push	%rbp
        mov	%rsp, %rbp
        sub	$32, %rsp
        push	%rbx
        # move parameters into the stack
        mov	%rdi, -8(%rbp)
        mov	%rsi, -16(%rbp)
        # generate code for the body
        # generate code for the return expression
        # generate code for the left operand
        mov	-8(%rbp), %rax
        push	%rax
        # generate code for the right operand
        mov	-16(%rbp), %rax
        push	%rax
        # pop the right operand
        pop	%rbx
        # pop the left operand
        pop	%rax
        # do the addition
        add	%rbx, %rax
        # push the expression result
        push	%rax
        # save the return expression into %rax per the abi
        pop	%rax
        pop	%rbx
        mov	%rbp, %rsp
        pop	%rbp
        ret
.text
.globl main
.type main, @function
main:
        # stack space for argc and argv
        # emit main's prologue
        push	%rbp
        mov	%rsp, %rbp
        sub	$32, %rsp
        push	%rbx
        # move argc and argv from parameter registers to the stack
        mov	%rdi, -8(%rbp)
        mov	%rsi, -16(%rbp)
        # generate code for the body
        # generate code for the return expression
        # pass parameters either in registers or in stack
        # evaluate a parameter
        mov	$1, %rax
        push	%rax
        # evaluate a parameter
        mov	$3, %rax
        push	%rax
        # move a parameter to a register
        pop	%rsi
        # move a parameter to a register
        pop	%rdi
        # call the function
        call	f
        # restore the stack afterwards
        # push the return value
        push	%rax
        # save the return expression into %rax per the abi
        pop	%rax
        # emit main's epilogue
        pop	%rbx
        mov	%rbp, %rsp
        pop	%rbp
        ret

Showing the difference between AT&T and Intel assembly syntax:

gcc -masm=intel -S abi.c

What about local variables, malloc'ed data?

Memory layout

  • Local variables and heap-allocated variables are stored in memory allocated at runtime
  • A running program (process) asks the OS to allocate memory as needed, both at load time and at runtime
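
A small C sketch of the two kinds of runtime memory follows. The function make_range is a hypothetical example, not part of SimpleC: its loop counter is a local that lives only for the duration of the call, while the array it returns is heap-allocated and outlives the call.

```c
#include <stdlib.h>

/* Returns a heap-allocated array holding 0..n-1, or NULL on failure.
   The caller owns the result and must free() it. */
int *make_range(int n) {
    if (n <= 0) {
        return NULL;
    }
    /* Heap allocation: requested at runtime, survives after we return. */
    int *values = malloc((size_t)n * sizeof *values);
    if (values == NULL) {
        return NULL;
    }
    /* i is a local: it lives in this call's stack frame (or a register). */
    for (int i = 0; i < n; i++) {
        values[i] = i;
    }
    return values;
}
```

A caller would use it as int *r = make_range(4); and later call free(r); once the array is no longer needed.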

Function semantics

What are functions in programming languages?

Function abstraction: encapsulate a computation

  • Defined by name (usually) and its input/output
  • User-defined extension of the language
  • Lots of uses: abstraction, reuse, organization, interfaces, and more
  • Higher-order functions can take functions as inputs
  • Lambda functions allow runtime creation of anonymous functions

How do functions work?

As distinct from mere branching

caller/callee

factorial illustrates the difference between mere branching and function calls with their own local variables

  • factorial
    • if you only branch back to the start and modify x, you overwrite the caller's data

      int factorial (int x) {
        if (x <= 0) {
          return 1;
        } else {
          int y = x;
          x = x - 1;
          return factorial(x) * y;
        }
      }
      
      factorial_10() {
        return 10 * factorial_9();
      }
      
      factorial_9() {
        return 9 * factorial_8();
      }
      
      
      factorial_0() {
        return 1;
      }
      

functions preserve state of the local variables

  • Caller transfers control to callee function
  • Caller provides input values
  • Callee provides output value(s)
  • Execution resumes in caller once callee is finished

Function calls "freeze" state of caller

How would you implement this with just assembly?

  • Save state on stack
  • Unconditional branch
  • Save return value
  • Another branch to go back to where we left off
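
The steps above can be mimicked in C without using recursion at all. The sketch below is only an analogy: factorial_explicit and its saved array are invented for illustration, with the array standing in for the runtime stack and the two loops standing in for the branch into the callee and the branch back.

```c
#define MAX_DEPTH 20   /* 20! is the largest factorial that fits in 64 bits */

/* Computes x! by saving each "frozen" value of x on an explicit stack. */
long long factorial_explicit(int x) {
    int saved[MAX_DEPTH];   /* stands in for the runtime stack */
    int top = 0;            /* number of saved entries */

    if (x > MAX_DEPTH) {
        return -1;          /* result would overflow; report failure */
    }

    /* "Call" phase: save the caller's x, then branch to the next instance. */
    while (x > 0) {
        saved[top++] = x;
        x = x - 1;
    }

    /* Base case reached: the innermost instance returns 1. */
    long long result = 1;

    /* "Return" phase: restore each saved x and resume that caller. */
    while (top > 0) {
        result = result * saved[--top];
    }
    return result;
}
```

For example, factorial_explicit(5) saves 5, 4, 3, 2, 1 and then multiplies them back in reverse order, giving 120.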

Alternatives:

  • Allocate state statically
  • Use the heap

"Nested" function calls freeze the state of many callers

  • (Diagram)

f(a) {
  return g(a) + 1;
}

g(b) {
  return b * 2;
}

f(2)

stack layout:

  • push parameter
  • push return address
  • push locals

stack frames

  • each callee's frame stores a pointer to the caller's stack frame (the saved base pointer)

You can think of a recursive function as invoking a fresh instance of the function, rather than calling itself.

Function implementation

Stack frame (or activation record)

  • Holds all information needed to "freeze" state of function
    • Parameters and local variables
    • Return address
    • Caller's stack frame (nested calls)
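
One way to picture an activation record is as a C struct. The layout and field names below are illustrative assumptions, not the real x86-64 frame layout; the point is that each frame links back to its caller, so the chain can be walked much like a debugger builds a backtrace.

```c
#include <stddef.h>

/* A simplified activation record (field names are illustrative only). */
struct frame {
    long param;            /* a parameter value */
    long local;            /* a local variable */
    int return_label;      /* stands in for the return address */
    struct frame *caller;  /* caller's frame, like the saved base pointer */
};

/* Counts the frames reachable from current by following caller links. */
int call_depth(const struct frame *current) {
    int depth = 0;
    while (current != NULL) {
        depth++;
        current = current->caller;
    }
    return depth;
}
```

Modeling main calling f(2) calling g(2) gives three linked frames, so call_depth on g's frame yields 3.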

Parameter passing

  • Registers and/or stack
  • Registers are faster, but limited in number
  • May need to save them before making call

Application binary interface (ABI)

  • Calling conventions and stack frame layout
    • How to pass parameters
    • Layout of data in the stack frame
    • How to return values
    • Caller and callee responsibilities
  • Architecture- and OS-dependent
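
For instance, under the System V AMD64 ABI used on Linux, the first six integer or pointer arguments travel in %rdi, %rsi, %rdx, %rcx, %r8, and %r9, any further ones go on the stack, and an integer result comes back in %rax. The function sum7 below is a made-up example annotated with that convention; other platforms, such as Windows x64, use different rules.

```c
/* sum7 is a hypothetical function used only to annotate argument passing. */
long sum7(long a,   /* %rdi */
          long b,   /* %rsi */
          long c,   /* %rdx */
          long d,   /* %rcx */
          long e,   /* %r8  */
          long f,   /* %r9  */
          long g)   /* seventh integer argument: passed on the stack */
{
    return a + b + c + d + e + f + g;   /* integer result returned in %rax */
}
```

Compiling a call such as sum7(1, 2, 3, 4, 5, 6, 7) with gcc -S and reading the output is a simple way to see the register moves and the stack push for the seventh argument.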

Intel x86-64 support for functions

  • %rbp - base pointer points to the current function's stack frame
  • %rsp - stack pointer points to the top of the stack
  • push/pop - push to and pop from the stack (move data and update %rsp)
  • call - saves next instruction address (%rip) onto stack and branches to function's address
  • ret - pops the caller's next instruction address and branches to it
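
The effect of push and pop on %rsp can be imitated with a toy model in C. Everything here (struct machine, machine_init, the 64-byte stack) is invented for the sketch, and it omits overflow checks; it only shows that the stack grows toward lower addresses in 8-byte steps.

```c
#include <stdint.h>

#define STACK_BYTES 64

/* A toy machine: a small memory region plus a stack pointer. */
struct machine {
    uint64_t memory[STACK_BYTES / 8];
    uint64_t rsp;   /* byte offset of the top of the stack within memory */
};

/* An empty stack starts just past the highest address. */
void machine_init(struct machine *m) {
    m->rsp = STACK_BYTES;
}

/* push: decrement rsp by 8, then store the value at the new top. */
void push(struct machine *m, uint64_t value) {
    m->rsp -= 8;
    m->memory[m->rsp / 8] = value;
}

/* pop: load the value at the top, then increment rsp by 8. */
uint64_t pop(struct machine *m) {
    uint64_t value = m->memory[m->rsp / 8];
    m->rsp += 8;
    return value;
}
```

Pushing 1 and then 3 and popping twice returns 3 first and 1 second, the same order in which the generated main above pops its two arguments into %rsi and %rdi.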

Recall that "points to" just means that the register holds an address

Writing and calling ABI-compatible functions

Recall that the compiler prints out instructions that will implement the stack at runtime; it does not create the stack while compiling.

# https://github.com/longld/peda
# for C compile with -g for debug symbols
# run gdb
gdb example
b main # break at main
si # step instruction, assembly instructions (instead of code)
info file # get address of rodata
x/8xb 0x0000555555556000 # print memory, 8 he(x) (b)ytes
x/i addr # print as instruction

# dump the full contents of all sections (use objdump -t for the symbol table)
objdump -s test

gdb resources:

https://sourceware.org/gdb/onlinedocs/gdb/Memory.html

https://sourceware.org/gdb/onlinedocs/gdb/Registers.html#Registers

https://visualgdb.com/gdbreference/commands/set_disassembly-flavor

http://dbp-consulting.com/tutorials/debugging/basicAsmDebuggingGDB.html

https://github.com/longld/peda

https://sourceware.org/gdb/onlinedocs/gdb/Auto-Display.html#Auto-Display

https://sourceware.org/gdb/onlinedocs/gdb/Continuing-and-Stepping.html

https://sourceware.org/binutils/docs-2.16/as/index.html

https://sourceware.org/binutils/docs/as/i386_002dMemory.html

Implementing functions in SimpleC

Recall: compiler is translating (not executing) the function

  • Need to print out (emit) equivalent assembly
  • Retrieve name, parameters from AST
  • Type-checker provides function type guarantees

Generate while walking the tree

  • (Diagram)

Implement by emitting assembly "templates" from the compiler

  • (Coding demo)
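
A template emitter might look roughly like the sketch below. The helpers emit_prologue_to and emit_epilogue_to are guesses that reproduce the prologue and epilogue visible in simple_function.s; they take an explicit FILE * instead of the global codegenout and are not the course's actual implementation.

```c
#include <stdio.h>

/* Hypothetical helper: prints the function prologue seen in simple_function.s. */
void emit_prologue_to(FILE *out, int stack_size) {
    fprintf(out, "\tpush\t%%rbp\n");                  /* save caller's base pointer */
    fprintf(out, "\tmov\t%%rsp, %%rbp\n");            /* start a new stack frame */
    fprintf(out, "\tsub\t$%d, %%rsp\n", stack_size);  /* reserve space for locals */
    fprintf(out, "\tpush\t%%rbx\n");                  /* save callee-saved %rbx */
}

/* Hypothetical helper: undoes the prologue and returns to the caller. */
void emit_epilogue_to(FILE *out) {
    fprintf(out, "\tpop\t%%rbx\n");         /* restore %rbx */
    fprintf(out, "\tmov\t%%rbp, %%rsp\n");  /* discard locals */
    fprintf(out, "\tpop\t%%rbp\n");         /* restore caller's base pointer */
    fprintf(out, "\tret\n");                /* branch back to the caller */
}
```

Calling emit_prologue_to(stdout, 32) followed by emit_epilogue_to(stdout) prints eight instructions. The compiler never runs them; it only writes the text that the assembler later turns into machine code.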

codegen_main

static void codegen_main(T_main main) {
  // create a new scope
  current_offset_scope = create_offset_scope(NULL);

  // emit the pseudo ops for the function definition
  fprintf(codegenout, ".text\n");
  fprintf(codegenout, ".globl %s\n", "main");
  fprintf(codegenout, ".type %s, @function\n", "main");

  // emit a label for the function
  fprintf(codegenout, "%s:\n", "main");

  // add local declarations to the scope
  codegen_decllist(main->decllist);

  COMMENT("stack space for argc and argv");
  insert_offset(current_offset_scope, "argc", 8);  // int argc
  insert_offset(current_offset_scope, "argv", 8);  // char **argv

  COMMENT("emit main's prologue");
  emit_prologue(current_offset_scope->stack_size);

  COMMENT("move argc and argv from parameter registers to the stack");
  int offset;
  offset = lookup_offset_in_scope(current_offset_scope, "argc");
  MOV_TO_OFFSET("%rdi", offset);
  offset = lookup_offset_in_scope(current_offset_scope, "argv");
  MOV_TO_OFFSET("%rsi", offset);

  COMMENT("generate code for the body");
  codegen_stmtlist(main->stmtlist);

  COMMENT("generate code for the return expression");
  codegen_expr(main->returnexpr);
  COMMENT("save the return expression into %rax per the abi");
  POP("%rax");

  COMMENT("emit main's epilogue");
  emit_epilogue();

  // exit the scope
  current_offset_scope = destroy_offset_scope(current_offset_scope);
}

Use gcc -g example.s to turn on debugging symbols

b main # break at main
si # step instruction, assembly instructions (instead of code)
info file # get address of rodata
x/8xb 0x0000555555556000 # print memory, 8 he(x) (b)ytes
x/i addr # print as instruction
objdump -s test

Assembly language resources

Author: Paul Gazzillo

Created: 2024-03-09 Sat 00:23
