Code Generation
COP-5621, Spring 2024
X86 assembly primer (AT&T style)
Operands
- Immediate: $1
- Register: %rax
- Memory: 0xdeadbeef
- Register indirect (pointers): (%rbp)
  - %rbp is a register that holds an address
- Register indirect plus offset: -32(%rbp)
  - Get value at address in %rbp minus 32 bytes
Operations
Order of operands
AT&T syntax puts the destination second, e.g., to move the contents of %rbx to %rax, use
movq %rbx, %rax
Ordering for arithmetic
AT&T syntax makes addition look odd:
add %rbx, %rax
Consequences for subtraction
If %rbx is 2 and %rax is 0, what is the state of the machine after the following instruction?
sub %rbx, %rax
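To make the operand order concrete, here is a minimal C sketch of what sub computes in AT&T syntax. The function name att_sub is hypothetical, and the sketch models only the destination register, not the flags.

```c
#include <stdint.h>

/* Models AT&T `sub src, dst`: the destination is the second operand,
   so the result is dst - src and it is written back to dst.
   att_sub is a hypothetical helper name used only for illustration. */
int64_t att_sub(int64_t src, int64_t dst) {
    return dst - src;
}
```

With %rbx = 2 and %rax = 0, sub %rbx, %rax corresponds to att_sub(2, 0): %rax ends up holding -2 and %rbx is unchanged.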
Move immediate
movq $1, -24(%rbp)
Load from memory
mov -24(%rbp), %rax
Store to memory
mov %rax, -8(%rbp)
Are there cases where we don't need to store variable values back to memory?
Add
mov -8(%rbp), %rax
mov -40(%rbp), %rbx
add %rbx, %rax
How can we reduce the need to load from memory?
Subtract
sub %rbx, %rax
Multiply
imul %rbx, %rax
Divide
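mov -32(%rbp), %rax
mov -16(%rbp), %rbx
# sign-extend %rax into %rdx:%rax for the 64-bit divide
cqto
# quotient goes to %rax, remainder to %rdx
idiv %rbx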
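Unlike add, sub, and imul, the idiv instruction takes its dividend from the %rdx:%rax register pair and produces two results. The C sketch below models those two results; DivResult and model_idiv are hypothetical names, and the sketch assumes a nonzero divisor and no overflow.

```c
#include <stdint.h>

/* Hypothetical model of the two results of a signed 64-bit idiv. */
typedef struct {
    int64_t quotient;   /* value left in %rax */
    int64_t remainder;  /* value left in %rdx */
} DivResult;

/* Assumes divisor != 0 and that the quotient fits in 64 bits;
   the real instruction raises a divide error otherwise. */
DivResult model_idiv(int64_t dividend, int64_t divisor) {
    DivResult r;
    r.quotient = dividend / divisor;   /* C, like idiv, truncates toward zero */
    r.remainder = dividend % divisor;  /* remainder takes the sign of the dividend */
    return r;
}
```

Because %rdx is overwritten with the remainder, any value the generated code still needs from %rdx has to be saved before the divide.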
Unconditional branch
jmp is the unconditional branch. It takes an address or a label, which the assembler uses to compute the branch offset. Labels can be placed before other assembly code, e.g., .BB2 below.
jmp .BB2
.BB2:
# more assembly code
How can we reduce the need for branch instructions?
Conditional branch
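There is no explicit Boolean value. Instead, the outcome of a comparison is stored in the flags register. cmp is effectively a subtraction that discards its result; like other arithmetic operations, it sets the flags, which a conditional jump such as jl then tests.
cmp $0, %rax
jl .BB3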
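Since cmp follows the same AT&T operand order as sub, it helps to spell out which operand is compared against which. The following C sketch models when the jump is taken; jl_taken and je_taken are hypothetical names, and the flags themselves are not modeled.

```c
#include <stdbool.h>
#include <stdint.h>

/* Models `cmp src, dst` followed by `jl`: cmp computes dst - src only to
   set the flags, and jl is taken when dst < src as signed integers.
   Comparing directly avoids the overflow a literal subtraction could hit in C. */
bool jl_taken(int64_t src, int64_t dst) {
    return dst < src;
}

/* Same idea for `je`: taken when the two operands are equal. */
bool je_taken(int64_t src, int64_t dst) {
    return dst == src;
}
```

So cmp $0, %rax followed by jl .BB3 branches to .BB3 exactly when %rax is negative, i.e. when jl_taken(0, rax) is true.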
What jump conditions do we need for While3Addr?
https://www.felixcloutier.com/x86/cmp
https://www.felixcloutier.com/x86/jcc
jl and je (only < and = appear in While3Addr)
Our compiler has already converted all comparisons to those two operators
https://github.com/xoreaxeaxeax/movfuscator?tab=readme-ov-file
Memory layout
- One memory address per variable (including temps)
- Stored in function stack frame
- https://eli.thegreenplace.net/2011/09/06/stack-frame-layout-on-x86-64/
- %rbp is the base pointer
- Access stack frame memory with register indirect
- -32(%rbp) is the address 32 bytes below %rbp, inside the stack frame
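One way to assign a stack slot to each variable can be sketched in C as follows. This is an illustrative layout, not necessarily the one the course compiler uses: slot_offset and frame_size are hypothetical names, each variable is assumed to take 8 bytes, and rounding the frame to a multiple of 16 bytes is an assumption made here so that %rsp stays 16-byte aligned for calls.

```c
#include <stdint.h>

/* Hypothetical layout: variable number `index` (0-based) gets the 8-byte
   slot at offset -8 * (index + 1) from %rbp. */
int32_t slot_offset(int32_t index) {
    return -8 * (index + 1);
}

/* Bytes to subtract from %rsp for `num_vars` slots, rounded up to a
   multiple of 16 (an assumption about how one might size the frame). */
int32_t frame_size(int32_t num_vars) {
    int32_t bytes = 8 * num_vars;
    return (bytes + 15) / 16 * 16;
}
```

Under these assumptions the fourth variable (index 3) lives at -32(%rbp), and a function with 9 variables reserves 80 bytes.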
Code generation overview
- Recall: compiler takes source language and produces target language
- Translate, not execute
- Generates equivalent program in assembly
- Each language construct has corresponding assembly code patterns
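As an illustration of such a pattern, the C sketch below formats the assembly for one three-address instruction of the form dst := lhs OP rhs, using the load, load, operate, store sequence shown in the primer. The function name format_binop and its parameters are hypothetical; the course compiler may be organized differently.

```c
#include <stdio.h>
#include <stddef.h>

/* Formats the assembly for `dst := lhs OP rhs`, where every variable lives
   in a stack slot at the given %rbp-relative offset. `mnemonic` is an
   AT&T instruction such as "add", "sub", or "imul". Returns the number of
   characters snprintf would write (excluding the terminating NUL). */
int format_binop(char *buf, size_t size, const char *mnemonic,
                 int dst_offset, int lhs_offset, int rhs_offset) {
    return snprintf(buf, size,
                    "mov %d(%%rbp), %%rax\n"   /* load left operand  */
                    "mov %d(%%rbp), %%rbx\n"   /* load right operand */
                    "%s %%rbx, %%rax\n"        /* rax = rax OP rbx   */
                    "mov %%rax, %d(%%rbp)\n",  /* store the result   */
                    lhs_offset, rhs_offset, mnemonic, dst_offset);
}
```

For example, format_binop(buf, sizeof buf, "sub", -40, -48, -72) fills buf with the four instructions that compute the value at -48(%rbp) minus the value at -72(%rbp) and store it to -40(%rbp).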
Assembly file layout
- data
  - Fixed size, global data section (bss section is zeroed out)
- rodata
  - Immutable data, e.g., for string constants
- text
  - Executable part
Note that the heap and stack are created at runtime by the OS
https://wiki.osdev.org/ELF#Loading_ELF_Binaries
https://stackoverflow.com/questions/9226088/how-are-the-different-segments-like-heap-stack-text-related-to-the-physical-me
What about local variables, malloc'ed data?
- Local variables and heap-allocated variables are stored in memory allocated at runtime
- A running program (process) works with the OS to obtain memory as needed, both at load time and at runtime
Function implementation
How do functions work?
- Caller transfers control to callee function
- Caller provides input values
- Callee provides output value(s)
- Execution resumes in caller once callee is finished
Function calls "freeze" state of caller
- (Diagram)
How would you implement this with just assembly?
- Save state on stack
- Unconditional branch
- Save return value
- Another branch to go back to where we left off
"Nested" function calls freeze state of many callees
- (Diagram)
We can think of a recursive call as invoking a fresh instance of the function, rather than the function calling itself.
Stack frame (or activation record)
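The idea that each call gets its own frozen state can be sketched in C with an explicit stack instead of real recursion. Everything here is hypothetical and simplified: Frame holds only the parameter, MAX_DEPTH is an arbitrary limit chosen so that the result fits in 64 bits, and factorial is used only as a familiar example.

```c
#include <stdint.h>

#define MAX_DEPTH 20

/* One hypothetical "stack frame": just the frozen parameter of a call. */
typedef struct {
    int64_t n;
} Frame;

/* Computes n! the way nested calls would, but with an explicit stack.
   Returns -1 if n is negative or larger than MAX_DEPTH. */
int64_t factorial_explicit_stack(int64_t n) {
    Frame stack[MAX_DEPTH];
    int top = 0;

    if (n < 0 || n > MAX_DEPTH) {
        return -1;
    }
    /* "Call" phase: each pending call freezes its own n in a fresh frame. */
    while (n > 1) {
        stack[top].n = n;
        top++;
        n--;
    }
    /* "Return" phase: unwind the frames, combining the return value. */
    int64_t result = 1;
    while (top > 0) {
        top--;
        result *= stack[top].n;
    }
    return result;
}
```

Each frame pushed in the first loop plays the role of one suspended instance of the function; the second loop resumes them in reverse order, as returns would.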
- Holds all information needed to "freeze" state of function
- Parameters and local variables
- Return address
- Caller's stack frame (nested calls)
Parameter passing
- Registers and/or stack
- Registers are faster, but limited in number
- May need to save them before making call
Application binary interface (ABI)
- Calling conventions and stack frame layout
- How to pass parameters
- Layout of data in the stack frame
- How to return values
- Caller and callee responsibilities
- Architecture- and OS-dependent
Note that, in the System V AMD64 ABI used on Linux, the first 6 integer or pointer parameters are passed in registers; any further parameters are passed on the stack.
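For reference, the register order for integer and pointer arguments in the System V AMD64 calling convention can be written as a small C lookup. The function name arg_register is hypothetical; floating-point arguments use a separate set of registers and are not covered here.

```c
#include <stddef.h>

/* Register used for integer or pointer argument number `index` (0-based)
   under the System V AMD64 calling convention used on Linux.
   Returns NULL when the argument is passed on the stack instead. */
const char *arg_register(int index) {
    static const char *const regs[] = {
        "%rdi", "%rsi", "%rdx", "%rcx", "%r8", "%r9"
    };
    if (index < 0 || index >= 6) {
        return NULL;
    }
    return regs[index];
}
```

A code generator could consult such a table when emitting the moves that set up a call, falling back to pushing arguments when the result is NULL.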
More resources on the ABI and calling conventions
https://wiki.osdev.org/Calling_Conventions
https://eli.thegreenplace.net/2011/09/06/stack-frame-layout-on-x86-64/
https://en.wikipedia.org/wiki/X86_calling_conventions#Register_preservation
https://stackoverflow.com/questions/1658294/whats-the-purpose-of-the-lea-instruction
https://www.fireeye.com/blog/threat-research/2008/03/instruction-poi.html
Intel x86-64 support for functions
- %rbp: base pointer, points to the current function's stack frame
- %rsp: stack pointer, points to the top of the stack
- push/pop: push to and pop from the stack (move data and update %rsp)
- call: saves the next instruction address (%rip) onto the stack and branches to the function's address
- ret: pops the caller's next instruction address and branches to it
Recall that "points to" just means that the register holds an address
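The effect of push and pop on the stack pointer can be modeled with a toy machine in C. This is only a sketch: Machine, machine_init, machine_push, and machine_pop are hypothetical names, the stack pointer is an array index rather than a byte address, and there is no bounds checking.

```c
#include <stdint.h>

#define STACK_WORDS 16

/* Toy machine: `rsp` is an index into `mem`, and the stack grows toward
   lower indices, mirroring how the x86-64 stack grows toward lower addresses. */
typedef struct {
    int64_t mem[STACK_WORDS];
    int rsp;
} Machine;

void machine_init(Machine *m) {
    m->rsp = STACK_WORDS;  /* empty stack: rsp is just past the highest slot */
}

/* push: move the stack pointer down one slot, then store the value there. */
void machine_push(Machine *m, int64_t value) {
    m->rsp--;
    m->mem[m->rsp] = value;
}

/* pop: load the value at the stack pointer, then move it back up one slot. */
int64_t machine_pop(Machine *m) {
    int64_t value = m->mem[m->rsp];
    m->rsp++;
    return value;
}
```

In this model, call behaves like a push of the return address followed by a jump, and ret behaves like a pop of that address followed by a jump to it.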
Writing and calling ABI-compatible functions
- Function definition
  - Prologue
  - Epilogue
- Function call
  - Parameter passing
  - Return value
# https://github.com/longld/peda
# for C compile with -g for debug symbols
# run gdb
gdb example
b main # break at main
si # step instruction, assembly instructions (instead of code)
info file # get address of rodata
x/8xb 0x0000555555556000 # print memory, 8 he(x) (b)ytes
x/i addr # print as instruction
# dump section contents (use -t for the symbol table)
objdump -s test
gdb resources:
https://sourceware.org/gdb/onlinedocs/gdb/Memory.html
https://sourceware.org/gdb/onlinedocs/gdb/Registers.html#Registers
https://visualgdb.com/gdbreference/commands/set_disassembly-flavor
http://dbp-consulting.com/tutorials/debugging/basicAsmDebuggingGDB.html
https://github.com/longld/peda
https://sourceware.org/gdb/onlinedocs/gdb/Auto-Display.html#Auto-Display
https://sourceware.org/gdb/onlinedocs/gdb/Continuing-and-Stepping.html
https://sourceware.org/binutils/docs-2.16/as/index.html
https://sourceware.org/binutils/docs/as/i386_002dMemory.html
Our compiler's runtime
How can we handle inparam and outparam?
One approach: use scanf and printf (many other ways)
runtime/io.c
#include <stdio.h>
#include <stdint.h>

int64_t input_int64_t() {
  int64_t x;
  scanf("%ld", &x);
  return x;
}

void output_int64_t(int64_t x) {
  printf("%ld\n", x);
}
Given to you in compiler-project/runtime
Using runtime/io.c
Always link with it (so generated code can call it)
Insert calls into generated code
Assembly for input
# initialize return value from input_int64_t
movl $0, %eax
# call returns to %rax
call input_int64_t@PLT
# save return value to inparam
movq %rax, -56(%rbp)
Assembly for output
# call to output_int64_t
# get outparam
movq -48(%rbp), %rax
# %rdi is the first function argument
movq %rax, %rdi
call output_int64_t@PLT
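Inside a compiler, the output sequence above could be produced by a small helper that takes the stack offset of outparam. The C sketch below is one possible shape for it; format_output_call is a hypothetical name and not part of the provided runtime.

```c
#include <stdio.h>
#include <stddef.h>

/* Formats the call sequence that prints the variable stored at
   `offset(%rbp)` using output_int64_t from runtime/io.c.
   Returns what snprintf returns. */
int format_output_call(char *buf, size_t size, int offset) {
    return snprintf(buf, size,
                    "movq %d(%%rbp), %%rax\n"  /* load the variable */
                    "movq %%rax, %%rdi\n"      /* first argument goes in %rdi */
                    "call output_int64_t@PLT\n",
                    offset);
}
```

Calling format_output_call(buf, sizeof buf, -48) yields the three instructions shown for output, without the comments.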
Implementing main function for While3Addr
Prologue
Allocates 80 bytes of stack space for local variables
# prologue, save old base pointer, update stack pointer
pushq %rbp
movq %rsp, %rbp
# allocate space for stack local variables
subq $80, %rsp
How can we reduce the stack space needed?
Epilogue
Pop local variables, restore caller's stack frame, set return value
# set main's return value
movl $0, %eax
# epilogue, restore stack pointer, restore old base pointer
mov %rbp, %rsp
pop %rbp
ret
Full example: codegen_while
codegen_while.while
begin
  outparam := 1;
  while inparam > 1 do
  begin
    inparam := inparam - 1;
    outparam := outparam * inparam
  end
end
codegen_while.ir
0: _t0 := 1
1: outparam := _t0
2: _t1 := 1
3: _t3 := _t1 - inparam
4: if _t3 < 0 goto 7
5: _t2 := 0
6: goto 8
7: _t2 := 1
8: if _t2 = 0 goto 15
9: _t4 := 1
10: _t5 := inparam - _t4
11: inparam := _t5
12: _t6 := outparam * inparam
13: outparam := _t6
14: goto 2
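To check what the three-address code computes, it can be transcribed almost line by line into C using goto. This is a sketch for experimentation only: the function name run_codegen_while_ir is hypothetical, the temporaries are renamed t0 to t6, label Ln marks IR instruction n, and inputs are assumed small enough that the arithmetic does not overflow.

```c
#include <stdint.h>

/* Runs the three-address code for codegen_while on the given input and
   returns the final value of outparam. L15 is the point just past the
   last IR instruction. */
int64_t run_codegen_while_ir(int64_t inparam) {
    int64_t outparam, t0, t1, t2, t3, t4, t5, t6;

    t0 = 1;                         /*  0 */
    outparam = t0;                  /*  1 */
L2: t1 = 1;                         /*  2 */
    t3 = t1 - inparam;              /*  3 */
    if (t3 < 0) goto L7;            /*  4 */
    t2 = 0;                         /*  5 */
    goto L8;                        /*  6 */
L7: t2 = 1;                         /*  7 */
L8: if (t2 == 0) goto L15;          /*  8 */
    t4 = 1;                         /*  9 */
    t5 = inparam - t4;              /* 10 */
    inparam = t5;                   /* 11 */
    t6 = outparam * inparam;        /* 12 */
    outparam = t6;                  /* 13 */
    goto L2;                        /* 14 */
L15:
    return outparam;
}
```

For an input of 5 this returns 24 rather than 120, because the loop body decrements inparam before multiplying.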
codegen_while.s
    .file "stdin"
    .text
    .globl main
    .type main, @function
main:
    # prologue, save old base pointer, update stack pointer
    pushq %rbp
    movq %rsp, %rbp
    # allocate space for stack local variables
    subq $96, %rsp
    # prepare args to input_int64_t
    movl $0, %eax
    # call returns to %rax
    call input_int64_t@PLT
    # save return value to inparam
    movq %rax, -72(%rbp)
.BB0:
    jmp .BB1
.BB1:
    movq $1, -32(%rbp)
    mov -32(%rbp), %rax
    mov %rax, -64(%rbp)
    jmp .BB2
.BB2:
    movq $1, -48(%rbp)
    mov -48(%rbp), %rax
    mov -72(%rbp), %rbx
    sub %rbx, %rax
    mov %rax, -40(%rbp)
    mov -40(%rbp), %rax
    cmp $0, %rax
    jl .BB4
    jmp .BB3
.BB3:
    movq $0, -24(%rbp)
    jmp .BB5
.BB4:
    movq $1, -24(%rbp)
    jmp .BB5
.BB5:
    mov -24(%rbp), %rax
    cmp $0, %rax
    je .BB7
    jmp .BB6
.BB6:
    movq $1, -56(%rbp)
    mov -72(%rbp), %rax
    mov -56(%rbp), %rbx
    sub %rbx, %rax
    mov %rax, -16(%rbp)
    mov -16(%rbp), %rax
    mov %rax, -72(%rbp)
    mov -64(%rbp), %rax
    mov -72(%rbp), %rbx
    imul %rbx, %rax
    mov %rax, -8(%rbp)
    mov -8(%rbp), %rax
    mov %rax, -64(%rbp)
    jmp .BB2
.BB7:
    # call to output_int64_t
    # get outparam
    movq -64(%rbp), %rax
    # %rdi is the first function argument
    movq %rax, %rdi
    call output_int64_t@PLT
    # set main's return value
    movl $0, %eax
    # epilogue, restore stack pointer, restore old base pointer
    mov %rbp, %rsp
    pop %rbp
    ret