Machine code generation
Lecture 18
Code generation overview
- Recall: compiler takes source language and produces target language
- Translate, not execute
- Generates equivalent program in assembly
- Each language construct has corresponding assembly code patterns
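To make the idea of per-construct code patterns concrete, here is a minimal Java sketch of how a three-address ADD might map to a fixed instruction sequence. The names AddPattern and emitAdd are hypothetical and are not taken from the template code. Operands are assumed to already be memory locations such as -8(%rbp).

```java
// Hypothetical sketch: one TAC instruction maps to one fixed assembly pattern.
final class AddPattern {
    // Emits AT&T-syntax x86-64 for "dest = left + right", where all three
    // operands are memory locations (for example "-8(%rbp)").
    static String emitAdd(String dest, String left, String right) {
        StringBuilder sb = new StringBuilder();
        sb.append("mov ").append(left).append(", %rax\n");   // load left operand
        sb.append("mov ").append(right).append(", %rcx\n");  // load right operand
        sb.append("add %rcx, %rax\n");                        // rax = rax + rcx
        sb.append("mov %rax, ").append(dest).append("\n");    // store the result
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.print(emitAdd("-56(%rbp)", "-8(%rbp)", "-16(%rbp)"));
    }
}
```

The operands pass through registers because x86-64 arithmetic instructions cannot take two memory operands at once.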
Assembly file layout
- data
- Fixed size, global data section (bss section is zeroed out)
- rodata
- Immutable data, e.g., for string constants
- text
- Executable part
Note that the heap and stack are created at runtime by the OS
https://wiki.osdev.org/ELF#Loading_ELF_Binaries
https://stackoverflow.com/questions/9226088/how-are-the-different-segments-like-heap-stack-text-related-to-the-physical-me
What about local variables, malloc'ed data?
- Local variables and heap-allocated variables are stored in memory allocated at runtime
- The running program (process) requests memory from the OS as needed, both at load time and at runtime
Using GDB
Use gdb to step through your simplec output program. First, install it with
sudo apt install gdb
Clone and install peda (https://github.com/longld/peda), a useful gdb assistant. Make sure you have already compiled your simplec program as shown in "Using your compiler" above. Then step through the program like so:
gdb a.out
set disassembly-flavor att   # once inside of gdb
b main                       # set a breakpoint at main
run                          # start the program. it will wait at main
si                           # step through each assembly instruction
# continue stepping through to track the behavior
If you've downloaded and installed peda, you will see the assembly code, registers, and stack displayed after each step.
Use n or next instead of si to step over function calls, e.g., the input/output calls.
Function implementation
How do functions work?
- Caller transfers control to callee function
- Caller provides input values
- Callee provides output value(s)
- Execution resumes in caller once callee is finished
Function calls "freeze" state of caller
- (Diagram)
How would you implement this with just assembly?
- Save state on stack
- Unconditional branch
- Save return value
- Another branch to go back to where we left off
"Nested" function calls freeze state of many callees
- (Diagram)
A recursive call can be thought of as invoking a fresh instance of the function, rather than the function calling itself.
Stack frame (or activation record)
- Holds all information needed to "freeze" state of function
- Parameters and local variables
- Return address
- Caller's stack frame (nested calls)
Parameter passing
- Registers and/or stack
- Registers are faster, but limited in number
- May need to save them before making call
Application binary interface (ABI)
- Calling conventions and stack frame layout
- How to pass parameters
- Layout of data in the stack frame
- How to return values
- Caller and callee responsibilities
- Architecture- and OS-dependent
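Note that in the System V AMD64 ABI (used on Linux), the first six integer or pointer parameters are passed in registers (%rdi, %rsi, %rdx, %rcx, %r8, %r9, in that order); any further parameters are passed on the stack.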
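As a rough illustration of the register-then-stack rule, the Java sketch below maps a zero-based parameter index to the place where the callee would find it. It assumes the System V AMD64 convention used on Linux and 8-byte integer arguments only. ArgLocation and locate are hypothetical names, not part of the template code.

```java
// Hypothetical helper: where does the callee find its n-th integer argument?
final class ArgLocation {
    // Integer/pointer argument registers in System V AMD64 order.
    static final String[] ARG_REGS = {"%rdi", "%rsi", "%rdx", "%rcx", "%r8", "%r9"};

    // index is zero-based. Stack arguments are addressed from the callee's
    // point of view, after its prologue has run "push %rbp; mov %rsp, %rbp":
    // 0(%rbp) holds the saved %rbp and 8(%rbp) the return address, so the
    // seventh argument sits at 16(%rbp), the eighth at 24(%rbp), and so on.
    static String locate(int index) {
        if (index < 0) {
            throw new IllegalArgumentException("negative index");
        }
        if (index < ARG_REGS.length) {
            return ARG_REGS[index];
        }
        return (16 + 8 * (index - ARG_REGS.length)) + "(%rbp)";
    }

    public static void main(String[] args) {
        for (int i = 0; i < 8; i++) {
            System.out.println(i + " -> " + locate(i));
        }
    }
}
```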
More resources on the ABI and calling conventions
https://wiki.osdev.org/Calling_Conventions
https://eli.thegreenplace.net/2011/09/06/stack-frame-layout-on-x86-64/
https://en.wikipedia.org/wiki/X86_calling_conventions#Register_preservation
https://stackoverflow.com/questions/1658294/whats-the-purpose-of-the-lea-instruction
https://www.fireeye.com/blog/threat-research/2008/03/instruction-poi.html
Intel x86-64 support for functions
- %rbp - base pointer points to the current function's stack frame
- %rsp - stack pointer points to the top of the stack
- push/pop - push to and pop from the stack (move data and update %rsp)
- call - saves next instruction address (%rip) onto stack and branches to function's address
- ret - pops the caller's next instruction address and branches to it
Recall that "points to" just means that the register holds an address
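The effect of these instructions on the stack pointer can be traced with a toy model. The Java sketch below is not an emulator. It tracks only %rsp, %rip, and a map standing in for memory, and the class name StackModel and the addresses used are made up for illustration.

```java
import java.util.HashMap;
import java.util.Map;

// Toy model (not a real emulator) of how push, pop, call, and ret move %rsp.
final class StackModel {
    final Map<Long, Long> memory = new HashMap<>();
    long rsp;  // stack pointer: address of the top of the stack
    long rip;  // instruction pointer: address of the next instruction

    StackModel(long initialRsp) {
        rsp = initialRsp;
    }

    // push: the stack grows toward lower addresses, 8 bytes at a time.
    void push(long value) {
        rsp -= 8;
        memory.put(rsp, value);
    }

    // pop: read the top of the stack, then shrink the stack.
    long pop() {
        long value = memory.get(rsp);
        rsp += 8;
        return value;
    }

    // call: save the address of the instruction after the call, then branch.
    void call(long target, long nextInstruction) {
        push(nextInstruction);
        rip = target;
    }

    // ret: pop the saved address and branch back to it.
    void ret() {
        rip = pop();
    }

    public static void main(String[] args) {
        StackModel m = new StackModel(0x7000);
        m.call(0x2000, 0x1005);
        System.out.printf("in callee: rip=0x%x rsp=0x%x%n", m.rip, m.rsp);
        m.ret();
        System.out.printf("returned:  rip=0x%x rsp=0x%x%n", m.rip, m.rsp);
    }
}
```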
Writing and calling ABI-compatible functions
- Function definition
- Prologue
- Epilogue
- Function call
- Parameter passing
- Return value
- (Demo)
# https://github.com/longld/peda
# for C, compile with -g for debug symbols
# run gdb
gdb example
b main                      # break at main
run                         # start the program; it stops at main
si                          # step instruction, assembly instructions (instead of code)
info file                   # get address of rodata
x/8xb 0x0000555555556000    # print memory, 8 he(x) (b)ytes
x/i addr                    # print as instruction
# dump section contents (objdump -t prints the symbol table)
objdump -s test
gdb resources:
https://sourceware.org/gdb/onlinedocs/gdb/Memory.html
https://sourceware.org/gdb/onlinedocs/gdb/Registers.html#Registers
https://visualgdb.com/gdbreference/commands/set_disassembly-flavor
http://dbp-consulting.com/tutorials/debugging/basicAsmDebuggingGDB.html
https://github.com/longld/peda
https://sourceware.org/gdb/onlinedocs/gdb/Auto-Display.html#Auto-Display
https://sourceware.org/gdb/onlinedocs/gdb/Continuing-and-Stepping.html
https://sourceware.org/binutils/docs-2.16/as/index.html
https://sourceware.org/binutils/docs/as/i386_002dMemory.html
Implementing TAC functions in assembly
Function definitions
- Emit prologue
- Save base pointer of caller's stack frame
push %rbp
- Update base pointer to current function's stack frame
mov %rsp, %rbp
- Allocate space on stack for locals and temps
sub $96, %rsp
- Emit epilogue for each function
- Restore stack pointer to caller's
mov %rbp, %rsp
- Restore the base pointer to the caller's stack frame
pop %rbp
- Return
ret
- Recall that this pops the return address from the stack and branches
- Emit return instructions
- Store result of expression in rax per ABI
mov -80(%rbp), %rax
- Goto the epilogue
jmp _main_return
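The prologue and epilogue steps above could be emitted along the following lines. This is a minimal Java sketch, not the template's ASMGen: FrameEmitter and its methods are hypothetical names, every local and temp is assumed to take one 8-byte slot, and rounding the frame size up to a multiple of 16 is an assumption made to keep %rsp 16-byte aligned at call sites.

```java
// Hypothetical sketch of prologue/epilogue emission; not the template's ASMGen.
final class FrameEmitter {
    // Rounds the space for 8-byte slots up to a multiple of 16 (assumption:
    // this keeps %rsp 16-byte aligned when the function makes its own calls).
    static int frameSize(int slotCount) {
        int bytes = 8 * slotCount;
        return (bytes + 15) / 16 * 16;
    }

    static String prologue(String name, int slotCount) {
        return name + ":\n"
             + "push %rbp\n"                                 // save caller's base pointer
             + "mov %rsp, %rbp\n"                            // start this function's frame
             + "sub $" + frameSize(slotCount) + ", %rsp\n";  // room for locals and temps
    }

    static String epilogue(String name) {
        return "_" + name + "_return:\n"
             + "mov %rbp, %rsp\n"  // discard locals and temps
             + "pop %rbp\n"        // restore caller's base pointer
             + "ret\n";            // pop return address and branch to it
    }

    public static void main(String[] args) {
        System.out.print(prologue("main", 9));
        System.out.print(epilogue("main"));
    }
}
```

Giving each function a single labeled epilogue lets every return statement store its value in %rax and jump to that label, rather than repeating the cleanup code.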
Function parameters and calls
- Move parameters to registers or the stack per the ABI
mov -56(%rbp), %rsi
- Make the call
call f
call will store the return address (the address of the instruction that follows the call) onto the stack
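A call site could be emitted roughly as follows. This Java sketch uses the hypothetical names CallEmitter and emitCall. It handles register arguments only, and it assumes argument values and the result each live in a frame slot such as -24(%rbp). A real generator may instead emit each move as it processes the corresponding PARAM instruction.

```java
import java.util.List;

// Hypothetical sketch of emitting a call with up to six register arguments.
final class CallEmitter {
    static final String[] ARG_REGS = {"%rdi", "%rsi", "%rdx", "%rcx", "%r8", "%r9"};

    // argSlots: frame locations of the argument values, in parameter order.
    // resultSlot: frame location that receives the return value from %rax.
    static String emitCall(String function, List<String> argSlots, String resultSlot) {
        if (argSlots.size() > ARG_REGS.length) {
            throw new UnsupportedOperationException("stack arguments not handled in this sketch");
        }
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < argSlots.size(); i++) {
            // Move each argument into its ABI register.
            sb.append("mov ").append(argSlots.get(i)).append(", ").append(ARG_REGS[i]).append("\n");
        }
        sb.append("call ").append(function).append("\n");         // pushes return address, branches
        sb.append("mov %rax, ").append(resultSlot).append("\n");  // save the return value
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.print(emitCall("f", List.of("-24(%rbp)", "-56(%rbp)"), "-64(%rbp)"));
    }
}
```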
Example function call program
SimpleC
f(x, y) {
  return x + y;
}

main() {
  int x;
  int result;
  input x;
  result = f(x, 100);
  output result;
  return 0;
}
Intermediate code
[f
CONST _t0 1
ASSIGN true _t0
CONST _t1 0
ASSIGN false _t1
ADD _t2 x y
RETURN _t2
, main
CONST _t0 1
ASSIGN true _t0
CONST _t1 0
ASSIGN false _t1
INPUT x
PARAM x
CONST _t2 100
PARAM _t2
CALL _t3 f
ASSIGN result _t3
OUTPUT result
CONST _t4 0
RETURN _t4
]
Assembly code
    .text
    .globl f
    .type f, @function
f:
    push %rbp
    mov %rsp, %rbp
    sub $64, %rsp
    mov %rdi, -8(%rbp)
    mov %rsi, -16(%rbp)
    movq $1, -40(%rbp)
    mov -40(%rbp), %rax
    mov %rax, -24(%rbp)
    movq $0, -48(%rbp)
    mov -48(%rbp), %rax
    mov %rax, -32(%rbp)
    mov -8(%rbp), %rax
    mov -16(%rbp), %rcx
    add %rcx, %rax
    mov %rax, -56(%rbp)
    mov -56(%rbp), %rax
    jmp _f_return
_f_return:
    mov %rbp, %rsp
    pop %rbp
    ret

    .text
    .globl main
    .type main, @function
main:
    push %rbp
    mov %rsp, %rbp
    sub $80, %rsp
    movq $1, -40(%rbp)
    mov -40(%rbp), %rax
    mov %rax, -8(%rbp)
    movq $0, -48(%rbp)
    mov -48(%rbp), %rax
    mov %rax, -16(%rbp)
    call input_int64_t@PLT
    mov %rax, -24(%rbp)
    mov -24(%rbp), %rdi
    movq $100, -56(%rbp)
    mov -56(%rbp), %rsi
    call f
    mov %rax, -64(%rbp)
    mov -64(%rbp), %rax
    mov %rax, -32(%rbp)
    mov -32(%rbp), %rdi
    call output_int64_t@PLT
    movq $0, -72(%rbp)
    mov -72(%rbp), %rax
    jmp _main_return
_main_return:
    mov %rbp, %rsp
    pop %rbp
    ret
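The %rbp-relative offsets in the listing suggest that each parameter, variable, and temporary gets its own 8-byte slot in the frame. One way to hand out such slots is sketched below in Java. SlotAllocator is a hypothetical name, and the order in which the real template assigns slots is an assumption inferred from the listing for f.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch: give every variable and temporary an 8-byte frame slot.
final class SlotAllocator {
    private final Map<String, Integer> offsets = new LinkedHashMap<>();

    // Returns the %rbp-relative location of name, assigning the next free
    // slot below %rbp the first time the name is seen.
    String slot(String name) {
        Integer offset = offsets.get(name);
        if (offset == null) {
            offset = -8 * (offsets.size() + 1);
            offsets.put(name, offset);
        }
        return offset + "(%rbp)";
    }

    // Number of slots handed out so far; useful for sizing the frame.
    int slotCount() {
        return offsets.size();
    }

    public static void main(String[] args) {
        SlotAllocator frame = new SlotAllocator();
        // Request order chosen to mirror the offsets used in f above.
        for (String name : new String[] {"x", "y", "true", "false", "_t0", "_t1", "_t2"}) {
            System.out.println(name + " -> " + frame.slot(name));
        }
        System.out.println("slots: " + frame.slotCount());
    }
}
```

With seven slots, f needs 56 bytes, which fits in the 64 bytes that its prologue reserves.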
Compiler project
Implement the rest of the machine code generator for three-address code.
You may use the template code. Please develop and use your own test cases. Ask any questions about details of the assembly code in class or in chat.
To get the repo ready, uncomment the ASMGen phase in the main driver and Makefile:
diff --git a/Compiler.java b/Compiler.java
index 1e88b68..feb9fc0 100644
--- a/Compiler.java
+++ b/Compiler.java
@@ -38,10 +38,10 @@ public class Compiler {
     String outputFile = inputFileNoExt + ".s";
     PrintWriter outfile = new PrintWriter(new FileWriter(outputFile));
 
-    // // Phase 5: Machine code gen.
-    // ASMGen asmgen = new ASMGen(outfile);
-    // System.err.println(codegen.functionlist);
-    // asmgen.gen(codegen.functionlist);
+    // Phase 5: Machine code gen.
+    ASMGen asmgen = new ASMGen(outfile);
+    System.err.println(codegen.functionlist);
+    asmgen.gen(codegen.functionlist);
 
     // Cleanup output file.
     outfile.close();
diff --git a/Makefile b/Makefile
index 4b4bdab..9a8c38a 100644
--- a/Makefile
+++ b/Makefile
@@ -5,7 +5,7 @@ SOURCE := \
 	TAC.java \
 	TACFunction.java \
 	CodeGen.java \
-	# ASMGen.java
+	ASMGen.java
 
 CLASSES := $(SOURCE:%.java=%.class)
To run your compiler's output, assemble and link it with the I/O library, io.c.
java Compiler example.simplec
gcc -o example example.s io.c
You can then run your program with ./example.
Submission
Push the complete code generator to the main branch of your GitHub repository. Be sure that it builds with make from the root directory and can be run with java Compiler program.simplec.
Grading
The compiler will be checked for functional correctness on a suite of example SimpleC programs with known inputs and outputs. For the example program above, for instance, if we have the input file fun.in1
203
which has known output fun.groundtruth1
303
we can test the compiler by checking its output against the known output:
./fun < fun.in1 > fun.out1
diff fun.groundtruth1 fun.out1
echo $?
diff should produce no output and the exit code $? should be zero. If this goes wrong, you might see something like this:
$ ./fun < fun.in1 > fun.out1
$ diff fun.groundtruth1 fun.out1
1c1
< 303
---
> 0
$ echo $?
1