Backend overview
Lecture 13
Compiler architecture refresher
- Front-end
- "Middle-end"
- Back-end
- LLVM Compiler Infrastructure
Middle-end: designing an intermediate language
- The intermediate language bridges the programming language and machine code
Considerations
- Easier to generate from source code than machine code is
- Easier to translate into machine code than source languages are
- Serves as a common target for many source languages
- Abstracts away common machine features, e.g., registers vs. memory, functions vs. branches
- Enables common optimizations, e.g., instruction selection (multiplication vs. shift)
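The multiplication vs. shift case can be sketched as a rewrite rule over the intermediate language. This is only an illustration: the class name, the method names, and the textual instruction format are assumptions made for the example, not part of any real compiler.

```java
// Sketch: replace a multiplication by a power of two with a left shift.
class StrengthReductionSketch {
    // True if n is a positive power of two (exactly one bit set).
    static boolean isPowerOfTwo(int n) {
        return n > 0 && (n & (n - 1)) == 0;
    }

    // Hypothetical rewrite rule: returns the text of "dst = src * factor",
    // using a shift when the constant factor allows it.
    static String lowerMultiply(String dst, String src, int factor) {
        if (isPowerOfTwo(factor)) {
            int shift = Integer.numberOfTrailingZeros(factor);
            return dst + " = " + src + " << " + shift;
        }
        return dst + " = " + src + " * " + factor;
    }

    public static void main(String[] args) {
        System.out.println(lowerMultiply("t1", "x", 8)); // t1 = x << 3
        System.out.println(lowerMultiply("t2", "x", 6)); // t2 = x * 6
    }
}
```

Because the rule is written against the intermediate language, every source language and every back-end can benefit from it without repeating the work.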
Challenges
- Preserving debugging information from the source language
- Supporting multiple language paradigms
  - OOP, functional, C-like, interpreted, etc.
Examples of intermediate languages
Low-Level Virtual Machine (LLVM) Intermediate Representation (IR)
- Originally geared towards C-like systems languages
- Popular target for many languages
  - C, C++, Objective-C, Swift, Rust, with compilers for Java, Haskell, etc.
- Supports many back-ends
  - x86-64, ARM, MIPS, etc.
LLVM IR example
From https://llvm.org/docs/LangRef.html#br-instruction
Test:
  %cond = icmp eq i32 %a, %b
  br i1 %cond, label %IfEqual, label %IfUnequal
IfEqual:
  ret i32 1
IfUnequal:
  ret i32 0
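Read as source code, the IR fragment behaves roughly like the method below. This is a sketch for comparison only: the class and method names are made up, and the IR fragment itself does not come from Java.

```java
// Sketch: source-level equivalent of the compare-and-branch IR fragment.
class BranchExample {
    static int test(int a, int b) {
        boolean cond = (a == b); // %cond = icmp eq i32 %a, %b
        if (cond) {              // br i1 %cond, label %IfEqual, label %IfUnequal
            return 1;            // IfEqual:   ret i32 1
        } else {
            return 0;            // IfUnequal: ret i32 0
        }
    }
}
```

Note how the structured if/else becomes an explicit comparison, a conditional branch, and labeled blocks in the IR.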
Java Bytecode
- Originally designed for Java
- Typically interpreted by a virtual machine (JVM)
- Some JVMs perform on-the-fly compilation to machine code (HotSpot)
- Other popular languages targeting the JVM
  - Scala, Clojure, etc.
Java Bytecode example
javap -c TypeChecker.class
public java.lang.Void visitProgram(SimpleCParser$ProgramContext);
  Code:
     0: aload_0
     1: aload_1
     2: invokevirtual #20  // Method visitChildren:(Lorg/antlr/v4/runtime/tree/RuleNode;)Ljava/lang/Object;
     5: pop
     6: aconst_null
     7: areturn
Note that the bytecode has built-in OOP support, e.g., virtual method invocation (invokevirtual)
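A method of the following shape would plausibly compile to bytecode like the listing above. The stand-in types StubNode and StubVisitor are hypothetical and only keep the sketch self-contained; the real TypeChecker uses the ANTLR-generated SimpleCParser.ProgramContext and the ANTLR runtime's visitChildren.

```java
// Hypothetical stand-in for the parse tree node type.
class StubNode {}

// Hypothetical stand-in for the visitor class.
class StubVisitor {
    // Instance method in a non-final class, so calls to it use invokevirtual.
    Object visitChildren(StubNode node) {
        return null;
    }

    Void visitProgram(StubNode ctx) {
        visitChildren(ctx); // aload_0, aload_1, invokevirtual, pop (result discarded)
        return null;        // aconst_null, areturn
    }
}
```

Running javap -c on the compiled StubVisitor class should show the same instruction sequence, with a different constant pool index and method descriptor.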
Common Intermediate Language (CIL)
- Microsoft's intermediate language for its .NET framework
- Target for several languages supported by Microsoft
  - C#, Visual Basic, F#, C++/CLI
Our intermediate language for SimpleC
- Machine-like opcodes
  - Arithmetic, conditional branches
- One operation per instruction
- Unlimited temp variables (registers)
  - No register management needed
- Support for function calls
Three-address code (TAC)
- One "opcode" per instruction
- A maximum of three addresses (registers) per operation
TAC opcodes and their meaning
enum Op {
  CONST,  // arg1 is the destination; arg2 is a number literal
  ASSIGN, // arg1 is the destination; arg2 is a variable name
  INPUT,  // arg1 is the variable name to input
  OUTPUT, // arg1 is the variable name to output
  ADD,    // arg1 is the destination; arg2 and arg3 are the operands
  SUB,
  MULT,
  DIV,
  PARAM,  // arg1 is the parameter name
  CALL,   // arg1 is the destination; arg2 is the function name
  NOP,    // no arguments
  RETURN, // arg1 is the variable name to return
  LABEL,  // arg1 is the name of the label
  GOTO,   // arg1 is the name of the label
  // ZE - operand is zero
  // NZ - operand is not zero
  GOTOZE, // arg1 is the name of the label; arg2 is the operand
  GOTONZ,
  // EQ - operands are equal
  // NE - operands are not equal
  // LT | GT | LE | GE - operands are (less than | greater than | less than or equal | greater than or equal)
  GOTOEQ, // arg1 is the name of the label; arg2 and arg3 are the operands
  GOTONE,
  GOTOLT,
  GOTOGT,
  GOTOGE,
  GOTOLE,
}
```
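To make the opcode meanings concrete, here is a minimal interpreter sketch for a subset of the opcodes. The Instr class, the run method, and the sample program are assumptions made for illustration. The sketch also assumes that SUB and DIV compute arg2 minus arg3 and arg2 divided by arg3, which the opcode comments do not state explicitly.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class TacInterpreterSketch {
    // Subset of the opcodes listed above; the rest are omitted for brevity.
    enum Op { CONST, ASSIGN, OUTPUT, ADD, SUB, MULT, DIV, LABEL, GOTO, GOTOZE, GOTONZ }

    // Hypothetical instruction layout: one opcode, at most three addresses.
    static final class Instr {
        final Op op;
        final String arg1, arg2, arg3;
        Instr(Op op, String arg1, String arg2, String arg3) {
            this.op = op; this.arg1 = arg1; this.arg2 = arg2; this.arg3 = arg3;
        }
    }

    // Executes the program and returns every value written by OUTPUT.
    static List<Integer> run(List<Instr> program) {
        // First pass: record the position of each label.
        Map<String, Integer> labels = new HashMap<>();
        for (int i = 0; i < program.size(); i++) {
            if (program.get(i).op == Op.LABEL) labels.put(program.get(i).arg1, i);
        }
        Map<String, Integer> vars = new HashMap<>(); // unlimited named variables
        List<Integer> output = new ArrayList<>();
        int pc = 0;
        while (pc < program.size()) {
            Instr ins = program.get(pc++);
            switch (ins.op) {
                case CONST:  vars.put(ins.arg1, Integer.parseInt(ins.arg2)); break;
                case ASSIGN: vars.put(ins.arg1, vars.get(ins.arg2)); break;
                case OUTPUT: output.add(vars.get(ins.arg1)); break;
                case ADD:    vars.put(ins.arg1, vars.get(ins.arg2) + vars.get(ins.arg3)); break;
                case SUB:    vars.put(ins.arg1, vars.get(ins.arg2) - vars.get(ins.arg3)); break;
                case MULT:   vars.put(ins.arg1, vars.get(ins.arg2) * vars.get(ins.arg3)); break;
                case DIV:    vars.put(ins.arg1, vars.get(ins.arg2) / vars.get(ins.arg3)); break;
                case LABEL:  break; // labels only mark positions
                case GOTO:   pc = labels.get(ins.arg1); break;
                case GOTOZE: if (vars.get(ins.arg2) == 0) pc = labels.get(ins.arg1); break;
                case GOTONZ: if (vars.get(ins.arg2) != 0) pc = labels.get(ins.arg1); break;
                default: throw new IllegalStateException("unsupported opcode: " + ins.op);
            }
        }
        return output;
    }

    // Sample program: outputs n + (n-1) + ... + 1 using a loop built from gotos.
    static List<Integer> sumDownFrom(int n) {
        return run(Arrays.asList(
            new Instr(Op.CONST, "i", Integer.toString(n), null),
            new Instr(Op.CONST, "sum", "0", null),
            new Instr(Op.CONST, "one", "1", null),
            new Instr(Op.LABEL, "loop", null, null),
            new Instr(Op.GOTOZE, "end", "i", null),
            new Instr(Op.ADD, "sum", "sum", "i"),
            new Instr(Op.SUB, "i", "i", "one"),
            new Instr(Op.GOTO, "loop", null, null),
            new Instr(Op.LABEL, "end", null, null),
            new Instr(Op.OUTPUT, "sum", null, null)));
    }
}
```

Calling TacInterpreterSketch.sumDownFrom(3) returns a list containing 6. The sample program also shows how a source-level while loop reduces to LABEL, GOTOZE, and GOTO.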
Learning about and implementing the back-end
- Understanding the instruction set
- Defining SimpleC constructs in terms of TAC instructions
- Implementing code generation as a tree walking algorithm
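The tree-walking idea can be sketched for arithmetic expressions as follows. The expression classes (Num, Var, Bin), the textual instruction format, and the t0, t1, ... naming of temporaries are assumptions made for this example; the actual SimpleC generator would walk the ANTLR parse tree instead.

```java
import java.util.ArrayList;
import java.util.List;

class TacCodegenSketch {
    // Hypothetical expression tree, standing in for the real parse tree.
    interface Expr {}
    static final class Num implements Expr {
        final int value;
        Num(int value) { this.value = value; }
    }
    static final class Var implements Expr {
        final String name;
        Var(String name) { this.name = name; }
    }
    static final class Bin implements Expr {
        final char op;
        final Expr left, right;
        Bin(char op, Expr left, Expr right) { this.op = op; this.left = left; this.right = right; }
    }

    private final List<String> code = new ArrayList<>();
    private int nextTemp = 0;

    // Temporaries are unlimited, so allocation is just a counter.
    private String freshTemp() { return "t" + (nextTemp++); }

    // Post-order walk: emits code for the children first, then for the node.
    // Returns the name of the variable that holds the node's value.
    String gen(Expr e) {
        if (e instanceof Num) {
            String t = freshTemp();
            code.add("CONST " + t + " " + ((Num) e).value);
            return t;
        }
        if (e instanceof Var) {
            return ((Var) e).name; // already a named location, nothing to emit
        }
        Bin b = (Bin) e;
        String left = gen(b.left);
        String right = gen(b.right);
        String t = freshTemp();
        code.add(opName(b.op) + " " + t + " " + left + " " + right);
        return t;
    }

    static String opName(char op) {
        switch (op) {
            case '+': return "ADD";
            case '-': return "SUB";
            case '*': return "MULT";
            case '/': return "DIV";
            default: throw new IllegalArgumentException("unknown operator: " + op);
        }
    }

    // Generates code for "target = e" and returns the instruction list.
    static List<String> compileAssign(String target, Expr e) {
        TacCodegenSketch g = new TacCodegenSketch();
        String result = g.gen(e);
        g.code.add("ASSIGN " + target + " " + result);
        return g.code;
    }
}
```

For x = a + b * 2, compileAssign emits CONST t0 2, MULT t1 b t0, ADD t2 a t1, and ASSIGN x t2, in that order. Each instruction performs one operation on at most three addresses, and no register management is needed because every intermediate result gets a fresh temporary.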