Compiler Project
Compilers
COP-3402
Compiler overview
Source to process
- Recall how source code becomes a process
- Our project will be on the compiler part
- Translate source to machine code
Compiler input and output
(Diagram)
Input language (SimpleIR) -> Compiler (CodeGen) -> Output language (x86 assembly)
Example run of the compiler
SimpleIR
Our intermediate representation.
Example: C program
    int main(int argc, char **argv) {
      int x;
      int retval;
      x = read_int();
      retval = print_int(x);
      return 0;
    }
Example: SimpleIR
    function main
    localVariables argc argv x retval
    parameters argc argv
    x := call read_int
    x := x * 2
    retval := call print_int x
    return 0
    end function
Compilation units
- One file is one compilation unit
- One compilation unit defines one function
Minimum SimpleIR
    function main
    return 0
    end function
function, return, and end function must appear in each SimpleIR program
- Function takes a single name
- Return takes an integer literal or a variable name
Defines a new function called main.
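As a rough illustration of what the compiler might output for the minimum program, here is a small sketch in C++. The helper name emitMinimalFunction is hypothetical, and the choice of x86-64 with AT&T syntax is an assumption; the notes only say the output is x86 assembly.

```cpp
#include <string>

// Hypothetical helper: builds assembly text for a SimpleIR function whose
// body is only `return <integer literal>`.
// Assumption: x86-64, AT&T syntax, return value passed back in %rax.
std::string emitMinimalFunction(const std::string& name, int returnValue) {
    std::string out;
    out += "\t.globl " + name + "\n";   // make the symbol visible to the linker
    out += name + ":\n";
    out += "\tpushq %rbp\n";            // prologue: save the caller's frame pointer
    out += "\tmovq %rsp, %rbp\n";       // prologue: start this function's frame
    out += "\tmovq $" + std::to_string(returnValue) + ", %rax\n";
    out += "\tpopq %rbp\n";             // epilogue: restore the caller's frame pointer
    out += "\tret\n";
    return out;
}
```

Calling emitMinimalFunction("main", 0) yields the text for a function that only sets up a frame, loads 0 as the return value, and returns.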
With parameters
    function main
    localVariables argc argv x y
    parameters argc argv
    # instructions go here
    return 0
    end function
- localVariables declares all variables, including parameters
- parameters are a subset of localVariables
For instance, in C, parameters to a function are local variables. In SimpleIR you need to declare all local variables first, including all parameters, and then specify separately which local variables are the parameters.
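Because every variable is declared up front, a code generator can decide where each one lives before it translates any statement. The sketch below shows one possible layout, not the required one. It assumes 8-byte slots at negative offsets from %rbp and a frame rounded up to 16 bytes (the System V x86-64 alignment rule); the names assignStackOffsets and frameSize are hypothetical.

```cpp
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Hypothetical layout: the i-th declared variable (counting from 0) is stored
// at offset -8 * (i + 1) from %rbp. Assumes every value fits in 8 bytes.
std::map<std::string, int> assignStackOffsets(const std::vector<std::string>& localVariables) {
    std::map<std::string, int> offsets;
    int next = -8;
    for (const std::string& name : localVariables) {
        offsets[name] = next;
        next -= 8;
    }
    return offsets;
}

// Bytes of stack to reserve for the variables, rounded up to a multiple of 16
// so the stack stays aligned when this function calls another one.
int frameSize(std::size_t variableCount) {
    int bytes = static_cast<int>(variableCount) * 8;
    return (bytes + 15) / 16 * 16;
}
```

With localVariables argc argv x y, this places argc at -8(%rbp), argv at -16(%rbp), x at -24(%rbp), and y at -32(%rbp). Parameters get slots like any other variable; copying the incoming argument registers into those slots would be a separate step.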
Statements
Assignment
    x := 2
    y := x
- Sets the value of a variable
- Use the := symbol
- Assignment of either
  - an integer literal or
  - another variable name
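A possible translation of assignment, sketched in C++. It assumes each variable has an 8-byte stack slot addressed relative to %rbp and uses AT&T syntax; the Offsets map and the helpers operandToAsm and emitAssign are hypothetical names for illustration.

```cpp
#include <cctype>
#include <map>
#include <string>

using Offsets = std::map<std::string, int>;  // hypothetical: variable name -> offset from %rbp

// Integer literals become immediates ($2); names become stack slots (-8(%rbp)).
std::string operandToAsm(const std::string& operand, const Offsets& offsets) {
    if (!operand.empty() && std::isdigit(static_cast<unsigned char>(operand[0]))) {
        return "$" + operand;
    }
    return std::to_string(offsets.at(operand)) + "(%rbp)";
}

// Emits code for `variable := operand`. The value goes through %rax because
// x86 cannot move directly from one memory location to another.
std::string emitAssign(const std::string& variable, const std::string& operand,
                       const Offsets& offsets) {
    std::string out;
    out += "\tmovq " + operandToAsm(operand, offsets) + ", %rax\n";
    out += "\tmovq %rax, " + std::to_string(offsets.at(variable)) + "(%rbp)\n";
    return out;
}
```

Routing literals through %rax as well keeps the sketch to a single code path, at the cost of one extra instruction.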
Arithmetic
x := x + 2
- A single operation, no arithmetic expressions
- Always assigns the result to a variable name
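Since each arithmetic statement is a single operation with a named destination, its translation can follow a fixed load, compute, store pattern. The sketch below is one way to do that under the same assumptions as before (x86-64, AT&T syntax, 8-byte stack slots); emitOperation and the register choices are illustrative, not prescribed by the notes.

```cpp
#include <cctype>
#include <map>
#include <stdexcept>
#include <string>

using Offsets = std::map<std::string, int>;  // hypothetical: variable name -> offset from %rbp

// Integer literals become immediates ($2); names become stack slots (-8(%rbp)).
std::string operandToAsm(const std::string& operand, const Offsets& offsets) {
    if (!operand.empty() && std::isdigit(static_cast<unsigned char>(operand[0]))) {
        return "$" + operand;
    }
    return std::to_string(offsets.at(operand)) + "(%rbp)";
}

// Emits code for `variable := operand1 <op> operand2`.
std::string emitOperation(const std::string& variable, const std::string& operand1,
                          char op, const std::string& operand2, const Offsets& offsets) {
    std::string out;
    out += "\tmovq " + operandToAsm(operand1, offsets) + ", %rax\n";
    out += "\tmovq " + operandToAsm(operand2, offsets) + ", %rcx\n";
    std::string result = "%rax";
    switch (op) {
        case '+': out += "\taddq %rcx, %rax\n"; break;
        case '-': out += "\tsubq %rcx, %rax\n"; break;   // %rax = %rax - %rcx
        case '*': out += "\timulq %rcx, %rax\n"; break;
        case '/':
        case '%':
            out += "\tcqto\n";         // sign-extend %rax into %rdx:%rax
            out += "\tidivq %rcx\n";   // quotient in %rax, remainder in %rdx
            if (op == '%') result = "%rdx";
            break;
        default:
            throw std::invalid_argument("unknown operator");
    }
    out += "\tmovq " + result + ", " + std::to_string(offsets.at(variable)) + "(%rbp)\n";
    return out;
}
```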
Function calls
x := call print_int x
- Works like a C function call
- No parentheses
- List arguments after the function name
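A call can be translated by loading each argument into a register, issuing the call, and storing the return value. The sketch below assumes the System V x86-64 calling convention (first six integer arguments in %rdi, %rsi, %rdx, %rcx, %r8, %r9, result in %rax) and 8-byte stack slots; emitCall is a hypothetical helper and does not handle more than six arguments.

```cpp
#include <cstddef>
#include <map>
#include <stdexcept>
#include <string>
#include <vector>

using Offsets = std::map<std::string, int>;  // hypothetical: variable name -> offset from %rbp

// Emits code for `variable := call functionName actuals...`.
std::string emitCall(const std::string& variable, const std::string& functionName,
                     const std::vector<std::string>& actuals, const Offsets& offsets) {
    // Argument registers in order, assuming the System V x86-64 convention.
    static const char* const argRegs[] = {"%rdi", "%rsi", "%rdx", "%rcx", "%r8", "%r9"};
    if (actuals.size() > 6) {
        throw std::invalid_argument("more than six arguments are not handled in this sketch");
    }
    std::string out;
    for (std::size_t i = 0; i < actuals.size(); ++i) {
        out += "\tmovq " + std::to_string(offsets.at(actuals[i])) + "(%rbp), " + argRegs[i] + "\n";
    }
    out += "\tcall " + functionName + "\n";
    out += "\tmovq %rax, " + std::to_string(offsets.at(variable)) + "(%rbp)\n";  // store the result
    return out;
}
```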
Branching
    if x > 0 goto end
    x := 0 - x
    end:
- No structured code, more like assembly
- goto for unconditional branches, if ... goto for conditional branches
- Define labels as targets of branching
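Branching statements map closely onto compare and jump instructions. The sketch below shows one possible mapping from the SimpleIR comparison operators to signed x86 conditional jumps, again assuming AT&T syntax and 8-byte stack slots. All helper names are hypothetical.

```cpp
#include <cctype>
#include <map>
#include <stdexcept>
#include <string>

using Offsets = std::map<std::string, int>;  // hypothetical: variable name -> offset from %rbp

// Integer literals become immediates ($0); names become stack slots (-8(%rbp)).
std::string operandToAsm(const std::string& operand, const Offsets& offsets) {
    if (!operand.empty() && std::isdigit(static_cast<unsigned char>(operand[0]))) {
        return "$" + operand;
    }
    return std::to_string(offsets.at(operand)) + "(%rbp)";
}

// Signed conditional jump for each SimpleIR comparison operator.
std::string jumpFor(const std::string& op) {
    if (op == "=")  return "je";
    if (op == "!=") return "jne";
    if (op == "<")  return "jl";
    if (op == "<=") return "jle";
    if (op == ">")  return "jg";
    if (op == ">=") return "jge";
    throw std::invalid_argument("unknown comparison operator");
}

// Emits code for `if operand1 <op> operand2 goto label`.
std::string emitIfGoto(const std::string& operand1, const std::string& op,
                       const std::string& operand2, const std::string& label,
                       const Offsets& offsets) {
    std::string out;
    out += "\tmovq " + operandToAsm(operand1, offsets) + ", %rax\n";
    out += "\tmovq " + operandToAsm(operand2, offsets) + ", %rcx\n";
    out += "\tcmpq %rcx, %rax\n";   // AT&T order: compares %rax against %rcx
    out += "\t" + jumpFor(op) + " " + label + "\n";
    return out;
}

std::string emitGoto(const std::string& label) { return "\tjmp " + label + "\n"; }
std::string emitLabel(const std::string& label) { return label + ":\n"; }
```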
Pointers
    t1 := &x
    t2 := *t1
    *t1 := 11
- Ampersand gets address
- t2 := *t1 dereferences t1 and gets its value
- *t1 := 11 dereferences t1 and assigns a value to the location it points to
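The three pointer statements could be translated roughly as follows. This is a sketch assuming x86-64, AT&T syntax, and 8-byte stack slots relative to %rbp; the helper names are hypothetical.

```cpp
#include <cctype>
#include <map>
#include <string>

using Offsets = std::map<std::string, int>;  // hypothetical: variable name -> offset from %rbp

// Stack slot of a variable, for example -8(%rbp).
std::string slot(const std::string& name, const Offsets& offsets) {
    return std::to_string(offsets.at(name)) + "(%rbp)";
}

// variable := &operand   (store the address of operand's slot)
std::string emitReference(const std::string& variable, const std::string& operand,
                          const Offsets& offsets) {
    return "\tleaq " + slot(operand, offsets) + ", %rax\n" +
           "\tmovq %rax, " + slot(variable, offsets) + "\n";
}

// variable := *operand   (load the pointer, then read what it points to)
std::string emitDereference(const std::string& variable, const std::string& operand,
                            const Offsets& offsets) {
    return "\tmovq " + slot(operand, offsets) + ", %rax\n" +
           "\tmovq (%rax), %rcx\n" +
           "\tmovq %rcx, " + slot(variable, offsets) + "\n";
}

// *variable := operand   (load the value and the pointer, then write through it)
std::string emitAssignDereference(const std::string& variable, const std::string& operand,
                                  const Offsets& offsets) {
    bool isLiteral = !operand.empty() && std::isdigit(static_cast<unsigned char>(operand[0]));
    std::string source = isLiteral ? "$" + operand : slot(operand, offsets);
    return "\tmovq " + source + ", %rcx\n" +
           "\tmovq " + slot(variable, offsets) + ", %rax\n" +
           "\tmovq %rcx, (%rax)\n";
}
```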
Example program: exponents
    function exponent
    localVariables base exp result
    parameters base exp
    result := 1
    top:
    if exp <= 0 goto end
    result := result * base
    exp := exp - 1
    goto top
    end:
    return result
    end function
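To see the control flow of the exponent program in a more familiar language, here is a statement-for-statement C++ rendering that keeps the labels and gotos. The use of long for the variables is an assumption; the notes do not specify a width for SimpleIR integers.

```cpp
// Each line mirrors one SimpleIR statement of the exponent program.
long exponent(long base, long exp) {
    long result;
    result = 1;
top:
    if (exp <= 0) goto end;
    result = result * base;
    exp = exp - 1;
    goto top;
end:
    return result;
}
```

For example, exponent(2, 10) multiplies result by 2 ten times and returns 1024, and a non-positive exp skips the loop and returns 1.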
SimpleIR Grammar
    grammar SimpleIR;

    unit: function;

    function: 'function' functionName=NAME localVariables? parameters? statement* returnStatement end;

    localVariables: 'localVariables' variables=NAME+;

    parameters: 'parameters' formals=NAME+;

    returnStatement: 'return' operand=(NAME | NUM);

    end: 'end' 'function';

    statement: assign | dereference | reference | assignDereference | operation | call | label | gotoStatement | ifGoto;

    operation: variable=NAME ':=' operand1=(NAME | NUM) operatorKind=('+' | '-' | '*' | '/' | '%') operand2=(NAME | NUM);

    assign: variable=NAME ':=' operand=(NAME | NUM);

    dereference: variable=NAME ':=' '*' operand=NAME;

    reference: variable=NAME ':=' '&' operand=NAME;

    assignDereference: '*' variable=NAME ':=' operand=(NAME | NUM);

    call: variable=NAME ':=' 'call' functionName=NAME actuals=NAME*;

    label: labelName=NAME ':';

    gotoStatement: 'goto' labelName=NAME;

    ifGoto: 'if' operand1=(NAME | NUM) operatorKind=('=' | '!=' | '<' | '<=' | '>' | '>=') operand2=(NAME | NUM) 'goto' labelName=NAME;

    NAME: [a-zA-Z_] ([a-zA-Z_] | [0-9])* ;
    NUM: [0-9]+ ;

    PLUS: '+' ;
    MINUS: '-' ;
    STAR: '*' ;
    SLASH: '/' ;
    PERCENT: '%' ;
    EQ: '=' ;
    NEQ: '!=' ;
    LT: '<' ;
    LTE: '<=' ;
    GT: '>' ;
    GTE: '>=' ;

    WS: [ \t\r\n]+ -> skip ;
    COMMENT : '#' ~[\r\n]* -> skip ;
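To make the lexer rules at the end of the grammar concrete, here is a small hand-written tokenizer that follows the same rules for NAME, NUM, symbols, whitespace, and comments. It is only an illustration of what those rules accept; ANTLR generates its own lexer from the grammar, and the function name tokenize is hypothetical.

```cpp
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Splits SimpleIR source text into tokens, skipping whitespace and # comments.
std::vector<std::string> tokenize(const std::string& src) {
    std::vector<std::string> tokens;
    std::size_t i = 0;
    while (i < src.size()) {
        unsigned char c = static_cast<unsigned char>(src[i]);
        if (c == ' ' || c == '\t' || c == '\r' || c == '\n') {  // WS
            ++i;
            continue;
        }
        if (c == '#') {                                          // COMMENT: up to end of line
            while (i < src.size() && src[i] != '\r' && src[i] != '\n') ++i;
            continue;
        }
        std::size_t start = i;
        if (std::isalpha(c) || c == '_') {                       // NAME
            while (i < src.size() &&
                   (std::isalnum(static_cast<unsigned char>(src[i])) || src[i] == '_')) ++i;
        } else if (std::isdigit(c)) {                            // NUM
            while (i < src.size() && std::isdigit(static_cast<unsigned char>(src[i]))) ++i;
        } else if (i + 1 < src.size() && src[i + 1] == '=' &&
                   (c == ':' || c == '!' || c == '<' || c == '>')) {
            i += 2;                                              // := != <= >=
        } else {
            ++i;                                                 // single-character symbol
        }
        tokens.push_back(src.substr(start, i - start));
    }
    return tokens;
}
```

Note that NUM has no sign, so a negative literal such as -1 is not a single token under these rules.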
Implementation
Semantic actions
- Recall how we define compilers/interpreters
- Define the grammar
- Define semantic actions for each construct
ANTLR
- Parsing framework
- Define grammar
- Write semantic actions
Listener model
- Read the ANTLR book
- Examples
  - Use some SimpleIR, e.g., assignment
  - Show syntax tree
  - Show translation specification
  - Finally show actual C++ code
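The listener idea can be sketched without the ANTLR runtime: a walker visits each node of the syntax tree and calls enter and exit callbacks, and the semantic actions live in a subclass that overrides the callbacks it cares about. Everything below is a simplified stand-in written for illustration. AssignNode, Listener, CodeGenListener, and walk are hypothetical names, not ANTLR's API; in a real project the base listener and context classes are generated by ANTLR from the grammar, so check the generated headers for the actual names.

```cpp
#include <string>
#include <vector>

// Hypothetical stand-in for the syntax tree node of `variable := operand`.
struct AssignNode {
    std::string variable;
    std::string operand;
};

// Base listener with empty callbacks, so subclasses override only what they need.
class Listener {
public:
    virtual ~Listener() = default;
    virtual void enterAssign(const AssignNode&) {}
    virtual void exitAssign(const AssignNode&) {}
};

// Semantic action for assignment: here it only records a comment line per
// statement; a real code generator would append assembly instead.
class CodeGenListener : public Listener {
public:
    std::string output;
    void exitAssign(const AssignNode& node) override {
        output += "# " + node.variable + " := " + node.operand + "\n";
    }
};

// Simplified walker: visits the statements in order and fires the callbacks.
void walk(Listener& listener, const std::vector<AssignNode>& statements) {
    for (const AssignNode& node : statements) {
        listener.enterAssign(node);
        listener.exitAssign(node);
    }
}
```

Putting the action in exitAssign means it runs after the walker has finished with the statement's children, which is the usual place to emit code for a construct.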
Notes
Main things to cover
- SimpleIR overview
- ANTLR listener overview
- C++ overview
- x86 assembly basics