Generating intermediate representation (IR)
Compiler Project
Table of Contents
In this phase, we will create a translation from the While language to equivalent While3Addr code. The translation works by traversing the While program's parse tree, emitting While3Addr instructions that are equivalent.
The translator can be run with while2ir tests/fact.while
, which will output While3Addr code once implemented.
The translation rules provide a guide to how each While construct can be translated into equivalent While3Addr instructions.
You should have three references open while implementing your generator:
Using ANTLR's visitors
The While language's grammar is defined in compiler/grammar/While.g4
and ANTLR generates a parser for the grammar. It also provides a WhileVisitor for processing the resulting parse tree. The translation is a traversal of the tree. A visitor helps writing recursive tree traversals. Unlike traversals like counting the nodes in a tree, where each node uses the same recursive function, each parse tree node will be one of several possible types, e.g., Assignment, Const, etc.
While there are only three non-terminals in the While language, s
for statements, b
for Boolean expressions, and a
for arithmetic expressions, each non-terminal have several alternative productions, e.g., s
has Assignment
, Skip
, If
, While
, and Compound
. The parse tree will have a different node type for each of these alternatives. Since each node type has a different translation rule, the visitor pattern helps ease the implementation by automatically dispatching to a function corresponding to the type of node. To define a traversal over a tree with mulitple node types using a visitor, we only need to define a visit function for each node type, e.g., visitWhile
, visitNot
, visitBOp
, etc.; call visit function on child nodes; and the visitor will automatically pick the corresponding visit function at runtime based on the input parse tree's types.
For example, the While
construct contains a nested Boolean expression b
. When translating a while construct, we implement the translation logic in visitWhile
. In order to translate its nested Boolean expression b
, we call self.visit(ctx.b())
, which will automatically dispatch to the correpond visitor for Boolean expressions. If are translating while x > 0 do x := x + 1
, calling self.visit(ctx.b())
would dispatch to the visitROp
function, since x > 0
is an instance of the ROp
construct.
As seen in the translation rules, statements only produce While3Addr code, while expressions produce code and a temporary variable that will hold the resulting value of executing the instructions at runtime. This return behavior is reflected in visitors by what they return. Statements return None
, instead using self.prog.add(instr)
to emit While3Addr instructions, while expressions return the name of a variable.
For instance, here is how visitors are used to implement the translation of an Assignment. Recall that the grammar for an Assignment is as follows:
s: ID ':=' a # Assignment
ID
is a terminal whose string value can be accessed with ctx.ID().getText()
. a
is non-terminal, whose subtree is translated by calling its visitor with self.visit(ctx.a())
. Assuming that the visitors for a
are written correctly, they will return a variable name that will hold the value of executing the expression at runtime.
# Visit a parse tree produced by WhileParser#Assignment. def visitAssignment(self, ctx:WhileParser.AssignmentContext): # create the assignment instruction instr = Assign(ctx.ID().getText(), self.visit(ctx.a())) # add the new instruction to the output program self.prog.add(instr) # return None, since statements produce nothing besides the While3Addr collected by self.prog.add return None
Let's look at the visitor for an expressions. Recall the grammar for a variable access, which consists solely of an ID terminal:
a: ID # Var
The expression visitor will return the name of a variable, which at runtime will contain the value of the expressions. A simple implementation is to just return the name of the variable being accessed:
def visitVar(self, ctx:WhileParser.VarContext): # get the name of the variable return ctx.ID().getText()
Alternatively, the visitor could also construct a new temporary variable and use a While3Addr assignment:
def visitVar(self, ctx:WhileParser.NumContext): resultvar = self.prog.temp() instr = Assign(resultvar, ctx.ID().getText()) self.prog.add(instr) return resultvar
Using ANTLR's parsing context (ctx
)
A context structure is passed to each visitor, e.g., visitAssignment
is passed the AssignmentContext as argument ctx
. This structure contains child nodes, whose names correspond to the ANTLR grammr (compiler/grammar/While.g4
). For instance, Assignment has two relevant children, ID
and a
. (:=
is also a child of Assignment, but knowing that the construct is Assignment always implies the :=
operator appears.)
- Getting a child node:
self.visit(ctx.a())
orself.visit(ctx.a(1))
- Getting the string value of an ID token:
ctx.ID().getText()
- Getting the number value of a NUM token:
int(ctx.NUM().getText())
Getting the
op
inBOp
,ROp
, orAOp
:ctx.op.type
.Check the
op
by comparing against the terminals, e.g., ifvisitBOp
, to check whether the operator is Boolean AND, useif WhileParser.AND == ctx.op.type:
Using the While3Addr.Program API
The template includes a Program class (=compiler.While3Addr.Program) that maintains the output program's program counter, creates unique temporary variable names and records the generated instructions.
- Get the current program counter:
self.prog.pc()
- Get a fresh temporary variable name:
self.prog.temp()
- An an instruction to the output program:
self.prog.add(instr)
For instance, visitTrue's translation rule creates a new temporary variable, and adds a new instruction that sets the temporary to constant 1
, i.e.,
# Visit a parse tree produced by WhileParser#True. def visitTrue(self, ctx:WhileParser.TrueContext): resultvar = self.prog.temp() # create a new temp var instr = Const(resultvar, 1) # create the Const instruction self.prog.add(instr) # add instruction to output return resultvar # return temp var to parent
Creating While3Addr instructions
The While3Addr instructions are defined in compiler/While3Addr.py
. There are five instruction types:
Const
, e.g.,_t1 :
1=, which can be created withConst("_t1", 1)
.Assign
, e.g.,x :
y=, which can be created withAssign("x", "y")
Op
, e.g.,x :
x + y=, which can be created withOp("x", "x", Arithmetic.ADD, "y")
Arithmetic
containsADD
,SUBTRACT
,MULTIPLY
, andDIVIDE
for the operators.- Goto, e.g.,
goto 3
, which can be created withGoto(3)
. IfGoto, e.g.,
if x < 0 goto 7
, which can be created withIfGoto("x", Relational.LESSTHAN, 7)
Recall that IfGoto always compares to
0
as the right operand.Relational
contains eitherLESSTHAN
orEQUALS
(which can be used to implement the remaining relational operators from the WHILE language.)
Backpatching branch targets for control-flow constructs
When emitting branches that jump forward in the set of instructions, we don't easily know the target program counter until after we have generated all instructions preceding the target. Backpatching is a technique where we generate the branch, saving a pointer to instruction's class. Then, we generate the subsequent instructions, keeping track of the program counter. Finally, once we have reached the target instruction, we fill in the branch's program counter using the pointer to the instruction.
See an example of using backpatching in the following implementation of visitNot
def visitNot(self, ctx:WhileParser.NotContext): # compile the operand's expression operand = self.visit(ctx.b()) # create a variable for the result resultvar = self.prog.temp() # test the operand's expression to see if it's zero (false) ifgoto = IfGoto(operand, Relational.EQUALTO, None) self.prog.add(ifgoto) # if not, set the result to false (the negation of not false) self.prog.add(Const(resultvar, 0)) # generate a goto to skip the else branch goto = Goto(None) self.prog.add(goto) # backpatch the test for the expression being false to the pc of # the instruction that sets the results for true (negation) ifgoto.pc = self.prog.pc() # add the instruction that sets the result to true self.prog.add(Const(resultvar, 1)) # backpatch the pc of the goto that skips the else branch goto.pc = self.prog.pc() assert ifgoto.pc != None assert goto.pc != None return resultvar
For instance, in visitNot
, we generate two forward branches, e.g., in the following example, lines 3 and 5.
1: t0 = 1 # operand = self.visit(ctx.b()) 2: t1 = t0 # (ditto) 3: if t1 = 0 goto 6 # generate ifgoto with no pc 4: t2 = 0 5: goto 7 # after adding the goto, update the ifgoto's pc with the ifgoto.pc = self.prog.pc(), which is now 6 6: t2 = 1 # after adding the t2 = 1, update the goto's pc with goto.pc = self.prog.pc(), which is now 7