UP | HOME

Generating intermediate representation (IR)
Compiler Project

Table of Contents

In this phase, we will create a translation from the While language to equivalent While3Addr code. The translation works by traversing the While program's parse tree, emitting While3Addr instructions that are equivalent.

The translator can be run with while2ir tests/fact.while, which will output While3Addr code once implemented.

The translation rules provide a guide to how each While construct can be translated into equivalent While3Addr instructions.

You should have three references open while implementing your generator:

Using ANTLR's visitors

The While language's grammar is defined in compiler/grammar/While.g4 and ANTLR generates a parser for the grammar. It also provides a WhileVisitor for processing the resulting parse tree. The translation is a traversal of the tree. A visitor helps writing recursive tree traversals. Unlike traversals like counting the nodes in a tree, where each node uses the same recursive function, each parse tree node will be one of several possible types, e.g., Assignment, Const, etc.

While there are only three non-terminals in the While language, s for statements, b for Boolean expressions, and a for arithmetic expressions, each non-terminal have several alternative productions, e.g., s has Assignment, Skip, If, While, and Compound. The parse tree will have a different node type for each of these alternatives. Since each node type has a different translation rule, the visitor pattern helps ease the implementation by automatically dispatching to a function corresponding to the type of node. To define a traversal over a tree with mulitple node types using a visitor, we only need to define a visit function for each node type, e.g., visitWhile, visitNot, visitBOp, etc.; call visit function on child nodes; and the visitor will automatically pick the corresponding visit function at runtime based on the input parse tree's types.

For example, the While construct contains a nested Boolean expression b. When translating a while construct, we implement the translation logic in visitWhile. In order to translate its nested Boolean expression b, we call self.visit(ctx.b()), which will automatically dispatch to the correpond visitor for Boolean expressions. If are translating while x > 0 do x := x + 1, calling self.visit(ctx.b()) would dispatch to the visitROp function, since x > 0 is an instance of the ROp construct.

As seen in the translation rules, statements only produce While3Addr code, while expressions produce code and a temporary variable that will hold the resulting value of executing the instructions at runtime. This return behavior is reflected in visitors by what they return. Statements return None, instead using self.prog.add(instr) to emit While3Addr instructions, while expressions return the name of a variable.

For instance, here is how visitors are used to implement the translation of an Assignment. Recall that the grammar for an Assignment is as follows:

s:   ID ':=' a # Assignment

ID is a terminal whose string value can be accessed with ctx.ID().getText(). a is non-terminal, whose subtree is translated by calling its visitor with self.visit(ctx.a()). Assuming that the visitors for a are written correctly, they will return a variable name that will hold the value of executing the expression at runtime.

# Visit a parse tree produced by WhileParser#Assignment.
def visitAssignment(self, ctx:WhileParser.AssignmentContext):
    # create the assignment instruction
    instr = Assign(ctx.ID().getText(), self.visit(ctx.a()))
    # add the new instruction to the output program
    self.prog.add(instr)
    # return None, since statements produce nothing besides the While3Addr collected by self.prog.add
    return None

Let's look at the visitor for an expressions. Recall the grammar for a variable access, which consists solely of an ID terminal:

a:   ID # Var

The expression visitor will return the name of a variable, which at runtime will contain the value of the expressions. A simple implementation is to just return the name of the variable being accessed:

def visitVar(self, ctx:WhileParser.VarContext):
    # get the name of the variable
    return ctx.ID().getText()

Alternatively, the visitor could also construct a new temporary variable and use a While3Addr assignment:

def visitVar(self, ctx:WhileParser.NumContext):
    resultvar = self.prog.temp()
    instr = Assign(resultvar, ctx.ID().getText())
    self.prog.add(instr)
    return resultvar

Using ANTLR's parsing context (ctx)

A context structure is passed to each visitor, e.g., visitAssignment is passed the AssignmentContext as argument ctx. This structure contains child nodes, whose names correspond to the ANTLR grammr (compiler/grammar/While.g4). For instance, Assignment has two relevant children, ID and a. (:= is also a child of Assignment, but knowing that the construct is Assignment always implies the := operator appears.)

  • Getting a child node: self.visit(ctx.a()) or self.visit(ctx.a(1))
  • Getting the string value of an ID token: ctx.ID().getText()
  • Getting the number value of a NUM token: int(ctx.NUM().getText())
  • Getting the op in BOp, ROp, or AOp: ctx.op.type.

    Check the op by comparing against the terminals, e.g., if visitBOp, to check whether the operator is Boolean AND, use

    if WhileParser.AND == ctx.op.type:
    

Using the While3Addr.Program API

The template includes a Program class (=compiler.While3Addr.Program) that maintains the output program's program counter, creates unique temporary variable names and records the generated instructions.

  • Get the current program counter: self.prog.pc()
  • Get a fresh temporary variable name: self.prog.temp()
  • An an instruction to the output program: self.prog.add(instr)

For instance, visitTrue's translation rule creates a new temporary variable, and adds a new instruction that sets the temporary to constant 1, i.e.,

# Visit a parse tree produced by WhileParser#True.
    def visitTrue(self, ctx:WhileParser.TrueContext):
        resultvar = self.prog.temp()  # create a new temp var
        instr = Const(resultvar, 1)   # create the Const instruction
        self.prog.add(instr)          # add instruction to output
        return resultvar              # return temp var to parent

Creating While3Addr instructions

The While3Addr instructions are defined in compiler/While3Addr.py. There are five instruction types:

  1. Const, e.g., _t1 : 1=, which can be created with Const("_t1", 1).
  2. Assign, e.g., x : y=, which can be created with Assign("x", "y")
  3. Op, e.g., x : x + y=, which can be created with Op("x", "x", Arithmetic.ADD, "y")

    Arithmetic contains ADD, SUBTRACT, MULTIPLY, and DIVIDE for the operators.

  4. Goto, e.g., goto 3, which can be created with Goto(3).
  5. IfGoto, e.g., if x < 0 goto 7, which can be created with IfGoto("x", Relational.LESSTHAN, 7)

    Recall that IfGoto always compares to 0 as the right operand. Relational contains either LESSTHAN or EQUALS (which can be used to implement the remaining relational operators from the WHILE language.)

Backpatching branch targets for control-flow constructs

When emitting branches that jump forward in the set of instructions, we don't easily know the target program counter until after we have generated all instructions preceding the target. Backpatching is a technique where we generate the branch, saving a pointer to instruction's class. Then, we generate the subsequent instructions, keeping track of the program counter. Finally, once we have reached the target instruction, we fill in the branch's program counter using the pointer to the instruction.

See an example of using backpatching in the following implementation of visitNot

def visitNot(self, ctx:WhileParser.NotContext):
    # compile the operand's expression
    operand = self.visit(ctx.b())

    # create a variable for the result
    resultvar = self.prog.temp()

    # test the operand's expression to see if it's zero (false)
    ifgoto = IfGoto(operand, Relational.EQUALTO, None)
    self.prog.add(ifgoto)

    # if not, set the result to false (the negation of not false)
    self.prog.add(Const(resultvar, 0))

    # generate a goto to skip the else branch
    goto = Goto(None)
    self.prog.add(goto)

    # backpatch the test for the expression being false to the pc of
    # the instruction that sets the results for true (negation)
    ifgoto.pc = self.prog.pc()

    # add the instruction that sets the result to true
    self.prog.add(Const(resultvar, 1))

    # backpatch the pc of the goto that skips the else branch
    goto.pc = self.prog.pc()

    assert ifgoto.pc != None
    assert goto.pc != None
    return resultvar

For instance, in visitNot, we generate two forward branches, e.g., in the following example, lines 3 and 5.

1: t0 = 1            # operand = self.visit(ctx.b())
2: t1 = t0           # (ditto)
3: if t1 = 0 goto 6  # generate ifgoto with no pc
4: t2 = 0
5: goto 7            # after adding the goto, update the ifgoto's pc with the ifgoto.pc = self.prog.pc(), which is now 6
6: t2 = 1            # after adding the t2 = 1, update the goto's pc with goto.pc = self.prog.pc(), which is now 7

Author: Paul Gazzillo

Created: 2024-03-01 Fri 07:54

Validate