UP | HOME

Machine code generation (CodeGen)
Compiler Project

Table of Contents

Translating blocks

Here are suggestions for translating each While3Addr instructions. These suggestions assume simple management of registers, where each use of a variable always stores and/or loads to/from memory.

  • Const can use movq to store an immediate value into the memory location associated with the symbol, e.g., for x := 2 where x is offset -8 from the base pointer:

    movq    $2, -8(%rbp)
    

    will store the value 2 to the memory address %rpb - 8

  • Assign can use two moves to load and then store one variable to another, e.g., for x := y, where x' is offset -8 an y's is -16:

    movq    -8(%rbp), %rax
    movq    %rax, -16(%rbp)
    
    • Op can work by loading the values of both operands into registers, calling the corresponding instruction for the arithmetic operation, then storing the resulting value into the address associated with the assigned symbol.

      For example, a translation of _t6 := outparam * inparam can be

      mov	-64(%rbp), %rax
                              mov	-72(%rbp), %rbx
                              imul	%rbx, %rax
                              mov	%rax, -8(%rbp)
      

      Integer addition, subtraction, and multiplication are add, sub, and imul respectively. Division is slightly different in that it uses two registers and always store the results in predefined registers, which is %rax for the division result. Here is an example translation of _t1 := inparam / _t0, where cdq is called first and idiv always stores the division result in %rax.

      mov	-32(%rbp), %rax
      mov	-8(%rbp), %rbx
                        cdq
                        idiv	%rbx
                              mov	%rax, -16(%rbp)
      
    • Nops do nothing and branching are handled below.

Translating control flow

To translate control flow, we emit a label for each basic block, and emit any branches at the end of them. Labels in assembly are a name followed by a colon preceding an instruction, e.g.,

.BB2:
        movq	$1, -48(%rbp)

Unconditional branches, as used in loops, are done with jmp, which can take a label, e.g.,

jmp	.BB2

Conditional branches work by checking the state of the status register, which records several effects of the preceding instruction, e.g., whether the result of the operation was zero. We can combine two instructions, one to set the status and one to branch, to implement a conditional branch. We'll use cmp which performs a subtraction (without saving the result) and either jl or je to branch when less-than or branch when equal-to, e.g.,

cmp	$0, %rax
jl	.BB4

Here is the translation of the one basic block from and in-class example.

2: _t1 := 1
3: _t3 := _t1 - inparam
4: if _t3 < 0 goto 7
.BB2:
        movq	$1, -48(%rbp)
        mov	-48(%rbp), %rax
        mov	-72(%rbp), %rbx
        sub	%rbx, %rax
        mov	%rax, -40(%rbp)
        mov	-40(%rbp), %rax
        cmp	$0, %rax
        jl	.BB4
        jmp	.BB3

Implementation

Labels can be emitted with the following function called label:

label = lambda node: f'.BB{node}'

The successor block for a node can be accessed from the cfg with, i.e., cfg[node], e.g, the following will check if a basic block has an unconditional branch.

elif len(cfg[node]) == 1: # unconditional branch block

A conditional branch's success can be found with the following:

elif len(cfg[node]) == 2: # conditional branch block
  # we will replace the last instruction with an ifgoto and add a goto
  truesuccessor = list(cfg[node])[0] if cfg[node][list(cfg[node])[0]]['condition'] else list(cfg[node])[1]
  falsesuccessor = list(cfg[node])[0] if not cfg[node][list(cfg[node])[0]]['condition'] else list(cfg[node])[1]
  testinstr = instrs[-1]
  instrs = instrs[:-1]

Skeleton code for codegen

def codegen(filename, outfile, cfg):
  # emit assembly header and pseudo ops for the main function definition
  outfile.write(
f'''\t.file "{filename}"
\t.text
\t.globl main
\t.type main, @function
main:
\t# prologue, save old base pointer, update stack pointer
\tpushq\t%rbp
\tmovq\t%rsp, %rbp
''')

  # create symtab to hold stack offset and current register (include inparam and outparam as first two)
  symbols = set()
  for node in cfg.nodes:
    for instr in cfg.nodes[node]['instrs']:
      match instr:
        case Const(var, _):
          symbols.add(var)
        case Assign(var, fromvar):
          symbols.add(var)
          symbols.add(fromvar)
        case Op(var, left, _, right):
          symbols.add(var)
          symbols.add(left)
          symbols.add(right)
        case Test(var, _):
          symbols.add(var)

  # put in/out variables first
  iovars = [ util.input_variable, util.output_variable ]
  symbols = iovars + list(symbols - set(iovars))

  logging.debug(symbols)

  # generate offsets, n * -8, (n - 1) * -8, ..., 2 * -8, 1 * -8
  offsets = map(lambda x: (x+1)*-8, reversed(range(len(symbols))))

  # create symtab to store offsets
  symtab = dict(zip(symbols, offsets))

  logging.debug(symtab)

  # make space for the temps, plus one for the base pointer, which is
  # at the current stack pointer.  8 bytes each slot in the stack
  # align stack on 16-byte boundary (you might get segfaults calling
  # other libraries without this!)
  stackspace = (len(symtab.keys()) + 1) * 8
  stackspace = ((int(stackspace / 16)) + 1) * 16;

  # count number of locals
  outfile.write(
f'''\t# allocate space for stack local variables
\tsubq\t${stackspace}, %rsp
\t# prepare args to input_int64_t
\tmovl \t$0, %eax
\t# call returns to %rax
\tcall\tinput_int64_t@PLT
\t# save return value to inparam
\tmovq\t%rax, {symtab[util.input_variable]}(%rbp)
''')

  # ensure exit node is last
  entrynode = 0
  exitnode = len(cfg.nodes) - 1

  # TODO: optimization opportunity with topo sort
  sortednodes = sorted(list(set(cfg.nodes) - {entrynode, exitnode}))
  sortednodes = [entrynode] + sortednodes + [exitnode]

  logging.debug(sortednodes)

  # PROJECT: implement code generation, translating instructions within each block and setting up the labels and branches between blocks

  outfile.write(
f'''\t# call to output_int64_t
\t# get outparam
\tmovq\t{symtab[util.output_variable]}(%rbp), %rax
\t# %rdi is the first function argument
\tmovq\t%rax, %rdi
\tcall\toutput_int64_t@PLT
n\t# set main's return value
\tmovl\t$0, %eax
\t# epilogue, restore stack pointer, restore old base pointer
\tmov\t%rbp, %rsp
\tpop\t%rbp
\tret
''')

Author: Paul Gazzillo

Created: 2024-03-11 Mon 14:53

Validate