Code Generation 4
COP-3402

Overview
Setup the repo
Setting up your development environment
Compiler project structure
Implementation
Debugging with GDB
Testing without codegen1 and codegen2
Submitting your project
(Only if instructed) Updating from the start repo
Troubleshooting
Self-grading
Grading schema

Overview

In this project you will compile pointer operations to x86 assembly. As with all programming projects, it will be submitted via git.

Setup the repo

ssh into eustis, replacing NID with your UCF NID.
```
ssh NID@eustis.eecs.ucf.edu
```

Clone the compiler template to codegen4

```
git clone https://www.cs.ucf.edu/~gazzillo/teaching/cop3402fall24/repos/compiler-template.git/ codegen4
```

Enter the repo
```
cd codegen4/
```
If this doesn't work, double-check the clone step and make sure you put codegen4 as the second argument to git clone.

Add the URL of your personal remote repository, replacing NID with your UCF NID.

```
git remote add submission gitolite3@eustis3.eecs.ucf.edu:cop3402/NID/codegen4
```

Synchronize your local repo with the remote eustis3 repo.
```
git push --set-upstream submission master
```
You only need to do this once. Use git commit and git push regularly to keep the remote repo up to date.

Setting up your development environment

Be sure you are in the repo directory.
```
cd ~/codegen4
```
If you receive a warning about a managed python installation, then double-check that you are on eustis.
Then create the development environment. This creates an "editable" installation of your project, so that you can modify its source and rerun without having to reinstall the project.
```
pipenv install -e ./
```
If you can't run pipenv but you've already installed it, try logging out of eustis and back in again.

If you haven't installed pipenv yet, please review the calc project.
Enter your pipenv development environment. Do this every time you log in to eustis to work on your project.
```
pipenv shell
```
Double-check that you are in the environment. Your prompt should look something like this:
```
(compiler) NID@net1547:~/codegen4$
```
You can later exit the dev environment with exit. You do not need to enter the dev environment again if you have already entered it.

Get ANTLR and build the parser.
```
make -C grammar/
```

Compiler project structure

File	Description
Pipfile	pipenv settings
compiler/CodeGen.py	The code generator that you will write. Not provided by the template repo.
compiler/Interpreter.py	A SimpleIR interpreter for comparing output
grammar/Makefile	A build file for the grammar
grammar/SimpleIR.g4	The SimpleIR grammar
pyproject.toml	python project settings

.gitignore files are for git and __init__.py are for python modules.

Implementation

Skeleton code

Here is the start to compiler/CodeGen.py

import os
import sys
import math
from textwrap import indent, dedent
from antlr4 import *
from grammar.SimpleIRLexer import SimpleIRLexer
from grammar.SimpleIRParser import SimpleIRParser
from grammar.SimpleIRListener import SimpleIRListener
import logging
logging.basicConfig(level=logging.DEBUG)

# This class defines a complete listener for a parse tree produced by SimpleIRParser.
class CodeGen(SimpleIRListener):
    def __init__(self, filename, outfile):
        self.filename = filename
        self.outfile = outfile
        self.symtab = {}
        self.bitwidth = 8

    def enterLocalvars(self, ctx:SimpleIRParser.LocalvarsContext):
        """Allocates space for local variables, including input parameters"""
        # get list of local variables
        locals = [ local.getText() for local in ctx.NAME() ]
        # create a list of offsets of the bitwidth
        offsets = map(lambda x: (x+1)*-self.bitwidth, range(len(locals)))
        # create a dictionary mapping locals to offsets
        self.symtab = dict(zip(locals, offsets))
        logging.debug(self.symtab)
        # compute the size of the stack space
        stackspace = len(self.symtab.keys()) * self.bitwidth
        logging.debug(stackspace)
        # ceiling to 8 bytes
        stackoffset = math.ceil(stackspace / 8) * 8
        # align to 16 bytes by adding 8 to account for rbx if already aligned to 16 bytes
        stackoffset += (stackoffset + 8) % 16
        logging.debug(stackoffset)
        # Emit an instruction to allocate stack space for locals
        self.outfile.write(indent(dedent(f'''\
            # allocate stack space for locals
            sub\t${stackoffset}, %rsp
        '''), '\t'))

    def enterAssign(self, ctx:SimpleIRParser.AssignContext):
        """Assign value to a variable"""
        if SimpleIRParser.NAME == ctx.operand.type:
            operand = f"{self.symtab[ctx.operand.text]}(%rbp)"
        elif SimpleIRParser.NUM == ctx.operand.type:
            operand = f"${ctx.operand.text}"
        self.outfile.write(indent(dedent(f'''\
            # assign {ctx.operand.text} to {ctx.NAME(0).getText()}
            mov\t{operand}, %rax
            mov\t%rax, {self.symtab[ctx.NAME(0).getText()]}(%rbp)
        '''), '\t'))

    def enterDeref(self, ctx:SimpleIRParser.DerefContext):
        """Dereference a variable and assign its value"""
        # TODO
        pass

    def enterRef(self, ctx:SimpleIRParser.RefContext):
        """Get the address of a variable"""
        # TODO
        pass

    def enterAssignderef(self, ctx:SimpleIRParser.AssignderefContext):
        """Assign to a dereferenced variable"""
        # TODO
        pass

def main():
    import sys
    if len(sys.argv) > 1:
        filepath = sys.argv[1]
        input_stream = FileStream(filepath)
        filename = os.path.basename(filepath)
    else:
        input_stream = StdinStream()
        filename = "stdin"

    lexer = SimpleIRLexer(input_stream)
    stream = CommonTokenStream(lexer)
    parser = SimpleIRParser(stream)
    tree = parser.unit()
    if parser.getNumberOfSyntaxErrors() > 0:
        print("syntax errors")
        exit(1)
    # print(tree.toStringTree())
    walker = ParseTreeWalker()
    walker.walk(CodeGen(filename, sys.stdout), tree)

if __name__ == '__main__':
    main()
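
To see what enterLocalvars computes, the following standalone sketch repeats the same offset and alignment arithmetic outside the listener, using the locals of the main.ir example shown later on this page. The helper name layout_locals is hypothetical and is not part of the template.

```python
import math

BITWIDTH = 8  # bytes per local variable, as in the skeleton's self.bitwidth


def layout_locals(names, bitwidth=BITWIDTH):
    """Return (symtab, stackoffset) using the same arithmetic as enterLocalvars.

    layout_locals is a hypothetical helper for illustration only.
    """
    # the first local sits at -8(%rbp), the second at -16(%rbp), and so on
    symtab = {name: -(i + 1) * bitwidth for i, name in enumerate(names)}
    stackspace = len(symtab) * bitwidth
    # round up to a multiple of 8 bytes
    stackoffset = math.ceil(stackspace / 8) * 8
    # add 8 when needed so that the locals plus the pushed %rbx total a multiple of 16
    stackoffset += (stackoffset + 8) % 16
    return symtab, stackoffset


symtab, stackoffset = layout_locals("a b x y t1 t2 t3".split())
print(symtab)
print(f"sub\t${stackoffset}, %rsp")
```

For these seven locals the offsets run from -8 (a) to -56 (t3) and the allocation is 56 bytes, which matches the `sub $56, %rsp` in the full example's main.s.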

Emitting assembly code

The CodeGen class provides a self.outfile file to write to. In python, write a string using

self.outfile.write("The string to emit")

Alternatively, you can use a format string to make creating templates easier, where anything inside curly braces is evaluated, e.g., the following prints a string followed by the contents of a variable called name:

self.outfile.write(f"This is what is in the name variable: {name}")
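
As a minimal sketch of how the skeleton combines format strings with indent and dedent, the example below writes a three-line assembly template to an in-memory buffer. The emit helper and the io.StringIO buffer are assumptions for illustration; in the skeleton the same calls are inlined and self.outfile is sys.stdout.

```python
import io
from textwrap import dedent, indent


def emit(outfile, template):
    """Write a multi-line assembly template, indented by one tab.

    emit is a hypothetical helper; the skeleton inlines the same calls.
    """
    outfile.write(indent(dedent(template), "\t"))


out = io.StringIO()  # stands in for self.outfile
name = "x"
offset = -24
emit(out, f"""\
    # assign 8 to {name}
    mov\t$8, %rax
    mov\t%rax, {offset}(%rbp)
""")
print(out.getvalue(), end="")
```

dedent strips the indentation that the template has in the Python source, and indent then prefixes every line with a single tab, so the emitted assembly lines up no matter how deeply the template is nested in your code.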

To retrieve ANTLR parse tree contents, use the ctx context parameter provided to each listener, with the name of the token, e.g., the following will get the NAME token from the syntax tree for a function production and store it in the name python variable.

name = ctx.NAME()

`enterDeref`

Given a dereference statement, e.g.,

t2 := *t1

where the variables have the following offsets

Variable	Offset
t1	-40
t2	-48

the following assembly code will load the value at the address stored in t1 and store that value into t2

mov -40(%rbp), %rax  # value of the deref'ed variable (t1)
mov (%rax), %rbx  # the value in memory at address %rax (*t1)
mov %rbx, -48(%rbp)  # store the result in the assigned var (t2)

Recall that the offset of variables is stored in self.symtab.
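
The three instructions above can be produced from a template. The sketch below is one possible shape, written as a standalone function so it runs without ANTLR; deref_asm is a hypothetical name, and how you pull the two variable names out of ctx depends on SimpleIR.g4 (enterAssign in the skeleton uses ctx.NAME(0).getText() for the assigned variable).

```python
def deref_asm(symtab, dest, src):
    """Return assembly for `dest := *src` (hypothetical helper)."""
    return (
        f"mov\t{symtab[src]}(%rbp), %rax\n"   # load the address stored in src
        "mov\t(%rax), %rbx\n"                 # load the value at that address
        f"mov\t%rbx, {symtab[dest]}(%rbp)\n"  # store it in dest
    )


symtab = {"t1": -40, "t2": -48}
print(deref_asm(symtab, "t2", "t1"), end="")
```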

`enterRef`

Given a reference statement, e.g.,

t1 := &x

where the variables have the following offsets from the base pointer

Variable	Offset
x	-24
t1	-40

the following assembly code will store the address of x into t1

mov %rbp, %rax  # start with stack frame address
add $-24, %rax  # add the offset of the referenced var (x)
mov %rax, -40(%rbp)  # store the result in the assigned var (t1)

Recall that the offset of variables is stored in self.symtab.
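
A template for the reference case might look like the sketch below. As before, ref_asm is a hypothetical standalone function rather than the listener itself, and extracting the names from ctx is left to your reading of the grammar.

```python
def ref_asm(symtab, dest, src):
    """Return assembly for `dest := &src` (hypothetical helper)."""
    return (
        "mov\t%rbp, %rax\n"                   # start with the stack frame address
        f"add\t${symtab[src]}, %rax\n"        # add the offset of the referenced var
        f"mov\t%rax, {symtab[dest]}(%rbp)\n"  # store the address in dest
    )


symtab = {"x": -24, "t1": -40}
print(ref_asm(symtab, "t1", "x"), end="")
```

A single `lea -24(%rbp), %rax` would compute the same address, but the sketch follows the mov/add sequence shown above so that its output matches the examples.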

`enterAssignderef`

Given a dereference assignment statement

*t1 := 11

where t1 has offset -40, the following assembly will store the value 11 into the address given by t1

mov	-40(%rbp), %rax # get the value of t1
mov	$11, %rbx # get the right-hand-side value
mov	%rbx, (%rax) # store %rbx into the address in memory given by %rax

(%rax) means store not to %rax itself, but to the memory location given by the address in %rax.

Recall that the offset of variables is stored in self.symtab.
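
The sketch below shows one way to template this case, assuming the right-hand side may be either a number or a variable, as it is in enterAssign. Both function names are hypothetical. The skeleton tells the two cases apart by token type (ctx.operand.type); the string check here is only a stand-in so the example runs without ANTLR.

```python
def operand_asm(symtab, text):
    """Format a right-hand-side operand (hypothetical helper).

    A number becomes an immediate; anything else is assumed to be a local.
    """
    if text.lstrip("-").isdigit():
        return f"${text}"
    return f"{symtab[text]}(%rbp)"


def assignderef_asm(symtab, pointer, operand):
    """Return assembly for `*pointer := operand` (hypothetical helper)."""
    return (
        f"mov\t{symtab[pointer]}(%rbp), %rax\n"          # address held in pointer
        f"mov\t{operand_asm(symtab, operand)}, %rbx\n"   # right-hand-side value
        "mov\t%rbx, (%rax)\n"                            # store to that address
    )


symtab = {"y": -32, "t1": -40}
print(assignderef_asm(symtab, "t1", "11"), end="")
print(assignderef_asm(symtab, "t1", "y"), end="")
```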

Debugging with GDB

One way to help trace the function call is to use gdb. The following will rebuild the main program with debugging symbols on, run gdb, then step through each assembly instruction.

gcc -g -o main main.s func.s  # compile with debugging symbols (-g)
gdb main 
b main # setup breakpoint at main
r # start running, breaks at main
si # step instruction to see next instruction
# hitting enter will repeat last command, e.g., si
c # once done, use c to continue running without stopping

Debugging tutorials

See these resources for more information on gdb.

Basic Assembler Debugging with GDB
Chapter 2 of 21st Century C

Full examples

`main.ir`

function main
localvars a b x y t1 t2 t3
params a b
x := 8
y := 9
t1 := &x
t2 := *t1
*t1 := 11
t3 := x
return x

main.s

        .file "main.ir"
        .section .note.GNU-stack,"",@progbits
        .text
        .globl main
        .type main, @function
main:
        # prologue, update stack pointer
        pushq	%rbp # save old base pointer
        movq	%rsp, %rbp # set new base pointer
        push	%rbx # %rbx is callee-saved
        # allocate stack space for locals
        sub	$56, %rsp
        # move register parameter a to local variable
        mov	%rdi, -8(%rbp)
        # move register parameter b to local variable
        mov	%rsi, -16(%rbp)
        # assign 8 to x
        mov	$8, %rax
        mov	%rax, -24(%rbp)
        # assign 9 to y
        mov	$9, %rax
        mov	%rax, -32(%rbp)
        # ref x to t1
        mov	%rbp, %rax
        add	$-24, %rax
        mov	%rax, -40(%rbp)
        # deref t1 to t2
        mov	-40(%rbp), %rax
        mov	(%rax), %rbx
        mov	%rbx, -48(%rbp)
        # assignderef 11 to t1
        mov	-40(%rbp), %rax
        mov	$11, %rbx
        mov	%rbx, (%rax)
        # assign x to t3
        mov	-24(%rbp), %rax
        mov	%rax, -56(%rbp)
        # set return value
        mov	-24(%rbp), %rax
        # epilogue
        pop	%rbx # restore rbx for the caller
        mov	%rbp, %rsp # restore old stack pointer
        pop	%rbp # restore old base pointer
        ret
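
To check by hand what this program should return, the pointer statements can be traced with a small memory model. The sketch below is not the course's Interpreter.py; it models the stack frame as a Python dictionary from addresses to values, and the base pointer value RBP is made up for the simulation.

```python
RBP = 1000  # arbitrary made-up base pointer value for the simulation
offsets = {"a": -8, "b": -16, "x": -24, "y": -32, "t1": -40, "t2": -48, "t3": -56}
memory = {}  # maps an address to the value stored there


def addr(name):
    """Address of a local variable: base pointer plus its (negative) offset."""
    return RBP + offsets[name]


memory[addr("x")] = 8                            # x := 8
memory[addr("y")] = 9                            # y := 9
memory[addr("t1")] = addr("x")                   # t1 := &x
memory[addr("t2")] = memory[memory[addr("t1")]]  # t2 := *t1
memory[memory[addr("t1")]] = 11                  # *t1 := 11
memory[addr("t3")] = memory[addr("x")]           # t3 := x
result = memory[addr("x")]                       # return x
print(result)
```

Because t1 holds the address of x, the store through *t1 overwrites x, so the program returns 11 while t2 keeps the earlier value 8.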

More full examples

cd ~/codegen4
wget https://www.cs.ucf.edu/~gazzillo/teaching/cop3402fall24/files/compiler-examples.tar
tar -xvf compiler-examples.tar

Testing without `codegen1` and `codegen2`

You can still test your project in isolation without relying on codegen1 and codegen2's implementation of functions. The following will download a special wrapper for your assembly output. It will put your assembly output inside of a pre-defined assembly function so that you can still run it.

Download the wrapper

Download and unpack the wrapper:

cd ~/codegen4
wget https://www.cs.ucf.edu/~gazzillo/teaching/cop3402fall24/files/compiler-wrapper.tar
tar -xvf compiler-wrapper.tar

Files provided in the wrapper

File	Description
iolib.c	Integer read and print functions
main.ir	Template program
Makefile	Makefile to wrap your compiler output
wrapper_end.s	Assembly template end wrapper
wrapper_start.s	Assembly template start wrapper

How to use the wrapper

Be sure your compiler only defines the functions given in the template (enterLocalvars, enterAssign, enterOperation, enterLabel, enterGoto, enterIfgoto, plus this project's enterDeref, enterRef, and enterAssignderef). Do not define enterUnit, enterFunction, exitFunction, or enterCall; omit them from your CodeGen.py, since the code for these functions is pregenerated in the wrapper template for you.

Function	Define in CodeGen.py?
enterLocalvars	Yes
enterAssign	Yes
enterOperation	Yes
enterLabel	Yes
enterGoto	Yes
enterIfgoto	Yes
enterUnit	No
enterFunction	No
exitFunction	No
enterCall	No

Enter your code to compile inside of the given main.ir where it says "add your code to try out here". Do not change any other lines of the program.

function main # DO NOT CHANGE
localvars a b c d e f g result retval # DO NOT CHANGE
# add your code to try out here
retval := call print_int result # DO NOT CHANGE
return 0 # DO NOT CHANGE

Be sure to assign result to your desired output, since this will be printed by print_int. Here is an example use of the wrapper:

function main # DO NOT CHANGE
localvars a b c d e f g result retval # DO NOT CHANGE
# add your code to try out here
a := 8
b := 9
c := &a
d := *c
*c := 11
e := a
result := e
retval := call print_int result # DO NOT CHANGE
return 0 # DO NOT CHANGE

Compile main.ir with the wrapper using the following (be sure you have downloaded it already and that you are in your pipenv shell):

cd ~/codegen4
make -C wrapper clean all

This will run codegen, add the resulting assembly to a template assembly file, and assemble the result into a program called wrapper.elf. If compilation fails you will see error output, such as

main.s: Assembler messages:
main.s:16: Error: symbol `main' is already defined
make: *** [Makefile:18: main.o] Error 1

Your error will vary depending on what went wrong.

If compilation and assembly were successful, you can run the program with

./wrapper/wrapper.elf

It will print whatever was set to result in your main.ir.

How the wrapper works

The wrapper contains the precompiled main.ir split into wrapper_start.s and wrapper_end.s. These contain the assembly code produced by a complete implementation of codegen2 for the function, call, and return IR statements.

The Makefile runs your codegen and saves the output to main.s.body, i.e.,

codegen main.ir | tee main.s.body

Then it "wraps" the output of your compiler with the start and end of the precompiled assembly file to produce a complete main.s assembly program, i.e.,

cat wrapper_start.s main.s.body wrapper_end.s > main.s

Finally, it assembles main.s as usual and links it with iolib.c, which provides the print_int function, into the executable wrapper.elf:

gcc -c main.s
gcc -o wrapper.elf main.o iolib.c
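
The wrapping step is plain concatenation, which the following sketch reproduces in a temporary directory. The file contents are placeholders made up for illustration and are not the real wrapper files.

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    d = Path(tmp)
    # placeholder contents; the real files come from compiler-wrapper.tar
    (d / "wrapper_start.s").write_text("main:\n\t# prologue (placeholder)\n")
    (d / "main.s.body").write_text("\t# your codegen output (placeholder)\n")
    (d / "wrapper_end.s").write_text("\t# epilogue (placeholder)\n\tret\n")

    # same order as: cat wrapper_start.s main.s.body wrapper_end.s > main.s
    parts = ["wrapper_start.s", "main.s.body", "wrapper_end.s"]
    main_s = "".join((d / p).read_text() for p in parts)
    (d / "main.s").write_text(main_s)

print(main_s, end="")
```

Since the start wrapper already supplies the function label and prologue, a CodeGen.py that also emits them would likely define main twice, which is one plausible cause of the "symbol `main' is already defined" error shown earlier.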

Submitting your project

Stage, commit, and push to the grading server

The only file you need to submit is compiler/CodeGen.py.

Once you have set up the repo, all you need to do is use git add, git commit, and git push to stage, commit, and sync your repo to the grading git server.

Self-check

Be sure you have already downloaded the test wrapper.

You can check that you've submitted correctly by cloning, building, and testing your repo.

cd ~
git clone gitolite3@eustis3.eecs.ucf.edu:cop3402/NID/codegen4 codegen4_new
cd codegen4_new
pipenv install -e ./
pipenv shell
make -C grammar/
codegen << EOT > main.s.body
function main # DO NOT CHANGE
localvars a b c d e f g result retval # DO NOT CHANGE
a := 8
b := 9
c := &a
d := *c
*c := 11
e := a
result := e
retval := call print_int result # DO NOT CHANGE
return 0 # DO NOT CHANGE
EOT
cat wrapper/wrapper_start.s main.s.body wrapper/wrapper_end.s > main.s
gcc -o main main.s wrapper/iolib.c
./main # you should see 11 printed out

Note that this is not a comprehensive test of your compiler, but just a validation that the project has been submitted and runs for one test case.

(Only if instructed) Updating from the start repo

If the original repo gets updated after you have already started implementing your project, you can get those updates by pulling. Otherwise, you will never need to do this step. Be sure to commit any changes you have made before proceeding.

git pull origin master --rebase
git push -f

If you encounter a conflict, it may be that you modified some files from the original repo that didn't need to be modified. Come to office hours if you need help resolving the conflict.

Troubleshooting

If you make a mistake in typing the URL, you can remove the submission remote and try the add step again:

git remote rm submission
git remote add submission gitolite3@eustis3.eecs.ucf.edu:cop3402/NID/codegen4  # replace NID with yours

Do not try creating a new repo if you make a mistake. You will not be able to push the new repo to gitolite3, since there already is one there. You can always make new changes and commit them to fix mistakes.
If in self-check codegen4_new already exists, just use a fresh directory name.
The program must be run inside of the pipenv environment. You can see that you have successfully entered the environment because your prompt is prefixed with (compiler), e.g.,
```
(compiler) NID@net1547:~/codegen4$
```
You can exit the environment with exit.

Self-grading

cd ~/
git clone https://www.cs.ucf.edu/~gazzillo/teaching/cop3402fall24/repos/codegen4-grading.git
cd codegen4-grading
make

Look at README.md for usage instructions.

Grading schema

Criterion	Points
The git repo exists	1
`compiler/CodeGen.py` exists	1
codegen runs the given example correctly	2
codegen runs new example inputs correctly	2
TOTAL	6
