
Code Generation 1
COP-3402


Overview

In this project you will compile function definitions and calls to x86 assembly. As with all programming projects, it will be submitted via git.

Set up the repo

  1. ssh into eustis, replacing NID with your UCF NID.

    ssh NID@eustis.eecs.ucf.edu
    
  2. Clone the compiler template to codegen1

    git clone https://www.cs.ucf.edu/~gazzillo/teaching/cop3402fall24/repos/compiler-template.git/ codegen1
    
  3. Enter the repo

    cd codegen1/
    

    If this doesn't work, double-check step (2) and make sure you put codegen1 as the second argument to clone.

  4. Add the URL of your personal remote repository, replacing NID with your UCF NID.

    git remote add submission gitolite3@eustis3.eecs.ucf.edu:cop3402/NID/codegen1
    
  5. Synchronize your local repo with the remote eustis3 repo.

    git push --set-upstream submission master
    

    You only need to do this once. Use git commit and git push regularly to keep the remote repo up to date.

Setting up your development environment

  1. Be sure you are in the repo directory.

    cd ~/codegen1
    

    If you receive a warning about a managed python installation, then double-check that you are on eustis.

  2. Then create the development environment. This creates an "editable" installation of your project, so that you can modify its source and rerun without having to reinstall the project.

    pipenv install -e ./
    

    If you can't run pipenv but you've already installed it, try logging out of eustis and back in again.

    If you haven't installed pipenv yet, please review the calc project.

  3. Enter your pipenv development environment. Do this every time you log in to eustis to work on your project.

    pipenv shell
    

    Double-check that you are in the environment. Your prompt should look something like this:

    (compiler) NID@net1547:~/codegen1$
    

    You can later exit the dev environment with exit. You do not need to enter the dev environment again if you have already entered it.

  4. Get ANTLR and build the parser.

    make -C grammar/
    

Compiler project structure

File                     Description
Pipfile                  pipenv settings
compiler/CodeGen.py      The code generator that you will write. Not provided by the template repo.
compiler/Interpreter.py  A SimpleIR interpreter for comparing output
grammar/Makefile         A build file for the grammar
grammar/SimpleIR.g4      The SimpleIR grammar
pyproject.toml           python project settings

.gitignore files are for git and __init__.py are for python modules.

Implementation

Skeleton code

Here is the start to compiler/CodeGen.py

import os
import sys
import math
from textwrap import indent, dedent
from antlr4 import *
from grammar.SimpleIRLexer import SimpleIRLexer
from grammar.SimpleIRParser import SimpleIRParser
from grammar.SimpleIRListener import SimpleIRListener
import logging
logging.basicConfig(level=logging.DEBUG)

# This class defines a complete listener for a parse tree produced by SimpleIRParser.
class CodeGen(SimpleIRListener):
    def __init__(self, filename, outfile):
        self.filename = filename
        self.outfile = outfile
        self.symtab = {}
        self.bytewidth = 8

    def enterUnit(self, ctx:SimpleIRParser.UnitContext):
        """Creates the object file sections"""
        self.outfile.write(
f'''\t.file "{self.filename}"
\t.section .note.GNU-stack,"",@progbits
\t.text
''')

    def enterFunction(self, ctx:SimpleIRParser.FunctionContext):
        """Emits the label and prologue"""
        # TODO

    def exitFunction(self, ctx:SimpleIRParser.FunctionContext):
        """Emits the epilogue"""
        # TODO

    def enterReturn(self, ctx:SimpleIRParser.ReturnContext):
        """Sets the return value"""
        # TODO

    def enterCall(self, ctx:SimpleIRParser.CallContext):
        """Function call"""
        # TODO


def main():
    import sys
    if len(sys.argv) > 1:
        filepath = sys.argv[1]
        input_stream = FileStream(filepath)
        filename = os.path.basename(filepath)
    else:
        input_stream = StdinStream()
        filename = "stdin"

    lexer = SimpleIRLexer(input_stream)
    stream = CommonTokenStream(lexer)
    parser = SimpleIRParser(stream)
    tree = parser.unit()
    if parser.getNumberOfSyntaxErrors() > 0:
        print("syntax errors")
        exit(1)
    # print(tree.toStringTree())
    walker = ParseTreeWalker()
    walker.walk(CodeGen(filename, sys.stdout), tree)

if __name__ == '__main__':
    main()

Emitting assembly code

The CodeGen class provides a self.outfile file to write to. In python, write a string using

self.outfile.write("The string to emit")

Alternatively, you can use a format string to make creating templates easier, where anything inside curly braces is evaluated, e.g., the following prints a string followed by the contents of a variable called name:

self.outfile.write(f"This is what is in the name variable: {name}")
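
As a minimal sketch of the two styles above, the following writes a few assembly lines to an in-memory buffer. Here io.StringIO stands in for self.outfile and the value of name is made up, both assumptions so that the snippet runs on its own outside the listener.

```python
import io

# io.StringIO stands in for self.outfile so the snippet runs on its own.
outfile = io.StringIO()
name = "factorial"  # hypothetical value; in the listener it would come from ctx

# Plain string: written exactly as given, so include the newline yourself.
outfile.write("\t.text\n")

# Format strings: {name} is replaced with the value of the variable.
outfile.write(f"\t.globl {name}\n")
outfile.write(f"{name}:\n")

emitted = outfile.getvalue()
print(emitted, end="")
```

Note that write does not add a newline for you, unlike print, so each emitted assembly line needs an explicit \n.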

To retrieve ANTLR parse tree contents, use the ctx context parameter provided to each listener method and call the method named after the token. For example, the following gets the NAME token from the syntax tree for a function production and stores it in the Python variable name.

name = ctx.NAME()

Laying out the assembly file

enterUnit

This function is given to you. It creates assembly code boilerplate for you.

Defining functions

enterFunction

To create the function, emit the assembly pseudo-ops .globl and .type with the name of the function (ctx.NAME()), as well as a label for the function. Then emit the prologue. Use whatever the name of the function is for all three, e.g., for the function factorial the function creation and prologue would look like this:

        .globl factorial
        .type factorial, @function
factorial:
  # prologue
        pushq	%rbp # save old base pointer
        movq	%rsp, %rbp # set new base pointer
        push	%rbx # %rbx is callee-saved
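
One possible way to produce this text is sketched below. The helper emit_function_header is hypothetical (it is not part of the template), and it assumes the function name has already been obtained as a string, for example from ctx.NAME(). io.StringIO stands in for self.outfile.

```python
import io

def emit_function_header(outfile, name):
    """Write .globl/.type, the label, and the prologue for one function."""
    outfile.write(
        f"\t.globl {name}\n"
        f"\t.type {name}, @function\n"
        f"{name}:\n"
        "\t# prologue\n"
        "\tpushq\t%rbp # save old base pointer\n"
        "\tmovq\t%rsp, %rbp # set new base pointer\n"
        "\tpush\t%rbx # %rbx is callee-saved\n"
    )

# io.StringIO stands in for self.outfile so the snippet runs on its own.
out = io.StringIO()
emit_function_header(out, "factorial")
header = out.getvalue()
print(header, end="")
```

Only the first three lines depend on the function name; the prologue instructions are identical for every function.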

exitFunction

Emit the assembly function epilogue and return instruction, i.e.,

      # epilogue
      pop	%rbx # restore rbx for the caller
      mov	%rbp, %rsp # restore old stack pointer
      pop	%rbp # restore old base pointer
      ret

This is the same for all functions.
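
Because the epilogue never changes, one option is to keep it as a constant string and write it as-is. The names EPILOGUE and emit_epilogue below are hypothetical, and io.StringIO again stands in for self.outfile.

```python
import io

# The epilogue does not depend on the function, so it can be a constant.
EPILOGUE = (
    "\t# epilogue\n"
    "\tpop\t%rbx # restore rbx for the caller\n"
    "\tmov\t%rbp, %rsp # restore old stack pointer\n"
    "\tpop\t%rbp # restore old base pointer\n"
    "\tret\n"
)

def emit_epilogue(outfile):
    """Write the fixed epilogue and the ret instruction."""
    outfile.write(EPILOGUE)

# io.StringIO stands in for self.outfile so the snippet runs on its own.
out = io.StringIO()
emit_epilogue(out)
print(out.getvalue(), end="")
```

The pops undo the prologue's pushes in reverse order: %rbx was pushed last, so it is popped first.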

Calling functions

For this project, you do not need to support parameters. Just emit the assembly code to call the function, e.g., the following calls the function factorial:

call	factorial

In our SimpleIR ANTLR grammar, the call production has several NAME tokens, i.e., in grammar/SimpleIR.g4

call: NAME ':=' 'call' NAME NAME*;

To collect them into a list in python, do the following:

call = [ name.getText() for name in ctx.NAME() ]
  • call[0] will contain the name of the variable to store the function's return value to
  • call[1] will contain the name of the function to be called
  • call[2:] will contain the names of all the parameters to the call ([2:] is python syntax to slice a list into all elements at and after index 2)

Use the name of the function to generate the call in assembly.
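
The following sketch shows the list comprehension and the emitted call together. FakeNode and FakeCallContext are hypothetical stand-ins for the ANTLR objects, written only so the snippet runs without the generated parser; emit_call is likewise an illustrative helper, not part of the template.

```python
import io

class FakeNode:
    """Hypothetical stand-in for an ANTLR terminal node."""
    def __init__(self, text):
        self._text = text

    def getText(self):
        return self._text

class FakeCallContext:
    """Hypothetical stand-in for SimpleIRParser.CallContext."""
    def __init__(self, *names):
        self._names = [FakeNode(n) for n in names]

    def NAME(self):
        # With no index argument, all NAME tokens come back as a list.
        return self._names

def emit_call(outfile, ctx):
    call = [name.getText() for name in ctx.NAME()]
    target = call[1]  # call[0] is the result variable, unused in this project
    outfile.write(f"\tcall\t{target}\n")

# Models the IR line: phonyvar := call func
out = io.StringIO()
emit_call(out, FakeCallContext("phonyvar", "func"))
print(out.getvalue(), end="")
```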

Returning

For this project, you only need to support returning integer constants. The return IR instruction does not emit the assembly ret, which has to come after the epilogue. Instead, it places the return value in the %rax register, which holds the return value according to the System V x86-64 ABI.

# set return value
mov	$10, %rax

To get the text of the operand (e.g., return 5 has the operand "5"), use ctx.operand.text. Use the operand text to generate an immediate mov that puts the operand into %rax, the register for the return value.
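
A minimal sketch of that step follows. The helper emit_return is hypothetical, and types.SimpleNamespace is used to fake a context object with an operand.text attribute so the snippet runs without the generated parser.

```python
import io
from types import SimpleNamespace

def emit_return(outfile, ctx):
    """Move the constant operand into %rax; ret is emitted later, after the epilogue."""
    value = ctx.operand.text
    outfile.write("\t# set return value\n")
    # The $ prefix marks an immediate (constant) operand in AT&T syntax.
    outfile.write(f"\tmov\t${value}, %rax\n")

# Fake context modelling the IR line: return 5
fake_ctx = SimpleNamespace(operand=SimpleNamespace(text="5"))
out = io.StringIO()
emit_return(out, fake_ctx)
print(out.getvalue(), end="")
```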

Bringing it all together

# enterFunction generates function declaration, label, and prologue
  .globl main
  .type main, @function
main:
  # prologue, update stack pointer
  pushq	%rbp # save old base pointer
  movq	%rsp, %rbp # set new base pointer
  push	%rbx # %rbx is callee-saved

  # enterCall generates the function call
  call	func

  # enterReturn generates the store to %rax
  mov	$10, %rax

  # exitFunction generates the epilogue and return instruction
  pop %rbx # restore rbx for the caller
  mov	%rbp, %rsp # restore old stack pointer
  pop	%rbp # restore old base pointer
  ret

Testing your compiler

This example compiles two functions, main and func. If successful, you will have two assembly files, main.s and func.s. main calls func and returns the constant 10. While func returns 5, this value is never used by main. (For this project, local variables are not supported yet so the assignment of the return value of call can be ignored. This also means that the Interpreter will not work as expected due to the missing variable.)

codegen << EOT | tee main.s
function main
phonyvar := call func
return 10
EOT
codegen << EOT | tee func.s
function func
return 5
EOT
gcc -o main main.s func.s
./main
echo $? # you should see 10 as the exit code

Debugging with GDB

One way to trace the function call is to use gdb. The following rebuilds the main program with debugging symbols on, runs gdb, then steps through each assembly instruction.

gcc -g -o main main.s func.s  # compile with debugging symbols (-g)
gdb main 
b main # setup breakpoint at main
r # start running, breaks at main
si # step instruction to see next instruction
# hitting enter will repeat last command, e.g., si
c # once done, use c to continue running without stopping

Debugging tutorials

See these resources for more information on gdb.

Full example

main.ir

function main
phonyvar := call func
return 10
  • main.s

    Running codegen main.ir > main.s should produce similar assembly output. Note that # denotes a comment.

            .file "main.ir"
            .section .note.GNU-stack,"",@progbits
            .text
            .globl main
            .type main, @function
    main:
            # prologue, update stack pointer
            pushq	%rbp # save old base pointer
            movq	%rsp, %rbp # set new base pointer
            push	%rbx # %rbx is callee-saved
            call	func
            # set return value
            mov	$10, %rax
            # epilogue
            pop	%rbx # restore rbx for the caller
            mov	%rbp, %rsp # restore old stack pointer
            pop	%rbp # restore old base pointer
            ret
    

func.ir

function func
return 5
  • func.s

    Running codegen func.ir > func.s should produce similar assembly output. Note that # denotes a comment.

            .file "func.ir"
            .section .note.GNU-stack,"",@progbits
            .text
            .globl func
            .type func, @function
    func:
            # prologue, update stack pointer
            pushq	%rbp # save old base pointer
            movq	%rsp, %rbp # set new base pointer
            push	%rbx # %rbx is callee-saved
            # set return value
            mov	$5, %rax
            # epilogue
            pop	%rbx # restore rbx for the caller
            mov	%rbp, %rsp # restore old stack pointer
            pop	%rbp # restore old base pointer
            ret
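
Since your output only needs to be similar to these listings, comments and whitespace may differ. One way to compare two assembly texts while ignoring such differences is sketched below. The normalize helper is hypothetical and is not how the grading scripts work; it also assumes that # appears only as a comment marker, which holds for the listings on this page.

```python
def normalize(asm_text):
    """Return the instruction lines with comments, blank lines, and extra whitespace removed."""
    lines = []
    for line in asm_text.splitlines():
        code = line.split("#", 1)[0]   # drop everything after the comment marker
        code = " ".join(code.split())  # collapse tabs and runs of spaces
        if code:
            lines.append(code)
    return lines

expected = "func:\n\tmov\t$5, %rax # set return value\n\tret\n"
actual = "func:\n    mov $5, %rax\n\n    ret\n"

print(normalize(expected))
print(normalize(expected) == normalize(actual))
```

This compares text only. Running the compiled program and checking its exit code, as in the testing section, remains the real check.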
    

More full examples

cd ~/codegen1
wget https://www.cs.ucf.edu/~gazzillo/teaching/cop3402fall24/files/compiler-examples.tar
tar -xvf compiler-examples.tar

Submitting your project

Stage, commit, and push to the grading server

The only file you need to submit is compiler/CodeGen.py.

Once you have set up the repo, all you need to do is use git add, git commit, and git push to stage, commit, and sync your repo to the grading git server.

Self-check

You can check that you've submitted correctly by cloning, building, and testing your repo.

cd ~
git clone gitolite3@eustis3.eecs.ucf.edu:cop3402/NID/codegen1 codegen1_new
cd codegen1_new
pipenv install -e ./
pipenv shell
make -C grammar/
codegen << EOT > main.s
function main
return 10
EOT
gcc -o main main.s
./main
echo $? # you should see 10 as the exit code


(Only if instructed) Updating from the start repo

If the original repo gets updated after you have already started implementing your project, you can get those updates by pulling. Otherwise, you will never need to do this step. Be sure to commit any changes you have made before proceeding.

git pull origin master --rebase
git push -f

If you encounter a conflict, it may be that you modified some files from the original repo that didn't need to be modified. Come to office hours if you need help resolving the conflict.

Troubleshooting

  • If you make a mistake in typing the URL, you can remove the submission remote and try the add step again:

    git remote rm submission
    git remote add submission gitolite3@eustis3.eecs.ucf.edu:cop3402/NID/codegen1  # replace NID with yours
    
  • Do not try creating a new repo if you make a mistake. You will not be able to push the new repo to gitolite3, since there already is one there. You can always make new changes and commit them to fix mistakes.
  • If in self-check codegen1_new already exists, just use a fresh directory name.
  • The program must be run inside of the pipenv environment. You can see that you have successfully entered the environment because your prompt is prefixed with (compiler), e.g.,

    (compiler) NID@net1547:~/codegen1$
    

    You can exit the environment with exit.

Self-grading

cd ~/
git clone https://www.cs.ucf.edu/~gazzillo/teaching/cop3402fall24/repos/codegen1-grading.git
cd codegen1-grading
make

Look at README.md for usage instructions.

Grading schema

Criterion                                   Points
The git repo exists                         1
compiler/CodeGen.py exists                  1
codegen runs the given example correctly    2
codegen runs new example inputs correctly   2
TOTAL                                       6

Author: Paul Gazzillo

Created: 2024-11-20 Wed 16:53
