
(1) Introduction
COP-3402, Spring 2024


Why study compilers and systems software?

Know your tools

newton_grinding.png

Newton's Notebook on Grinding Lenses

all technical workers need to be experts with their tools. pilots go to flight school; carpenters may even build their own tools.

when studying optics, newton learned to create his own lenses. here is an image from newton's notebook on grinding lenses.

https://cudl.lib.cam.ac.uk/view/MS-ADD-04000/56

Be a better programmer

top_languages_2023.png

IEEE Spectrum's Top Programming Languages

all languages have many things in common. learning how one is implemented gives you insight into how other languages work.

https://spectrum.ieee.org/the-top-programming-languages-2023

Intellectual curiosity

how_things_work.jpg

compilers are just fascinating. they bridge both theoretical computer science and low-level computing, making them a good sample of many fundamental aspects of computer science.

https://www.amazon.com/Random-House-Book-Things-Work/dp/0679809082

About me

  • Paul Gazzillo
  • Assistant professor
  • Research areas
    • Programming languages
    • Software engineering
    • Security
  • https://paulgazzillo.com

What makes compiler writing so hard?

  • (Diagram)

    diagram 1

    file.c -> compiler.exe -> file.exe, in -> file.exe -> out

    see how the compiler itself needs to be compiled: compiler.c -> compiler.exe -> compiler.exe

    diagram 2

    in our class simplec.c -> gcc -> simplec.exe

    from the perspective of writing this program, both your input and your output are code. input: file.simplec -> simplec.exe -> output: file.exe

  • The bootstrapping problem
    • Can't write a C compiler in C without a C compiler
    • Write the compiler in machine code first

    (optional) generating (components of) compilers automatically: spec -> compiler-compiler.exe -> compiler.c
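
The point of the diagrams above, that a compiler's input and output are both code, can be sketched in a few lines of C. This is only an illustration under stated assumptions: compile_sum is a hypothetical name, the "source language" is just text of the form A + B, and the emitted x86-style assembly is simplified. It is not the course's simplec compiler.

```c
#include <stddef.h>
#include <stdio.h>

/*
 * Hypothetical toy "compiler" (illustration only, not simplec).
 * Input:  source text of the form "A + B", where A and B are integers.
 * Output: x86-style assembly text written into out.
 * Returns 0 on success, -1 if the source does not parse or out is too small.
 */
int compile_sum(const char *source, char *out, size_t out_size)
{
    int a, b;

    /* Front end: read the two integer operands around the '+'. */
    if (sscanf(source, "%d + %d", &a, &b) != 2) {
        return -1;
    }

    /* Back end: emit instructions that compute a + b in a register. */
    int written = snprintf(out, out_size,
                           "mov $%d, %%eax\n"
                           "add $%d, %%eax\n",
                           a, b);

    /* snprintf reports truncation by returning a length >= out_size. */
    return (written < 0 || (size_t)written >= out_size) ? -1 : 0;
}
```

Calling compile_sum("1 + 2", buf, sizeof buf) fills buf with assembly text instead of computing 3 itself. The computation only happens later, when the generated code is assembled and run.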

Course Overview

Syllabus

https://www.cs.ucf.edu/~gazzillo/teaching/cop3402spring24

i'm going to assume you have some proficiency in C and can trace through code to figure out what it does (necessary for debugging).

  • first, we'll learn about our programming environment: VM, git, makefiles, gcc -c, linker/libraries, OS/loader
  • then, we'll write a simple compiler in C that translates calculator programs to assembly
  • then, we'll dig into how to do this for a larger subset of C

Changes from my last offering (2020 -> 2024)

  • Competency questions for each lecture
    • Will have homework questions each week (graded for effort)
    • Final will have similar questions
  • Staying on topic, fewer tangents, leaving student discussions outside of lectures
    • 1hr lectures, 15min of questions, discussion per session
  • 2020 offering was quite positive
  • two comments for improvements
    • i get easily distracted by questions
    • specific questions at end of course to focus students on important takeaways

Changes from my last offering (2019 -> 2020)

  • More in-class coding and demos
  • Lexer/parser given to you
  • More breathing room to cover complex topics
  • students liked in-class coding, doing more of that
    • toy compiler, code together
    • submit your copied version
  • liked learning systools, particularly git
    • need more instruction on using the command-line (since i expect it)
    • devoting part of course material to it
  • students did not like lack of detail on theoretical concepts

Course mechanics

  • Webcourses
  • Course webpage
  • Ed Discussions
  • GitHub

Webcourses

  • Entrypoint to course
    • Links to webpage, Ed Discussions, GitHub
  • Announcements
  • Assignments and grades
    • Written homework submitted on webcourses
    • Code submitted on GitHub
  • Final exam

Course webpage

  • Syllabus
  • Schedule
  • Lecture notes
  • Project descriptions
  • Homework assignments
  • Videos of recorded lectures (if available)

Ed Discussions

  • Discussion board
  • Your public questions benefit other students
  • Your question may have been answered, so check here!
  • Faster than email

GitHub classrooms

  • Assignment #1: create a git repository in GitHub classroom
    • See webcourses for the registration link
  • Do not put coursework repositories on a personal account (neither public nor private)
    • Violates UCF's Golden Rule policies
  • All code submitted via git
    • Version control is standard for software engineers
  • Create a GitHub account today if you haven't already

Ulterior motive: make you a better programmer

Nominally, this course is to teach you systems software and mainly about the internals of the compiler.

But I have a secret objective: train you to be better at the practice of programming

I think studying compilers is a great way to do this

(IMHO) Great Programmers Know Well

  1. The language
  2. Their programming environment
  3. Some computational theory

Know the language

  • Can't program without it
  • Helps to know language constructs in detail
  • Compilers reveal how the language itself works

C is deceptively simple

Know your programming environment

  • Editor, system software (libraries, linkers, loaders), the OS
  • Helps you be faster, tackle large software
  • Compiler works with system to make an executable
  • We'll learn this next week
  • often omitted in academia
  • we'll go through all the tools you'll need
  • we'll use the command-line (cli, shell, etc)
  • using cli gives you more control over your computing device
  • when you use the gui it is good for ease of use, but hides system infrastructure (file system, etc)

Know some computational theory

  • Provides new mental models
  • Algorithmic design and implementation
  • Compilers use several models, e.g., automata and recursion
  • Not necessarily academic: C has a mental model
    • It's very simple and close to the physical hardware's behavior (RAM, variables that refer to locations in it, pointers as memory addresses)
    • Not simple to use, particularly for some problems
    • E.g., recursion
  • If you've hated or been confused by recursion
    • We'll look at some other ways to think about this
    • Recursion is hard to reason about
    • Hard to reason about in the RAM mental model
    • Actually has to be "emulated" by the compiler
    • If you struggled with it, this class will hopefully make it easier for you
    • We'll look at it from a data structure POV
      • If you can imagine/draw a tree, you can do recursion!
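
As a minimal sketch of the tree point of view on recursion, assume a small expression tree. The struct node type and the eval function are hypothetical names made up for this illustration. Each recursive call handles one node of the tree, so drawing the tree also draws the recursion.

```c
#include <stddef.h>

/* Hypothetical expression tree node, for illustration only. */
struct node {
    char op;                  /* '+' or '*' for inner nodes, 0 for a leaf */
    int value;                /* only meaningful when op == 0 */
    const struct node *left;  /* NULL for a leaf */
    const struct node *right; /* NULL for a leaf */
};

/* Evaluate the tree rooted at n: one call per node. */
int eval(const struct node *n)
{
    /* Base case: a leaf is just its number. */
    if (n->op == 0) {
        return n->value;
    }

    /* Recursive case: evaluate both subtrees, then combine the results. */
    int l = eval(n->left);
    int r = eval(n->right);
    return (n->op == '+') ? (l + r) : (l * r);
}
```

For the tree of (1 + 2) * 3, the root is a '*' node whose left child is a '+' node with leaves 1 and 2 and whose right child is the leaf 3. Calling eval on the root visits each of the five nodes exactly once.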

Three virtues of a great programmer

https://web.archive.org/web/20211014194234/http://threevirtues.com/

"According to Larry Wall(1), the original author of the Perl programming language, there are three great virtues of a programmer; Laziness, Impatience and Hubris

  1. Laziness: The quality that makes you go to great effort to reduce overall energy expenditure. It makes you write labor-saving programs that other people will find useful and document what you wrote so you don't have to answer so many questions about it.
  2. Impatience: The anger you feel when the computer is being lazy. This makes you write programs that don't just react to your needs, but actually anticipate them. Or at least pretend to.
  3. Hubris: The quality that makes you write (and maintain) programs that other people won't want to say bad things about."

What's a bug?

What do you think?

What a bug is not (usually)

  • The program does something wrong

What a bug is

  • A program that does exactly what its code says to do, but not what we think it does

A bug is the gap between what the programmer thinks the program does and what it actually does
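
A small sketch of such a gap, using hypothetical function names: the programmer thinks average_int returns the mathematical average, but the code asks for integer division, and C does exactly that.

```c
/*
 * What the programmer thinks: returns the average of a and b.
 * What the code says: add two ints, then do integer division by 2,
 * which discards the fractional part.
 */
int average_int(int a, int b)
{
    return (a + b) / 2;
}

/*
 * One way to close the gap: dividing by 2.0 makes the division
 * happen in floating point, so the fraction is kept.
 */
double average_double(int a, int b)
{
    return (a + b) / 2.0;
}
```

With inputs 1 and 2, average_int yields 1 while average_double yields 1.5. Neither function is misbehaving. Each does what its code says, and only one matches what the programmer had in mind.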

An exception: compiler bugs

  • The programming language defines the expected behavior
  • The compiler is the implementation
  • A compiler bug means the program truly does what it's not supposed to
    • Hardware, kernel, system bugs can also cause incorrect program behavior

Debugging your compiler

  1. Narrow down the problem by crafting a smaller version of the test case
  2. Go to the part of your code that is related to the problem
  3. Trace step-by-step what your code really does (not what you think/hope/feel/guess/divine/intuit/reckon it does)

Goal: make the specification match the implementation

  • Specification: what we want our program to do
  • Implementation: what our program really does
  • Easier said than done!
  • During implementation, the incomplete implementation never matches the specification!

(Diagram)

  • Specification -> Programmer -> Implementation
    • (Green highlight entire spec, green highlight part of impl, but mostly red)
    • If I try to tackle the whole problem at once, my program is always wrong until I've finished coding the entire specification

Hard way: write whole program, check once done

"Everyone knows that debugging is twice as hard as writing a program in the first place. So if you're as clever as you can be when you write it, how will you ever debug it?" –Brian Kernighan, "The Elements of Programming Style", 2nd edition, chapter 2.

Sounds easy. Seems easy at first. Quickly snowballs.

The more you add to your code, the more combinations of behavior there are to consider. It's exponential!

Think about debugging code that has 3 if-then-else statements vs 10.
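
To put rough numbers on that comparison, here is a sketch that assumes the if-then-else statements run one after another with independent conditions, so each statement doubles the number of possible paths. The name count_paths is made up for this example.

```c
/*
 * Number of distinct execution paths through n sequential if-then-else
 * statements, assuming every condition is independent of the others.
 * The result overflows once n reaches the bit width of unsigned long.
 */
unsigned long count_paths(unsigned int n)
{
    unsigned long paths = 1; /* straight-line code has a single path */

    for (unsigned int i = 0; i < n; i++) {
        paths *= 2; /* each statement takes either the then or the else branch */
    }
    return paths;
}
```

Under that assumption, 3 statements give 8 paths to reason about, while 10 statements give 1024.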

Lazy way: start with a narrower specification

  • Easier to get the implementation right
  • Gradually expand the specification
  • Keep the implementation correct at each step

I will write a "hello, world!" program as the first step.

top-down vs bottom-up

Constructively lazy (one of the virtues!)

Divide-and-conquer: break the problem down into smaller parts

  • Can't keep all code in mind for all time
  • Make it easier for yourself
  • Delay gratification
    • Finishing code fast feels good
    • Debugging shoddy code feels horrible

(Diagram)

  • Specification -> Programmer -> Implementation
    • (Divide and green highlight part of spec, divide and green highlight part of impl; then take one more piece of spec, green highlight corresponding impl, with a small part being red)
    • But if I divide up the specification, I can focus on getting a simpler, smaller (sub)program done, which makes debugging easier and makes that piece more likely to be correct by the time I move on to another part of the specification.
  • Debugging is made simpler, since there are fewer (likely) things that could be wrong

Spend time now, save time debugging later

  • Breaking problems down takes time, experience, and making mistakes
  • Premature generalization can waste work
  • Don't be afraid to refactor
    • Easier to reorganize code after prototyping than to write from scratch (for large programs)

Instead of using cognitive energy to keep the whole program in your head and debug the whole program each time there is an issue, use cognitive energy to break the problem down into simpler parts and reason about how to combine them correctly

With compilers, it's a large project, so we really need to break it down into components.

For your compiler, we have a nice break-down of phases

Within each phase, try to break the problem down further yourself, and make each piece work on its own, then work with other pieces gradually.

Biggest take aways:

  • Real definition of what a bug is
  • Revisit your code
    • Refactor to match specifications (usually easier to refactor than starting from scratch)
    • Take time to make good interfaces
    • Less stressful if breaking down the problem first

Abstractions help you organize code

Use abstractions

  • Hides unnecessary details, exposes clear, simple interfaces
  • Physical examples: cars, locks, sockets
  • Interface decisions determined by use of that abstraction

Would you rather plug your laptop into the wall or twist together positive, negative, and ground every time you need power?

The function abstraction

  • One of the most common and powerful
  • Names a snippet of code
  • Provides a definition of input and output
    • Developer provides documentation of behavior and usage
  • You'll get to implement support for functions in your compiler
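
As a small illustration of the points above, here is a hypothetical function named clamp. The name stands for the snippet, the signature defines the input and output, and the comment is the developer's documentation of behavior and usage.

```c
/*
 * clamp: restrict value to the range [low, high].
 *
 * Input:  value, and the bounds low and high (caller must ensure low <= high).
 * Output: low if value < low, high if value > high, otherwise value itself.
 */
int clamp(int value, int low, int high)
{
    if (value < low) {
        return low;
    }
    if (value > high) {
        return high;
    }
    return value;
}
```

A caller can write clamp(volume, 0, 10) and rely only on the comment, without reading the body. That is the detail-hiding an abstraction is meant to provide.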

Combine abstractions to build large software

  • Abstractions create software components
  • Components can be tested on their own
  • Developer can forget about details of the component
    • As long as the component is thoroughly tested and documented
  • Large systems are composed of many layers of abstraction

Bug wrap-up

  • Bugs are (usually) not wrong programs, but incorrect assumptions
  • Break the problem down into simpler specifications
  • Use abstractions to build larger programs

Wirth’s Stepwise Refinement

Setting up your virtual machine

Virtual machines (VMs) standardize our programming environment

  • Emulates hardware using software
  • Fewer surprises during grading
  • Applications (usually) can't tell the difference
    • Hardware vs software interpreting machine instructions
  • Example: arcade/console emulators

The host OS emulates the hardware of the guest OS

  • Host: your OS running on native hardware
  • Guest: the OS running on emulated hardware (which runs on your host)
  • Hypervisor: manages direct guest access to native hardware

Generic x86 instructions

Note: if you are on an M1/M2/M3/etc. arm-based macbook, see the arm-based-macos-specific set of instructions

  • Install VirtualBox

    Virtualization software

    version 7.0.12

    https://www.virtualbox.org/wiki/Downloads

  • Install Vagrant

    Command-line VM manager

    version 2.4.0

    https://www.vagrantup.com/downloads

  • Create the virtual machine
    1. Download the course Vagrantfile to where you want to work
    2. Open your command-line shell
    3. Navigate to folder
    4. Double-check that the name of the file has no .txt extension

      ls # dir on windows
      # you should see Vagrantfile, not Vagrantfile.txt
      # rename if needed
      mv Vagrantfile.txt Vagrantfile # ren on windows
      
    5. Run vagrant up (this will take some time)
    6. Run vagrant ssh to enter your box
    7. /vagrant is synced to the folder where you put the Vagrantfile (cd /vagrant)
    8. exit or Ctrl-D to leave the ssh session

Installation on Debian/Ubuntu x86

Open your terminal and run the following:

sudo apt install vagrant virtualbox
mkdir cop3402spring24 # create the course directory
cd cop3402spring24 # enter the course directory
wget https://www.cs.ucf.edu/~gazzillo/teaching/cop3402spring24/files/Vagrantfile # get the Vagrantfile
vagrant up # create the virtual machine, this will take some time
vagrant ssh  # enter the virtual machine

Installation for arm-based macs

  1. First, turn on file sharing.
  2. Then, open Terminal, and enter these commands

    setopt interactive_comments
    
    # be sure that file sharing is turned on (see step #1 above)
    
    # install the homebrew package manager https://brew.sh/
    /bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"
    # enter your macos password if/when prompted
    
    # setup environment for brew
    (echo; echo 'eval "$(/opt/homebrew/bin/brew shellenv)"') >> $HOME/.zprofile
    eval "$(/opt/homebrew/bin/brew shellenv)"
    
    # install vagrant
    brew install vagrant
    # enter macos password if/when prompted
    
    # install qemu
    brew install qemu
    
    # install the qemu vagrant provider https://github.com/ppggff/vagrant-qemu
    vagrant plugin install vagrant-qemu
    
    # setup and enter the course VM
    cd # go to home directory
    mkdir cop3402spring24 # create the course directory
    cd cop3402spring24 # enter the course directory
    curl https://www.cs.ucf.edu/~gazzillo/teaching/cop3402spring24/files/arm/Vagrantfile > Vagrantfile # get the Vagrantfile
    vagrant up --provider qemu # create the virtual machine
    # enter password when prompted
    # enter both mac username and password again when prompted
    vagrant ssh  # enter the virtual machine
    

Halting the machine

  • Here are two ways
  • When logged into the VM Guest OS: sudo shutdown -h now
  • When in host OS (in Vagrantfile directory): vagrant halt

Re-provisioning

  • If `make`, `gcc`, etc., are not available, use vagrant provision from the host machine
    • This can happen if the provisioning failed, e.g., due to a network issue
  • From host OS (in Vagrantfile directory), and the machine is running (after vagrant up)
    • vagrant provision

Gotchas

  • Some browsers may add a .txt extension to Vagrantfile. Be sure to remove this so that the file in your directory is just Vagrantfile. Double-check with `ls` (*nix/mac) or `dir` (windows) that the filename does not have the .txt extension.
  • Requires virtualization support enabled in BIOS, SVM for AMD or VT-x for Intel
  • In windows, vagrant in WSL is not supported; use the windows command prompt
  • Anti-virus software may be intercepting https traffic (MITM), which vagrant rejects
  • VirtualBox has scattered support for arm-based macs. It should be possible to use VMWare instead of VirtualBox by following these directions. These instructions may also help.

Destroying the VM

  • If you want to destroy and recreate the VM
  • Be sure you have nothing saved on the VM
    • Keep everything in your shared folder on your host OS
  • vagrant destroy

Live demo

Author: Paul Gazzillo

Created: 2024-02-16 Fri 00:02
