Introduction
Lecture 1

Introduction

Why study compilers and systems software?

  • Know your tools
  • Be a better programmer
  • Intellectual curiosity

All technical workers need to be expert at their tools. Pilots go to flight school; carpenters may even build their own tools.

All languages have many things in common. Learning how they are implemented gives you insight into how other languages work.

Compilers are just fascinating. They bridge theoretical computer science and low-level computing, making them a good sample of many fundamental aspects of computer science.

About me

  • Paul Gazzillo
  • Assistant professor
  • Research areas
    • Programming languages
    • Software engineering
    • Security
  • https://paulgazzillo.com

What makes compiler-writing so hard?

  • (Diagram)

    file.c -> compiler.exe -> file.exe
    in -> file.exe -> out
    compiler.c -> compiler.exe -> compiler.exe
    spec -> compiler-compiler.exe -> compiler.c

  • The bootstrapping problem
    • Can't write a C compiler in C without a C compiler
    • Write the compiler in machine code first

Course Overview

Syllabus

https://cop3402fall20.github.io/

I am going to assume you have some proficiency in C and can trace through code to figure out what it does (necessary for debugging).

  • First, we'll cover the programming environment: VM, git, makefiles, gcc -c, linker/libraries, OS/loader
  • Then we'll write a simple compiler from scratch in C that translates a calculator program to assembly
  • Then we'll dig into how to do this for a larger subset of C

Changes from my last offering

  • More in-class coding and demos
  • Lexer/parser given to you
  • More breathing room to cover complex topics
  • Students liked in-class coding, so we're doing more of that
    • Toy compiler, coded together
    • Submit your copied version
  • Students liked learning systems tools, particularly git
    • Need more instruction on using the command-line (since I expect you to use it)
    • Devoting part of the course material to it
  • Students did not like the lack of detail on theoretical concepts

Course mechanics

  • Webcourses
  • Course webpage
  • Zoom
  • Gather
  • Piazza
  • GitHub

Webcourses

  • Entrypoint to course
    • Links to webpage, Zoom, Piazza, GitHub
  • Announcements
  • Assignments and grades
    • Written homework submitted on webcourses
    • Code submitted on GitHub
  • Final exam

Course webpage

  • Syllabus
  • Schedule
  • Lecture notes
  • Project descriptions

Zoom

  • Lecture broadcast
  • Use chat to ask questions
    • Recall that voice/video are being recorded
  • Course recordings posted within a week
    • Courses may be publicly available

Gather

  • Office hours
  • Group chat in a virtual space

Piazza

  • Discussion board
  • Questions may be public or private (instructors only)
  • Your public questions benefit other students
  • Your question may have been answered already on Piazza

GitHub classrooms

  • Assignment #1: create a git repository in GitHub classroom
    • See webcourses for the registration link
  • Do not put coursework repositories on a personal account (neither public nor private)
    • Violates UCF's Golden Rule policies
  • All code submitted via git
    • Version control is standard for software engineers
  • Create a GitHub account today if you haven't already

Submit your first "assignment"

Due today for academic activity requirement

(Demo)

Ulterior motive: make you a better programmer

Nominally, this course teaches you systems software, mainly the internals of the compiler.

But I have a secret objective: train you to be better at the practice of programming.

I think studying compilers is a great way to do this.

(IMHO) Great Programmers Know Well

  1. The language
  2. Their programming environment
  3. Some computational theory

Know the language

  • Can't program without it
  • Helps to know language constructs in detail
  • Compilers reveal how the language itself works

C is deceptively simple

Know your programming environment

  • Editor, system software (libraries, linkers, loaders), the OS
  • Helps you be faster, tackle large software
  • Compiler works with system to make an executable
  • We'll learn this next week
  • Often omitted in academia
  • We'll go through all the tools you'll need
  • We'll use the command-line (CLI, shell, etc.)
  • Using the CLI gives you more control over your computing device
  • The GUI is good for ease of use, but hides system infrastructure (file system, etc.)

Know some computational theory

  • Provides new mental models
  • Algorithmic design and implementation
  • Compilers use several models, such as automata and recursion
  • Not necessarily academic: C has a mental model
    • It's very simple and close to the physical hardware's behavior (RAM, variables refer to them, pointers and memory addresses)
    • Not simple to use, particularly for some problems
    • E.g., recursion
  • If you've hated or been confused by recursion
    • We'll look at some other ways to think about this
    • Recursion is hard to reason about
    • Hard to reason about in the RAM mental model
    • Actually has to be "emulated" by the compiler
    • If you struggled with it, this class will hopefully make it easier for you
    • We'll look at it from a data structure POV
      • If you can imagine/draw a tree, you can do recursion!

Three virtues of a great programmer

http://threevirtues.com/

"According to Larry Wall(1), the original author of the Perl programming language, there are three great virtues of a programmer; Laziness, Impatience and Hubris

  1. Laziness: The quality that makes you go to great effort to reduce overall energy expenditure. It makes you write labor-saving programs that other people will find useful and document what you wrote so you don't have to answer so many questions about it.
  2. Impatience: The anger you feel when the computer is being lazy. This makes you write programs that don't just react to your needs, but actually anticipate them. Or at least pretend to.
  3. Hubris: The quality that makes you write (and maintain) programs that other people won't want to say bad things about."

What's a bug?

What do you think?

What a bug is not (usually)

  • The program does something wrong

What a bug is

  • A program that does what it's supposed to do, but not what we think it does

A bug is the gap between what the programmer thinks the program does and what it actually does.

An exception: compiler bugs

  • The programming language defines the expected behavior
  • The compiler is the implementation
  • A compiler bug means the program truly does what it's not supposed to
    • Hardware, kernel, system bugs can also cause incorrect program behavior

Debugging your compiler

  1. Narrow down the problem by crafting a smaller version of the test case
  2. Go to the part of your code that is related to the problem
  3. Trace step-by-step what your code really does (not what you think/hope/feel/guess/divine/intuit/reckon it does)

Specification and implementation

  • Specification: what we want our program to do
  • Implementation: what our program really does

Strive to make implementation match specification

  • Easier said than done!
  • During implementation, the incomplete implementation never matches the specification!

Hard: write the whole program, check when done

"Everyone knows that debugging is twice as hard as writing a program in the first place. So if you're as clever as you can be when you write it, how will you ever debug it?" –Brian Kernighan, "The Elements of Programming Style", 2nd edition, chapter 2.

Sounds easy. Seems easy at first. Quickly snowballs.

The more you add to your code, the more combinations of behavior there are to consider. It's exponential!

Think about debugging code that has 3 if-then-else statements vs 10.

Let's be lazy: start with a narrower specification

  • Easier to get the implementation right
  • Gradually expand the specification
  • Keep the implementation correct at each step

I will write a "hello, world!" program as the first step.

Top-down vs bottom-up

Divide-and-conquer: break the problem down into smaller parts

  • Finding the right way to divide takes experience, time, and making mistakes
  • Premature generalization can waste work
  • Don't be afraid to refactor
    • Easier to reorganize code after prototyping than to write from scratch

A compiler is a large project, so we really need to break it down into components.

For your compiler, we have a nice breakdown into phases.

Within each phase, try to break the problem down further yourself, and make each piece work on its own, then work with other pieces gradually.

Use abstractions

  • Hides unnecessary details, exposes clear, simple interfaces
  • Physical examples: cars, locks, sockets
  • Interface decisions determined by use of that abstraction

The function abstraction

  • One of the most common and powerful
  • Names a snippet of code
  • Provides a definition of input and output
    • Developer provides documentation of behavior and usage
  • You'll get to implement support for functions in your compiler

Combine abstractions to build large software

  • Abstractions create software components
  • Components can be tested on their own
  • Developer can forget about details of the component
    • As long as the component is thoroughly tested and documented
  • Large systems are composed of many layers of abstraction

Bug wrap-up

  • Bugs are (usually) not wrong programs, but incorrect assumptions
  • Break the problem down into simpler specifications
  • Use abstractions to build larger programs

Setting up your virtual machine

Virtual machines (VMs) standardize our programming environment

  • Emulates hardware using software
  • Fewer surprises during grading
  • Applications (usually) can't tell the difference
    • Hardware vs software interpreting machine instructions
  • Example: arcade/console emulators

The host OS emulates the hardware of the guest OS

  • Host: your OS running on native hardware
  • Guest: the OS running on emulated hardware (which runs on your host)
  • Hypervisor: manages direct guest access to native hardware

Install VirtualBox

Virtualization software

version 6.1.12

https://www.virtualbox.org/wiki/Downloads

Install Vagrant

Command-line VM manager

version 2.2.9

https://www.vagrantup.com/downloads

Installation

  1. Download the course Vagrantfile to where you want to work
  2. Open your command-line shell
  3. Navigate to folder
  4. Run vagrant up
  5. Run vagrant ssh to enter your box
  6. /vagrant is synced to the folder where you put the Vagrantfile (cd /vagrant)
  7. exit or Ctrl-D to leave the ssh session

Halting the machine

  • Here are two ways
  • When logged into the VM Guest OS: sudo shutdown -h now
  • When in host OS (in Vagrantfile directory): vagrant halt

Re-provisioning

  • Sometimes I may update the Vagrantfile
  • From the host OS (in the Vagrantfile directory), while the machine is running (after vagrant up)
    • vagrant provision

Gotchas

  • Requires virtualization support enabled in BIOS, SVM for AMD or VT-x for Intel
  • On Windows, Vagrant in WSL is not supported; use the Windows command prompt
  • Anti-virus software may be intercepting HTTPS traffic (MITM), which Vagrant rejects

Destroying the VM

  • If you want to destroy and recreate the VM
  • Be sure you have nothing saved on the VM
    • Keep everything in your shared folder on your host OS
  • vagrant destroy

Live demo

Author: Paul Gazzillo

Created: 2020-09-18 Fri 01:40