Building, Testing, and Debugging
Programming Environment
COP-3402

Building software
Separate compilation
Build-system-related vulnerabilities
- SolarWinds
- libxz
Build automation
What's a bug?
Goal: make the specification match the implementation
Abstractions help you organize code

Building software

How do you build a C program?

gcc -o hello hello.c

What if I have two C files?

gcc -o main main.c hello.c

What if I have 30,000+ C files?

gcc -o vmlinux kernel/locking/mutex-debug.c \
  kernel/locking/rwsem.c kernel/locking/rtmutex.c \
  kernel/locking/qrwlock.c kernel/locking/irqflag-debug.c \
  kernel/locking/test-ww_mutex.c kernel/locking/mutex.c \
  # 30,000+ more

Why not build the whole program at once?

git clone https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
cd linux
find kernel/ arch/ block/ crypto/ drivers/ fs/ init/ ipc/ lib/ mm/ net/ security/ sound/ virt/ -name "*.c" | wc -l

Downsides of building all at once

Long compilation time
Small changes require complete recompile

Intentionally organize your code into separate parts
Learn this in software engineering
Projects won't be that large in this class
I'll give you the organization for the compiler project

Separate compilation

Divide program into separate C files
Compile individually
Link compiled files

This is how C/C++ do separate compilation. Other languages, like Java, tie language abstractions more closely to source code organization, such as Java's one public class per file.

Diagram

main.c     -> gcc -> main.o
square.c   -> gcc -> square.o
exponent.c -> gcc -> exponent.o
                        v
                       ld
                        v
                       main

Example

main.c

#include <stdio.h>

int square(int);

int main() {
  printf("%d\n", square(3));
}

square.c

int exponent(int x, int y);

int square(int x) {
  return exponent(x, 2);
}

exponent.c

int exponent(int x, int y) {
  int result = 1;
  for (; y > 0; result *= x, y--);
  return result;
}

How do we separately compile this program?

gcc -c main.c

-c compiles the .c file to an object file .o

Object files are translations of C to machine code.

Without -c, gcc also links in the C runtime and standard library, which is why building a file that has no main function fails with a linker error.

We will cover how source code becomes a program and a running process later in the semester.

Compiling our example

Compile each .c file (without linking) to a .o object file

gcc -c main.c  # -c flag produces .o files
gcc -c square.c
gcc -c exponent.c

Then link all object files into a single program binary

gcc -o main main.o square.o exponent.o  # pass .o files

Notice gcc takes the .o files instead of the .c files.

gcc looks at file extensions to distinguish source code from object files.

gcc will run the linker for you, but you can also run the linker, ld, yourself. We will cover this more later in the semester.

Incremental builds

Only rebuild source files that changed.

emacs main.c   # modify one source file
gcc -c main.c  # recompile it

Re-link with existing object files

gcc -o main main.o square.o exponent.o

Build-system-related vulnerabilities

SolarWinds

libxz

Build automation

Makefiles

target … : prerequisites …
        recipe
        …
        …

https://www.gnu.org/software/make/manual/make.html#Rule-Introduction

Basic makefile

main:
        gcc -o main main.c square.c exponent.c

Won't rebuild if a C file is changed, because the rule lists no prerequisites: once main exists, make considers it up to date.

Basic with clean

main:
        gcc -o main main.c square.c exponent.c

clean:
        rm -f main

Dependencies

main: main.c square.c exponent.c
        gcc -o main main.c square.c exponent.c

Will rebuild if a C file is changed, but every file is recompiled.

Incremental build

main: main.o square.o exponent.o
        gcc -o main main.o square.o exponent.o

main.o: main.c
        gcc -c main.c

square.o: square.c
        gcc -c square.c

exponent.o: exponent.c
        gcc -c exponent.c

Only recompiles the C file that changed, then re-links!

make
touch main.c
make

Diagram

Wildcard patterns

main: main.o square.o exponent.o
        gcc -o main main.o square.o exponent.o

%.o: %.c
        gcc -c $<

Variables

PROG := main
SRC = main.c square.c exponent.c
OBJ = $(SRC:%.c=%.o)

$(PROG): $(OBJ)
        $(CC) $(CFLAGS) -o $@ $^

%.o: %.c
        $(CC) $(CFLAGS) -c -o $@ $<

Phony targets

PROG := main
SRC = main.c square.c exponent.c
OBJ = $(SRC:%.c=%.o)

.PHONY: all clean

all: $(PROG)

$(PROG): $(OBJ)
        $(CC) $(CFLAGS) -o $@ $^

%.o: %.c
        $(CC) $(CFLAGS) -c $<

clean:
        $(RM) $(PROG) $(OBJ)

Targets are files.

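What if we want a special target that doesn't create a file? Declare it with .PHONY so make runs its recipe even if a file with that name exists.

"all" is a convention. By default, make builds the first target in the makefile (special targets such as .PHONY don't count), so "all" goes first.

"clean" target is another convention for removing generated files.

When we get to version control, the convention is to commit only non-generated files. Ship the build automation script instead of binaries.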

A complex example

Diagram

More than incremental builds

Run tests
Generate documentation
Anything you can script in bash

What's a bug?

What do you think?

What a bug is not (usually)

The program does something wrong

What a bug is

A program that does exactly what its code says, but not what we think it does

A bug is the gap between what the programmer thinks the program does and what it actually does

Debugging schema

Narrow down the problem by crafting a smaller version of the test case
Go to the part of your code that is related to the problem
Trace step-by-step what your code really does (not what you think/hope/feel/guess/divine/intuit/reckon it does)

Don't start hacking code! First understand the problem.

Once you see the discrepancy between actual code behavior and what you want the code to do, the fix will likely be readily apparent (at least in this class's projects).

Goal: make the specification match the implementation

Specification: what we want our program to do
Implementation: what our program really does
Easier said than done!
During implementation, the incomplete implementation never matches the specification!

(Diagram)

Specification -> Programmer -> Implementation
- (Green highlight entire spec, green highlight part of impl, but mostly red)
- If I try to tackle the whole problem at once, my program is always wrong until I've finished coding the entire specification

Hard way: write whole program, check once done

"Everyone knows that debugging is twice as hard as writing a program in the first place. So if you're as clever as you can be when you write it, how will you ever debug it?" –Brian Kernighan, "The Elements of Programming Style", 2nd edition, chapter 2.

Sounds easy. Seems easy at first. Quickly snowballs.

The more you add to your code, the more combinations of behavior there are to reason about. The growth is exponential!

Think about debugging code that has 3 if-then-else statements vs 10: up to 2^3 = 8 paths through the code vs up to 2^10 = 1024.

Lazy way: start with a narrower specification

Easier to get the implementation right
Gradually expand the specification
Keep the implementation correct at each step

I will write a "hello, world!" program as the first step.

top-down vs bottom-up

Constructively lazy (one of the virtues!)

Divide-and-conquer: break the problem down into smaller parts

Can't keep all code in mind for all time
Make it easier for yourself
Delay gratification
- Finishing code fast feels good
- Debugging shoddy code feels horrible

(Diagram)

Specification -> Programmer -> Implementation
- (Divide and green highlight part of spec, divide and green highlight part of impl; then take one more piece of spec, green highlight corresponding impl, with a small part being red)
- But if I divide up the specification, I can focus on getting a simpler, smaller (sub)program done. That makes debugging easier and makes the subprogram more likely to be correct once I move on to another part of the specification.
Debugging is made simpler, since there are fewer (likely) things that could be wrong

Use functions and separate compilation to (loosely) organize code into divisions of the problem.

Spend time now, save time debugging later

Breaking problems down takes time, experience, and making mistakes
Premature generalization can waste work
Don't be afraid to refactor
- Easier to reorganize code after prototyping than to write from scratch (for large programs)

Instead of using cognitive energy to keep the whole program in your head and debug the whole program each time there is an issue, use cognitive energy to break the problem down into simpler parts and reason about how to combine them correctly

Within each phase, try to break the problem down further yourself. Make each piece work on its own, then gradually make it work with the other pieces.

Biggest takeaways:

Real definition of what a bug is
Revisit your code
- Refactor to match specifications (usually easier to refactor than to start from scratch)
- Take time to make good interfaces
- Less stressful if breaking down the problem first

Abstractions help you organize code

Use abstractions

Hides unnecessary details, exposes clear, simple interfaces
Physical examples: cars, locks, sockets
Interface decisions are determined by how the abstraction is used

Would you rather plug your laptop into the wall or twist together the hot, neutral, and ground wires every time you need power?

The function abstraction

One of the most common and powerful
Names a snippet of code
Provides a definition of input and output
- Developer provides documentation of behavior and usage
You'll get to implement support for functions in your compiler

Combine abstractions to build large software

Abstractions create software components
Components can be tested on their own
Developer can forget about details of the component
- As long as the component is thoroughly tested and documented
Large systems are composed of many layers of abstraction

Bug wrap-up

Bugs are (usually) not wrong programs, but incorrect assumptions
Break the problem down into simpler specifications
Use abstractions to build larger programs

Wirth’s Stepwise Refinement

Program Development by Stepwise Refinement
One methodology for breaking down a problem into abstractions
