
Building, Testing, and Debugging
Programming Environment
COP-3402

Building software

How do you build a C program?

gcc -o hello hello.c

What if I have two C files?

gcc -o main main.c hello.c

What if I have 30,000+ C files?

gcc -o vmlinux kernel/locking/mutex-debug.c \
  kernel/locking/rwsem.c kernel/locking/rtmutex.c \
  kernel/locking/qrwlock.c kernel/locking/irqflag-debug.c \
  kernel/locking/test-ww_mutex.c kernel/locking/mutex.c \
  # 30,000+ more

Why not build the whole program at once?

git clone https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
cd linux  # run find from the root of the cloned tree
find kernel/ arch/ block/ crypto/ drivers/ fs/ init/ ipc/ lib/ mm/ net/ security/ sound/ virt/ -name "*.c" | wc -l

Downsides of building all at once

  • Long compilation time
  • Small changes require complete recompile
  • Intentionally organize your code into separate parts
  • Learn this in software engineering
  • Projects won't be that large in this class
  • I'll give you the organization for the compiler project

Separate compilation

  • Divide program into separate C files
  • Compile individually
  • Link compiled files

This is how C and C++ do separate compilation. Other languages, like Java, tie source code organization more closely to language abstractions, such as Java's convention of one class per file.

Diagram

main.c     -> gcc -> main.o
square.c   -> gcc -> square.o
exponent.c -> gcc -> exponent.o
                        v
                       ld
                        v
                       main

Example

main.c

#include <stdio.h>

int square(int);

int main() {
  printf("%d\n", square(3));
}

square.c

int exponent(int x, int y);

int square(int x) {
  return exponent(x, 2);
}

exponent.c

int exponent(int x, int y) {
  int result = 1;
  for (; y > 0; result *= x, y--);
  return result;
}

How do we separately compile this program?

gcc -c main.c

-c compiles the .c file to an object file .o

Object files are translations of C to machine code.

Without -c, gcc also links the result with the C runtime and standard libraries. The runtime's startup code calls main, which is why building a file that has no main without -c fails with a linker error.

We will cover how source code becomes a program and a running process later in the semester.

Compiling our example

Compile each .c file (without linking) to a .o object file

gcc -c main.c  # -c flag produces .o files
gcc -c square.c
gcc -c exponent.c

Then link all object files into a single program binary

gcc -o main main.o square.o exponent.o  # pass .o files

Notice that gcc takes the .o files instead of the .c files.

gcc looks at file extensions to distinguish source code from object files.

gcc will run the linker for you, but you can also run the linker, ld, yourself. We will cover this more later in the semester.

Incremental builds

Only rebuild source files that changed.

emacs main.c   # modify one source file
gcc -c main.c  # recompile it

Re-link with existing object files

gcc -o main main.o square.o exponent.o

Build-system-related vulnerabilities

SolarWinds

libxz

Build automation

Makefiles

target … : prerequisites …
        recipe
        …
        …

Basic makefile

main:
        gcc -o main main.c square.c exponent.c

Won't rebuild if a C file is changed: the target lists no prerequisites, so once the file main exists, make considers it up to date.

Basic with clean

main:
        gcc -o main main.c square.c exponent.c

clean:
        rm -f main

Dependencies

main: main.c square.c exponent.c
        gcc -o main main.c square.c exponent.c

Will rebuild if C file is changed, but everything is recompiled.

Incremental build

main: main.o square.o exponent.o
        gcc -o main main.o square.o exponent.o

main.o: main.c
        gcc -c main.c

square.o: square.c
        gcc -c square.c

exponent.o: exponent.c
        gcc -c exponent.c

Only recompiles the C file that changed, then re-links!

make
touch main.c
make

Wildcard patterns

main: main.o square.o exponent.o
        gcc -o main main.o square.o exponent.o

%.o: %.c
        gcc -c $<

Variables

PROG := main
SRC = main.c square.c exponent.c
OBJ = $(SRC:%.c=%.o)

$(PROG): $(OBJ)
        $(CC) $(CFLAGS) -o $@ $^

%.o: %.c
        $(CC) $(CFLAGS) -c -o $@ $<

Phony targets

PROG := main
SRC = main.c square.c exponent.c
OBJ = $(SRC:%.c=%.o)

.PHONY: all clean

all: $(PROG)

$(PROG): $(OBJ)
        $(CC) $(CFLAGS) -o $@ $^

%.o: %.c
        $(CC) $(CFLAGS) -c $<

clean:
        $(RM) $(PROG) $(OBJ)

Targets are files.

What if we want a special target that doesn't create a file?

"all" is a conventional name for the first target. When run without arguments, make builds the first target in the makefile by default.

"clean" target is another convention for removing generated files.

When we get to version control, the convention is to push only non-generated files. Ship the build automation script instead of the binaries.

More than incremental builds

  • Run tests
  • Generate documentation
  • Anything you can script in bash

What's a bug?

What do you think?

What a bug is not (usually)

  • The program does something wrong

What a bug is

  • A program that does exactly what its code says, but not what we think it does

A bug is the gap between what the programmer thinks it does and what it actually does

Debugging schema

  1. Narrow down the problem by crafting a smaller version of the test case
  2. Go to the part of your code that is related to the problem
  3. Trace step-by-step what your code really does (not what you think/hope/feel/guess/divine/intuit/reckon it does)

Don't start hacking code! First understand the problem.

Once you see the discrepancy between actual code behavior and what you want the code to do, the fix will likely be readily apparent (at least in this class's projects).

Goal: make the implementation match the specification

  • Specification: what we want our program to do
  • Implementation: what our program really does
  • Easier said than done!
  • During implementation, the incomplete implementation never matches the specification!

(Diagram)

  • Specification -> Programmer -> Implementation
    • (Green highlight entire spec, green highlight part of impl, but mostly red)
    • If I try to tackle the whole problem at once, my program is always wrong until I've finished coding the entire specification

Hard way: write whole program, check once done

"Everyone knows that debugging is twice as hard as writing a program in the first place. So if you're as clever as you can be when you write it, how will you ever debug it?" –Brian Kernighan, "The Elements of Programming Style", 2nd edition, chapter 2.

Sounds easy. Seems easy at first. Quickly snowballs.

The more you add to your code, the more combinations of behavior there are to consider. The growth is exponential!

Think about debugging code that has 3 if-then-else statements vs 10.

Lazy way: start with a narrower specification

  • Easier to get the implementation right
  • Gradually expand the specification
  • Keep the implementation correct at each step

I will write a "hello, world!" program as the first step.

top-down vs bottom-up

Constructively lazy (one of the virtues!)

Divide-and-conquer: break the problem down into smaller parts

  • Can't keep all code in mind for all time
  • Make it easier for yourself
  • Delay gratification
    • Finishing code fast feels good
    • Debugging shoddy code feels horrible

(Diagram)

  • Specification -> Programmer -> Implementation
    • (Divide and green highlight part of spec, divide and green highlight part of impl; then take one more piece of spec, green highlight corresponding impl, with a small part being red)
    • But if I divide up the specification, I can focus on getting a simpler, smaller (sub)program done. That makes debugging easier, and the subprogram is more likely to be correct once I move on to another part of the specification.
  • Debugging is made simpler, since there are fewer (likely) things that could be wrong

Use functions and separate compilation to (loosely) organize code into divisions of the problem.

Spend time now, save time debugging later

  • Breaking problems down takes time, experience, and making mistakes
  • Premature generalization can waste work
  • Don't be afraid to refactor
    • Easier to reorganize code after prototyping than to write from scratch (for large programs)

Instead of using cognitive energy to keep the whole program in your head and debug the whole program each time there is an issue, use cognitive energy to break the problem down into simpler parts and reason about how to combine them correctly

Within each phase, try to break the problem down further yourself, and make each piece work on its own, then work with other pieces gradually.

Biggest takeaways:

  • Real definition of what a bug is
  • Revisit your code
    • Refactor to match specifications (usually easier to refactor than to start from scratch)
    • Take time to make good interfaces
    • Less stressful if breaking down the problem first

Abstractions help you organize code

Use abstractions

  • Hides unnecessary details, exposes clear, simple interfaces
  • Physical examples: cars, locks, sockets
  • Interface decisions determined by use of that abstraction

Would you rather plug your laptop into the wall or twist together positive, negative, and ground every time you need power?

The function abstraction

  • One of the most common and powerful
  • Names a snippet of code
  • Provides a definition of input and output
    • Developer provides documentation of behavior and usage
  • You'll get to implement support for functions in your compiler

Combine abstractions to build large software

  • Abstractions create software components
  • Components can be tested on their own
  • Developer can forget about details of the component
    • As long as the component is thoroughly tested and documented
  • Large systems are composed of many layers of abstraction

Bug wrap-up

  • Bugs are (usually) not wrong programs, but incorrect assumptions
  • Break the problem down into simpler specifications
  • Use abstractions to build larger programs

Wirth’s Stepwise Refinement

Author: Paul Gazzillo

Created: 2024-10-10 Thu 11:34
