Building, Testing, and Debugging
Programming Environment
COP-3402
Building software
How do you build a C program?
gcc -o hello hello.c
What if I have two C files?
gcc -o main main.c hello.c
What if I have 30,000+ C files?
gcc -o vmlinux kernel/locking/mutex-debug.c \
  kernel/locking/rwsem.c kernel/locking/rtmutex.c \
  kernel/locking/qrwlock.c kernel/locking/irqflag-debug.c \
  kernel/locking/test-ww_mutex.c kernel/locking/mutex.c \
  # 30,000+ more
Why not build the whole program at once?
git clone https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
cd linux
find kernel/ arch/ block/ crypto/ drivers/ fs/ init/ ipc/ lib/ mm/ net/ security/ sound/ virt/ -name "*.c" | wc -l
Downsides of building all at once
- Long compilation time
- Small changes require complete recompile
- Intentionally organize your code into separate parts
- Learn this in software engineering
- Projects won't be that large in this class
- I'll give you the organization for the compiler project
Separate compilation
- Divide program into separate C files
- Compile individually
- Link compiled files
This is how C/C++ do separate compilation. Other languages, like Java, tie source code organization more closely to language abstractions, such as Java's convention of one public class per file.
Diagram
main.c     -> gcc -> main.o
square.c   -> gcc -> square.o
exponent.c -> gcc -> exponent.o
                         v
                         ld
                         v
                        main
Example
main.c
#include <stdio.h>

int square(int);

int main() {
  printf("%d\n", square(3));
}
square.c
int exponent(int x, int y);

int square(int x) {
  return exponent(x, 2);
}
exponent.c
int exponent(int x, int y) {
  int result = 1;
  for (; y > 0; result *= x, y--);
  return result;
}
How do we separately compile this program?
gcc -c main.c
-c compiles the .c file to a .o object file.
Object files are translations of C to machine code.
Without -c, gcc will also link the C runtime and standard libraries, which is why trying to compile a file without main produces a linker error.
We will cover how source code becomes a program and a running process later in the semester.
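To see why main.c can be compiled on its own, note that the compiler only needs a declaration (prototype) of square to translate the call; finding the definition is left to the linker. A minimal single-file sketch of that idea follows. call_square is a hypothetical stand-in for main, so this file has no main of its own and would be built with -c:

```c
/* Declarations only: enough for the compiler to translate calls. */
int square(int x);
int exponent(int x, int y);

/* Hypothetical stand-in for main(): it is compiled knowing only the
   prototype of square, the same way main.c is. */
int call_square(int x) {
  return square(x);
}

/* Definitions: in the lecture example these live in square.c and
   exponent.c, and the linker connects the calls to them. */
int square(int x) {
  return exponent(x, 2);
}

int exponent(int x, int y) {
  int result = 1;
  for (; y > 0; y--) {
    result *= x;
  }
  return result;
}
```

If the two definitions were deleted, the file would still compile with -c, but linking it into a program would fail with undefined references.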
Compiling our example
Compile each .c file (without linking) to a .o object file
gcc -c main.c      # -c flag produces .o files
gcc -c square.c
gcc -c exponent.c
Then link all object files into a single program binary
gcc -o main main.o square.o exponent.o # pass .o files
Notice gcc takes the .o files instead of the .c files.
gcc looks at file extensions to distinguish source code from object files.
gcc will run linking for you, but you can also run ld, the linker, yourself. We will cover this more later in the semester.
Incremental builds
Only rebuild source files that changed.
emacs main.c    # modify one source file
gcc -c main.c   # recompile it
Re-link with existing object files
gcc -o main main.o square.o exponent.o
Build-system-related vulnerabilities
SolarWinds
xz Utils (liblzma)
Build automation
Makefiles
target … : prerequisites …
	recipe
	…
	…
Basic makefile
main:
	gcc -o main main.c square.c exponent.c
Won't rebuild if C file is changed.
Basic with clean
main:
	gcc -o main main.c square.c exponent.c

clean:
	rm -f main
Dependencies
main: main.c square.c exponent.c
	gcc -o main main.c square.c exponent.c
Will rebuild if C file is changed, but everything is recompiled.
Incremental build
main: main.o square.o exponent.o
	gcc -o main main.o square.o exponent.o

main.o: main.c
	gcc -c main.c

square.o: square.c
	gcc -c square.c

exponent.o: exponent.c
	gcc -c exponent.c
Only rebuilds the C file that changed!
make
touch main.c
make
Diagram
Wildcard patterns
main: main.o square.o exponent.o
	gcc -o main main.o square.o exponent.o

%.o: %.c
	gcc -c $<
Variables
PROG := main
SRC = main.c square.c exponent.c
OBJ = $(SRC:%.c=%.o)

$(PROG): $(OBJ)
	$(CC) $(CFLAGS) -o $@ $^

%.o: %.c
	$(CC) $(CFLAGS) -c -o $@ $<
Phony targets
PROG := main
SRC = main.c square.c exponent.c
OBJ = $(SRC:%.c=%.o)

.PHONY: all clean

all: $(PROG)

$(PROG): $(OBJ)
	$(CC) $(CFLAGS) -o $@ $^

%.o: %.c
	$(CC) $(CFLAGS) -c $<

clean:
	$(RM) $(PROG) $(OBJ)
Targets are files.
What if we want a special target that doesn't create a file?
"all" is a convention. By default, make builds the first target in the makefile, so "all" goes first.
"clean" target is another convention for removing generated files.
When we get to version control, the convention is to push only non-generated files. Ship the build automation script instead of binaries.
A complex example
Diagram
More than incremental builds
- Run tests
- Generate documentation
- Anything you can script in bash
What's a bug?
What do you think?
What a bug is not (usually)
- The program does something wrong
What a bug is
- A program that does exactly what its code says, but not what we think it does
A bug is the gap between what the programmer thinks the program does and what it actually does.
Debugging schema
- Narrow down the problem by crafting a smaller version of the test case
- Go to the part of your code that is related to the problem
- Trace step-by-step what your code really does (not what you think/hope/feel/guess/divine/intuit/reckon it does)
Don't start hacking code! First understand the problem.
Once you see the discrepancy between actual code behavior and what you want the code to do, the fix will likely be readily apparent (at least in this class's projects).
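One simple way to trace what the code really does is to print the state at each step. Here is a minimal sketch using the lecture's exponent loop. exponent_traced is a hypothetical name for a debugging copy of the function, and a debugger could serve the same purpose:

```c
#include <stdio.h>

/* Debugging copy of exponent() that reports the loop state after each
   multiplication. exponent_traced is a hypothetical name. */
int exponent_traced(int x, int y) {
  int result = 1;
  for (; y > 0; y--) {
    result *= x;
    /* stderr keeps trace lines separate from the program's real output */
    fprintf(stderr, "trace: x=%d y=%d result=%d\n", x, y, result);
  }
  return result;
}
```

Comparing the printed values against what you expected at each iteration shows where the actual behavior first departs from your assumption.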
Goal: make the specification match the implementation
- Specification: what we want our program to do
- Implementation: what our program really does
- Easier said than done!
- During implementation, the incomplete implementation never matches the specification!
(Diagram)
- Specification -> Programmer -> Implementation
- (Green highlight entire spec, green highlight part of impl, but mostly red)
- If I try to tackle the whole problem at once, my program is always wrong until I've finished coding the entire specification
Hard way: write whole program, check once done
"Everyone knows that debugging is twice as hard as writing a program in the first place. So if you're as clever as you can be when you write it, how will you ever debug it?" –Brian Kernighan, "The Elements of Programming Style", 2nd edition, chapter 2.
Sounds easy. Seems easy at first. Quickly snowballs.
The more you add to your code, the more combinations of behavior there are to consider. It's exponential!
Think about debugging code that has 3 if-then-else statements vs 10.
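As a rough worked example, assume the if-then-else statements run one after another and are independent of each other. Each one doubles the number of possible execution paths, so 3 statements give 2^3 = 8 paths and 10 give 2^10 = 1024. count_paths is a hypothetical helper that does this arithmetic:

```c
/* Number of distinct execution paths through n sequential, independent
   if-then-else statements: each statement doubles the count. */
unsigned long count_paths(unsigned int n) {
  unsigned long paths = 1;
  for (unsigned int i = 0; i < n; i++) {
    paths *= 2;
  }
  return paths;
}
```

Real code rarely has fully independent branches, so treat this as an upper-bound intuition rather than an exact count.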
Lazy way: start with a narrower specification
- Easier to get the implementation right
- Gradually expand the specification
- Keep the implementation correct at each step
I will write a "hello, world!" program as the first step.
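Here is one way to picture this using the lecture's own functions: first satisfy a narrow specification (squaring only), then widen it to general exponents while re-running the earlier check. The names square_v1, square_v2, and narrow_spec_holds are hypothetical, chosen only for this sketch:

```c
/* Step 1: narrow specification, "square an integer". */
int square_v1(int x) {
  return x * x;
}

/* Step 2: wider specification, "raise x to a non-negative power y". */
int exponent(int x, int y) {
  int result = 1;
  for (; y > 0; y--) {
    result *= x;
  }
  return result;
}

/* Step 2 also rebuilds squaring on top of the wider function. */
int square_v2(int x) {
  return exponent(x, 2);
}

/* The check written for step 1 must still pass after step 2.
   Returns 1 if both versions square a few small inputs correctly,
   0 otherwise. */
int narrow_spec_holds(void) {
  for (int x = -5; x <= 5; x++) {
    if (square_v1(x) != x * x || square_v2(x) != x * x) {
      return 0;
    }
  }
  return 1;
}
```

Keeping the old check around is what keeps the implementation correct at each step: if widening the specification breaks squaring, you find out immediately.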
top-down vs bottom-up
Constructively lazy (one of the virtues!)
Divide-and-conquer: break the problem down into smaller parts
- Can't keep all code in mind for all time
- Make it easier for yourself
- Delay gratification
- Finishing code fast feels good
- Debugging shoddy code feels horrible
(Diagram)
- Specification -> Programmer -> Implementation
- (Divide and green highlight part of spec, divide and green highlight part of impl; then take one more piece of spec, green highlight corresponding impl, with a small part being red)
- But if I divide up the specification, I can focus on getting a simpler, smaller (sub)program done, which makes debugging easier and makes that part more likely to be correct once I move on to another part of the specification.
- Debugging is made simpler, since there are fewer (likely) things that could be wrong
Use functions and separate compilation to (loosely) organize code into divisions of the problem.
Spend time now, save time debugging later
- Breaking problems down takes time, experience, and making mistakes
- Premature generalization can waste work
- Don't be afraid to refactor
- Easier to reorganize code after prototyping than to write from scratch (for large programs)
Instead of using cognitive energy to keep the whole program in your head and debug the whole program each time there is an issue, use cognitive energy to break the problem down into simpler parts and reason about how to combine them correctly.
Within each phase, try to break the problem down further yourself, and make each piece work on its own, then work with other pieces gradually.
Biggest takeaways:
- Real definition of what a bug is
- Revisit your code
- Refactor to match specifications (usually easier to refactor than to start from scratch)
- Take time to make good interfaces
- Less stressful if breaking down the problem first
Abstractions help you organize code
Use abstractions
- Hides unnecessary details, exposes clear, simple interfaces
- Physical examples: cars, locks, sockets
- Interface decisions determined by use of that abstraction
Would you rather plug your laptop into the wall or twist together positive, negative, and ground every time you need power?
The function abstraction
- One of the most common and powerful
- Names a snippet of code
- Provides a definition of input and output
- Developer provides documentation of behavior and usage
- You'll get to implement support for functions in your compiler
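The points above can be seen in a single C function: the name, the typed inputs and output in the prototype, and a comment documenting behavior and usage. This is a sketch based on the lecture's exponent function. The comment format is only one possible convention, and the overflow note is an assumption of this sketch:

```c
/*
 * exponent: raise x to the power y by repeated multiplication.
 *
 * Input:  x, the base; y, the exponent.
 * Output: the product of y copies of x; 1 when y <= 0.
 * Usage:  exponent(3, 2) evaluates to 9.
 * Note:   overflow is not checked (an assumption of this sketch).
 */
int exponent(int x, int y);

/* Callers need only the comment and prototype above, not this body. */
int exponent(int x, int y) {
  int result = 1;
  for (; y > 0; y--) {
    result *= x;
  }
  return result;
}
```

The body could later be swapped for a faster algorithm without touching any caller, as long as the documented behavior stays the same.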
Combine abstractions to build large software
- Abstractions create software components
- Components can be tested on their own
- Developer can forget about details of the component
- As long as the component is thoroughly tested and documented
- Large systems are composed of many layers of abstraction
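A small sketch of layered components, each tested on its own. exponent and square come from the earlier example; sum_of_squares and the three test_ functions are hypothetical additions for illustration:

```c
/* Lowest layer: the exponent component. */
int exponent(int x, int y) {
  int result = 1;
  for (; y > 0; y--) {
    result *= x;
  }
  return result;
}

/* Middle layer: uses exponent only through its interface. */
int square(int x) {
  return exponent(x, 2);
}

/* Top layer (hypothetical): built on square without needing to know
   how square or exponent work inside. */
int sum_of_squares(int a, int b) {
  return square(a) + square(b);
}

/* Each component has its own test, which returns its failure count. */
int test_exponent(void) {
  int failures = 0;
  if (exponent(2, 0) != 1) failures++;
  if (exponent(2, 5) != 32) failures++;
  return failures;
}

int test_square(void) {
  int failures = 0;
  if (square(0) != 0) failures++;
  if (square(-4) != 16) failures++;
  return failures;
}

int test_sum_of_squares(void) {
  int failures = 0;
  if (sum_of_squares(3, 4) != 25) failures++;
  return failures;
}
```

When a higher-layer test fails but the lower-layer tests pass, the search for the bug narrows to the layer in between.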
Bug wrap-up
- Bugs are (usually) not wrong programs, but incorrect assumptions
- Break the problem down into simpler specifications
- Use abstractions to build larger programs
Wirth’s Stepwise Refinement
- Program Development by Stepwise Refinement
- One methodology for breaking down a problem into abstractions