Source Code to Running Program
Compiler Foundations
COP-3402
What's a program?
In the systems/architecture context, a program is a sequence of machine instructions.
Source code is a program written in a programming language that may not correspond to the machine's instructions one-to-one.
What's a process?
A running program
How do we go from source code to a process running that program?
Programming language implementation
Can your processor run C code?
gcc -o hello hello.c
./hello
Why are there two commands?
gcc takes source code and converts it to a runnable program.
Keep in mind that gcc itself is also a program, and bash creates a process to exec the gcc program.
Two main implementation techniques
Compilers and Interpreters
Compiler or interpreter, either way, we have another program that takes our C code and makes it run on our machine.
These are two general techniques, though real-world compiled and interpreted languages usually blend them.
Roughly two ways to design your programming language implementation:
- Compile or translate to the instructions that the machine does run, so that the machine can run an equivalent of the program
- Interpret or simulate the actions of the instructions to run the program and produce the output
Compiler
The compiler translates your source instructions to machine code.
Diagram
Compiler
The machine code should have the same outputs that the source program would.
Functional equivalence is possible when programs are deterministic. How can programs such as random number generators appear non-deterministic while still being deterministic? By using random (or hard-to-guess) inputs.
Probabilistic and quantum programming languages have probability distributions as outputs.
Interpreter
The interpreter simulates the actions of the source instructions.
Diagram
Emulators are a special kind of interpreter: they interpret machine code, which makes it possible to run code written for another processor.
Virtual machines are also interpreters of machine code, though sometimes virtualization is useful even for the same processor architecture.
Interpreter
The interpreted program should have the same outputs that the source program would.
Illustrating language implementation
Getting "1+2" to run on our machine.
We assume a syntax and semantics for arithmetic operations that is similar to what C-like languages have.
Compiler approach
Translate "1+2" to machine code.
// read the source code
char left = getchar();
char operator = getchar();
char right = getchar();

// generate assembly code that performs the addition
if ('+' == operator) {
  printf(" mov $%c, %%rax\n", left);
  printf(" mov $%c, %%rbx\n", right);
  printf(" add %%rbx, %%rax\n"); // print the addition instruction
}
Interpreter approach
Interpret the result of "1+2".
// read the source code
char left = getchar();
char operator = getchar();
char right = getchar();

// compute the result of the addition
if ('+' == operator) {
  int leftnum = left - '0';
  int rightnum = right - '0';
  int result = leftnum + rightnum; // perform addition
  printf("%d\n", result);
}
Observe that it's pretty straightforward to interpret addition, because we have the same representation for it in C.
Key difference
- Compiler program: prints out the machine code for addition
- Interpreter program: performs the addition operation
Run and show the output of language.c.
Interpreting exponents, e.g., "2^8"?
There is no single integer machine instruction or C operator for exponentiation, so we build it out of repeated multiplication.
Interpreter pseudo-code
read left operand
read operator
read right operand
if operator == "^":
  result = 1
  for i = 1 to right operand:
    result = result * left operand
  print result
Full example for + and ^
language.c
/**
  gcc -o language language.c
  echo "1+2" | ./language
  echo "2^8" | ./language
*/

#include <stdio.h>
#include <stdlib.h>

#define CHAR2INT(c) ((c) - '0')

int main() {
  char left = getchar();
  char operator = getchar();
  char right = getchar();

  printf("input: %c%c%c\n", left, operator, right);

  printf("compiler\n");
  switch (operator) {
  case '+':
    printf(" mov $%c, %%rax\n", left);
    printf(" mov $%c, %%rbx\n", right);
    printf(" add %%rbx, %%rax\n");
    break;
  case '^':
    printf(" mov $%c, %%rbx\n", left);
    printf(" mov $%c, %%rcx\n", right);
    printf(" mov $1, %%rax\n");
    printf("loop:\n");
    printf(" cmp $0, %%rcx\n");
    printf(" jle end\n");
    printf(" imul %%rbx, %%rax\n");
    printf(" sub $1, %%rcx\n");
    printf(" jmp loop\n");
    printf("end:\n");
    break;
  }
  printf("\n");

  printf("interpreter\n");
  int leftnum = CHAR2INT(left);
  int rightnum = CHAR2INT(right);
  int result;
  switch (operator) {
  case '+':
    result = leftnum + rightnum;
    printf("%d\n", result);
    break;
  case '^':
    result = 1;
    for (; rightnum > 0; rightnum--) {
      result *= leftnum;
    }
    printf("%d\n", result);
    break;
  }
}
COP-5621 Compiler Construction
Take this course if you want to learn more about how programming languages are defined and implemented.
Source to Running Program
How your C program gets run
The five steps (preprocess, compile, assemble, link, run):
cat > hello.c << EOT
#include <stdio.h>
int main() {
  printf("hello, world!\n");
}
EOT
gcc -E hello.c -o hello.i
gcc -S hello.i -o hello.s
as hello.s -o hello.o
ld /usr/lib/x86_64-linux-gnu/crt1.o /usr/lib/x86_64-linux-gnu/libc.so hello.o \
  -dynamic-linker /lib64/ld-linux-x86-64.so.2 -o hello
./hello
The five steps, with commands to inspect the outputs:
cat hello.c
gcc -E hello.c -o hello.i
cat hello.i
gcc -S hello.i -o hello.s
cat hello.s
as hello.s -o hello.o
objdump -d hello.o
objdump -t hello.o
ld /usr/lib/x86_64-linux-gnu/crt1.o /usr/lib/x86_64-linux-gnu/libc.so hello.o \
  -dynamic-linker /lib64/ld-linux-x86-64.so.2 -o hello
objdump -d hello
objdump -t hello
The five steps, but with static linking:
cat > hello.c << EOT
#include <stdio.h>
int main() {
  printf("hello, world!\n");
}
EOT
gcc -E hello.c -o hello.i
gcc -S hello.i -o hello.s
as hello.s -o hello.o
# gcc -v -static -o hello hello.o
ld -static -o hello \
  /usr/lib/x86_64-linux-gnu/crt1.o \
  /usr/lib/x86_64-linux-gnu/crti.o \
  /usr/lib/gcc/x86_64-linux-gnu/13/crtbeginT.o \
  -L/usr/lib/gcc/x86_64-linux-gnu/13 \
  -L/usr/lib/x86_64-linux-gnu \
  -L/usr/lib \
  -L/lib/x86_64-linux-gnu \
  -L/lib \
  -L/usr/lib/x86_64-linux-gnu \
  -L/usr/lib/../lib \
  -L/usr/lib/gcc/x86_64-linux-gnu/13/../../.. \
  hello.o \
  --start-group -lgcc -lgcc_eh /usr/lib/x86_64-linux-gnu/libc.a --end-group \
  /usr/lib/gcc/x86_64-linux-gnu/13/crtend.o \
  /usr/lib/gcc/x86_64-linux-gnu/13/../../../x86_64-linux-gnu/crtn.o
./hello