
Source Code to Running Program
Compiler Foundations
COP-3402

What's a program?

In the systems/architecture context, a program is a sequence of machine instructions.

Source code is a program written in a programming language that may not correspond to the machine's instructions one-to-one.

What's a process?

A running program

How do we go from source code to a process running that program?

Programming language implementation

Can your processor run C code?

gcc -o hello hello.c
./hello

Why are there two commands?

gcc takes source code and converts it to a runnable program.

Keep in mind that gcc is itself a program: bash creates a process to exec the gcc program. Likewise, ./hello asks bash to create a process that runs the newly built hello program.
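Here is a minimal sketch of what the shell does when you type gcc -o hello hello.c (and later ./hello), assuming a POSIX system. The shell forks a child process, the child execs the program, and the parent waits for it to exit. The function name run_command is hypothetical. Real shells such as bash also handle redirection, job control, and signals.

```c
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Run argv[0] with arguments argv as a child process, the way a shell
// runs "gcc -o hello hello.c": fork a copy of this process, replace the
// child's program image with execvp, and wait for the child to finish.
// Returns the child's exit status, or -1 if it did not exit normally.
int run_command(char *const argv[]) {
  pid_t pid = fork();
  if (pid < 0) {
    perror("fork");
    return -1;
  }
  if (pid == 0) {
    // child: execvp searches PATH and only returns if it fails
    execvp(argv[0], argv);
    perror("execvp");
    _exit(127);  // conventional "command not found" status
  }
  // parent: block until the child exits
  int status;
  if (waitpid(pid, &status, 0) < 0) {
    perror("waitpid");
    return -1;
  }
  return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

For example, run_command((char *[]){"gcc", "-o", "hello", "hello.c", NULL}) would compile hello.c and return gcc's exit status.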

Two main implementation techniques

Compilers and Interpreters

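Either way, compiler or interpreter, another program takes our C code and makes it run on our machine.

These are two general techniques, though real-world implementations of compiled or interpreted languages often blend them. For example, Java is compiled to bytecode, which the Java virtual machine then interprets or compiles further at run time.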

Roughly two ways to design your programming language implementation:

  • Compile or translate to the instructions that the machine does run, so that the machine can run an equivalent of the program
  • Interpret or simulate the actions of the instructions to run the program and produce the output

Compiler

The compiler translates your source instructions to machine code.

[Diagram: compiler]

The machine code should have the same outputs that the source program would.

Functional equivalence (the same inputs always give the same outputs) is possible when programs are deterministic. How can a deterministic program, such as a random number generator, appear non-deterministic? It can do so by using random (or hard-to-guess) inputs, such as a seed taken from the clock.
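This sketch illustrates the idea with a linear congruential generator. The name lcg_next and the constants 1664525 and 1013904223 are a common textbook choice, used here only for illustration. The generator is a deterministic function, so the same seed always yields the same sequence. Any apparent randomness comes entirely from choosing a hard-to-guess seed.

```c
#include <stdint.h>

// A linear congruential generator: a deterministic function from the
// current state to the next state. The constants are the widely used
// Numerical Recipes values (chosen for illustration).
uint32_t lcg_next(uint32_t *state) {
  *state = *state * 1664525u + 1013904223u;  // unsigned arithmetic wraps mod 2^32
  return *state;
}
```

Seeding with something like (uint32_t)time(NULL) makes runs differ. Seeding with a constant such as 42 makes every run produce the same numbers, which is handy for testing.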

Probabilistic and quantum programming languages have probability distributions as outputs.

Interpreter

The interpreter simulates the actions of the source instructions.


Emulators are a special kind of interpreter: they interpret machine code, which lets them run code written for another processor.

Virtual machines are also interpreters of machine code, though sometimes virtualization is useful even for the same processor architecture.
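The following sketch makes "interpreter of machine code" concrete by showing the fetch-decode-execute loop at the heart of an emulator. The four-instruction machine (LOADI, ADD, MUL, HALT) and the names struct insn and emulate are hypothetical. A real emulator decodes binary instruction bytes for an actual processor and also models memory, flags, and I/O.

```c
// A hypothetical 4-instruction machine, invented for illustration.
// Registers r0..r3 hold integers.
enum opcode { LOADI, ADD, MUL, HALT };

struct insn {
  enum opcode op;
  int a, b;  // LOADI: r[a] = b;  ADD/MUL: r[a] = r[a] op r[b]
};

// Fetch-decode-execute loop: simulate each instruction's effect on the
// registers instead of running it on real hardware. The program must end
// with HALT; emulate returns the value of r0 at that point.
long emulate(const struct insn *prog) {
  long r[4] = {0};
  for (int pc = 0;; pc++) {
    const struct insn *i = &prog[pc];  // fetch the instruction at pc
    switch (i->op) {                   // decode, then execute
    case LOADI: r[i->a] = i->b; break;
    case ADD:   r[i->a] += r[i->b]; break;
    case MUL:   r[i->a] *= r[i->b]; break;
    case HALT:  return r[0];
    }
  }
}
```

The program {LOADI, 0, 1}, {LOADI, 1, 2}, {ADD, 0, 1}, {HALT, 0, 0} leaves 3 in r0, so emulate returns 3.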

[Diagram: interpreter]

The interpreted program should have the same outputs that the source program would.

Illustrating language implementation

Getting "1+2" to run on our machine.

We assume a syntax and semantics for arithmetic operations that is similar to what C-like languages have.

Compiler approach

Translate "1+2" to machine code.

#include <stdio.h>

int main(void) {
  // read the source code, e.g., "1+2"
  char left = getchar();
  char operator = getchar();
  char right = getchar();

  // generate assembly code that performs the addition
  if ('+' == operator) {
    printf("  mov $%c, %%rax\n", left);   // load the left operand
    printf("  mov $%c, %%rbx\n", right);  // load the right operand
    printf("  add %%rbx, %%rax\n");       // print the addition instruction
  }
}
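The generated code above is only the body of a computation. The sketch below shows how the compiler could instead emit a complete assembly file that gcc can assemble and link, assuming x86-64 Linux and the System V ABI. The function name emit_add_program is hypothetical. Writing to a buffer rather than printing just keeps the sketch easy to test.

```c
#include <stdio.h>

// Write a complete x86-64 (AT&T syntax) assembly program for "left+right"
// into buf, returning the length snprintf reports. Wrapping the body in a
// main function lets gcc assemble and link it; the sum becomes main's
// return value. rbx is callee-saved in the System V ABI, so the generated
// code saves and restores it.
int emit_add_program(char *buf, size_t size, char left, char right) {
  return snprintf(buf, size,
                  "  .globl main\n"
                  "main:\n"
                  "  push %%rbx\n"
                  "  mov $%c, %%rax\n"
                  "  mov $%c, %%rbx\n"
                  "  add %%rbx, %%rax\n"
                  "  pop %%rbx\n"
                  "  ret\n",
                  left, right);
}
```

Write the buffer for '1' and '2' to sum.s, then run gcc -o sum sum.s && ./sum; echo $?. This should print 3, because main's return value becomes the process's exit status.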

Interpreter approach

Interpret the result of "1+2".

#include <stdio.h>

int main(void) {
  // read the source code, e.g., "1+2"
  char left = getchar();
  char operator = getchar();
  char right = getchar();

  // compute the result of the addition
  if ('+' == operator) {
    int leftnum = left - '0';    // convert the digit character to its value
    int rightnum = right - '0';
    int result = leftnum + rightnum;  // perform addition
    printf("%d\n", result);
  }
}

Observe that it's pretty straightforward to interpret addition, because C already has an addition operator that the interpreter can use directly.
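The left - '0' step is what turns the character '1' into the number 1. Below is a slightly more careful sketch of that conversion (the name digit_value is hypothetical). It relies on the C standard's guarantee that the characters '0' through '9' are contiguous and in increasing order. It also rejects anything that is not a digit instead of producing a meaningless value.

```c
// Convert a decimal digit character to its numeric value. The C standard
// guarantees '0' through '9' are contiguous and increasing, so subtracting
// '0' works in any conforming character set. Returns -1 for non-digits.
int digit_value(char c) {
  if (c < '0' || c > '9') {
    return -1;  // not a decimal digit
  }
  return c - '0';
}
```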

Key difference

  • Compiler program: prints out the machine code for addition
  • Interpreter program: performs the addition operation

Run and show the output of language.c.

Interpreting exponents, e.g., "2^8"?

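There is no x86-64 machine instruction or C operator for integer exponentiation. (In C, ^ is bitwise XOR, and pow from math.h is a floating-point library function.)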

Interpreter pseudo-code

read left operand
read operator
read right operand

if operator == "^":
  result = 1
  for i = 1 to right operand:
    result = result * left operand
  print result
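The pseudo-code performs one multiplication per unit of the exponent. An alternative (not the method used in language.c) is exponentiation by squaring, which walks the exponent's binary digits and needs only about log2(exp) squarings. The sketch below uses the hypothetical name ipow and unsigned arithmetic, so overflow wraps instead of being undefined.

```c
// Exponentiation by squaring: compute base^exp using the binary digits of
// exp. Results wrap modulo 2^64 on overflow (unsigned arithmetic).
unsigned long long ipow(unsigned long long base, unsigned exp) {
  unsigned long long result = 1;
  while (exp > 0) {
    if (exp & 1) {      // lowest bit set: include the current power of base
      result *= base;
    }
    base *= base;       // base^1, base^2, base^4, ...
    exp >>= 1;          // move to the next binary digit
  }
  return result;
}
```

For example, ipow(2, 8) works from the binary digits of 8 (1000) and returns 256.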

Full example for + and ^

language.c

/**
  gcc -o language language.c
  echo "1+2" | ./language
  echo "2^8" | ./language
*/

#include <stdio.h>
#include <stdlib.h>

#define CHAR2INT(c) ((c) - '0')

int main() {
  char left = getchar();
  char operator = getchar();
  char right = getchar();

  printf("input: %c%c%c\n", left, operator, right);

  printf("compiler\n");
  switch (operator) {
  case '+':
    printf("  mov $%c, %%rax\n", left);
    printf("  mov $%c, %%rbx\n", right);
    printf("  add %%rbx, %%rax\n");
    break;
  case '^':
    printf("  mov $%c, %%rbx\n", left);
    printf("  mov $%c, %%rcx\n", right);
    printf("  mov $1, %%rax\n");
    printf("loop:\n");
    printf("  cmp $0, %%rcx\n");
    printf("  jle end\n");
    printf("  imul %%rbx, %%rax\n");
    printf("  sub $1, %%rcx\n");
    printf("  jmp loop\n");
    printf("end:\n");
    break;
  }
  printf("\n");

  printf("interpreter\n");
  int leftnum = CHAR2INT(left);
  int rightnum = CHAR2INT(right);
  int result;
  switch (operator) {
  case '+':
    result = leftnum + rightnum;
    printf("%d\n", result);
    break;
  case '^':
    result = 1;
    for (; rightnum > 0; rightnum--) {
      result *= leftnum;
    }
    printf("%d\n", result);
    break;
  }
}
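One way to check that the compiler's '^' output computes the same thing as the interpreter's loop is to transliterate the emitted assembly back into C. Each instruction becomes one statement, variables are named after the registers, and goto stands in for the jumps. The hand-written asm_pow function below is a hypothetical checking aid, not part of language.c.

```c
// The assembly language.c emits for '^', transliterated into C: each
// variable stands for the register of the same name, and each label/goto
// pair mirrors a label/jump in the generated code.
long asm_pow(long left, long right) {
  long rbx = left;   // mov $left, %rbx
  long rcx = right;  // mov $right, %rcx
  long rax = 1;      // mov $1, %rax
loop:
  if (rcx <= 0)      // cmp $0, %rcx ; jle end
    goto end;
  rax *= rbx;        // imul %rbx, %rax
  rcx -= 1;          // sub $1, %rcx
  goto loop;         // jmp loop
end:
  return rax;        // the result is left in %rax
}
```

asm_pow(2, 8) returns 256, matching what the interpreter half of language.c prints for echo "2^8" | ./language.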

COP-5621 Compiler Construction

Take this course if you want to learn more about how programming languages are defined and implemented.

Source to Running Program

How your C program gets run

[Figure: source_to_process.svg]

The five steps: preprocess (gcc -E), compile to assembly (gcc -S), assemble (as), link (ld), and run (./hello):

cat > hello.c << EOT
#include <stdio.h>
int main() { printf("hello, world!\n"); }
EOT
gcc -E hello.c -o hello.i
gcc -S hello.i -o hello.s
as hello.s -o hello.o
ld /usr/lib/x86_64-linux-gnu/crt1.o /usr/lib/x86_64-linux-gnu/libc.so hello.o -dynamic-linker /lib64/ld-linux-x86-64.so.2 -o hello
./hello

The five steps, with commands to inspect the outputs:

cat hello.c
gcc -E hello.c -o hello.i
cat hello.i
gcc -S hello.i -o hello.s
cat hello.s
as hello.s -o hello.o
objdump -d hello.o
objdump -t hello.o
ld /usr/lib/x86_64-linux-gnu/crt1.o /usr/lib/x86_64-linux-gnu/libc.so hello.o -dynamic-linker /lib64/ld-linux-x86-64.so.2 -o hello
objdump -d hello
objdump -t hello
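The objdump commands show that hello.o and hello are different kinds of files: the object file is relocatable, while the linked file is an executable. The ELF header records this in its e_type field. The sketch below reads that field, assuming Linux with glibc's <elf.h> and a 64-bit file in the host's byte order. The name elf_type is hypothetical.

```c
#include <elf.h>
#include <stdio.h>
#include <string.h>

// Read the ELF header of the file at path and return its e_type field:
// ET_REL for a relocatable object such as hello.o, ET_EXEC for a non-PIE
// executable, ET_DYN for a position-independent executable or shared
// library. Returns -1 if the file cannot be read or is not 64-bit ELF.
int elf_type(const char *path) {
  FILE *f = fopen(path, "rb");
  if (f == NULL) {
    return -1;
  }
  Elf64_Ehdr hdr;
  size_t n = fread(&hdr, sizeof hdr, 1, f);
  fclose(f);
  if (n != 1 || memcmp(hdr.e_ident, ELFMAG, SELFMAG) != 0 ||
      hdr.e_ident[EI_CLASS] != ELFCLASS64) {
    return -1;  // too short, wrong magic number, or not 64-bit
  }
  return hdr.e_type;
}
```

elf_type("hello.o") should return ET_REL (1). For hello built by the ld command above, which does not pass -pie, it should return ET_EXEC (2).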

The five steps, but with static linking:

cat > hello.c << EOT
#include <stdio.h>
int main() { printf("hello, world!\n"); }
EOT
gcc -E hello.c -o hello.i
gcc -S hello.i -o hello.s
as hello.s -o hello.o
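# to see the link command gcc uses on your system, run: gcc -v -static -o hello hello.o
# (the paths below assume GCC 13 on a Debian/Ubuntu-style x86-64 system)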
ld -static -o hello /usr/lib/x86_64-linux-gnu/crt1.o /usr/lib/x86_64-linux-gnu/crti.o /usr/lib/gcc/x86_64-linux-gnu/13/crtbeginT.o -L/usr/lib/gcc/x86_64-linux-gnu/13 -L/usr/lib/x86_64-linux-gnu -L/usr/lib -L/lib/x86_64-linux-gnu -L/lib -L/usr/lib/x86_64-linux-gnu -L/usr/lib/../lib -L/usr/lib/gcc/x86_64-linux-gnu/13/../../.. hello.o --start-group -lgcc -lgcc_eh /usr/lib/x86_64-linux-gnu/libc.a --end-group /usr/lib/gcc/x86_64-linux-gnu/13/crtend.o /usr/lib/gcc/x86_64-linux-gnu/13/../../../x86_64-linux-gnu/crtn.o
./hello
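The dynamic link step passes -dynamic-linker /lib64/ld-linux-x86-64.so.2, and ld records that path in a PT_INTERP program header. When the program starts, the kernel loads that dynamic linker, which then loads the shared C library and runs the program. A fully static executable has no such header. The sketch below checks for it, assuming Linux with glibc's <elf.h> and a 64-bit ELF file in the host's byte order. The name has_interp is hypothetical.

```c
#include <elf.h>
#include <stdio.h>
#include <string.h>

// Return 1 if the 64-bit ELF executable at path has a PT_INTERP program
// header (so it is dynamically linked), 0 if it has none (e.g., a fully
// static executable), or -1 on error.
int has_interp(const char *path) {
  FILE *f = fopen(path, "rb");
  if (f == NULL) {
    return -1;
  }
  int result = -1;
  Elf64_Ehdr eh;
  if (fread(&eh, sizeof eh, 1, f) == 1 &&
      memcmp(eh.e_ident, ELFMAG, SELFMAG) == 0 &&
      eh.e_ident[EI_CLASS] == ELFCLASS64) {
    result = 0;
    // walk the program header table looking for an interpreter entry
    for (int i = 0; i < eh.e_phnum; i++) {
      Elf64_Phdr ph;
      long off = (long)(eh.e_phoff + (Elf64_Off)i * eh.e_phentsize);
      if (fseek(f, off, SEEK_SET) != 0 || fread(&ph, sizeof ph, 1, f) != 1) {
        result = -1;  // truncated or unreadable program header table
        break;
      }
      if (ph.p_type == PT_INTERP) {
        result = 1;
        break;
      }
    }
  }
  fclose(f);
  return result;
}
```

has_interp("hello") should return 1 for the dynamically linked build and 0 for the static one.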

Author: Paul Gazzillo

Created: 2024-10-16 Wed 12:46
