(4) Compiler-Compilers, SimpleC
COP-3402, Spring 2024
Table of Contents
Compiler architecture review
Dragon book
Toy compiler
- Lexer
- Parser
- Code generator
- (Diagram)
lexer.c, ast.c, parser.c, codegen.c -> compiler.exe -> compiler.exe
main thing to understand: our compiler writes out assembly code, it doesn't run the program
Compiler-compilers: automating compiler generation
- Specify tokens and grammar
- Compiler-compiler automatically generates lexer and parser
- (Diagram)
(lexer.l -> lexer.c), ast.c, (parser.y -> parser.c), codegen.c -> compiler.exe -> compiler.exe
- saw details of various parts of the compiler
- saw how we can make the specification more specific, systematic, rigorous
- turns out that we can use this specification to generate those parts of the compiler
- lexer.l -> lexer.c, parser.y -> parser.c; lexer.c, parser.c, codegen.c -> compiler.exe
Lexing
- Toy compiler's lexer.c
- looped over each character
- matched characters to tokens
- created token data structure
The code handle file I/O, reads one-character-at-a-time, groups characters into tokens (number, operator, identifier, etc)
Automatically generating lexers
- Instead of writing by hand
- Specify pattern for each token
- Generate C code from specification automatically
- https://github.com/westes/flex
have you ever seen regular expressions or wildcards?
The flex
tool
- Specify pattern for each token
- Provide code snippet for action to take on pattern
Toy compiler's lexer.l
flex specification in detail
Parsing
- Toy compiler's parser.c
- called lex() to get tokens one-at-a-time
- matched syntax tree recursively
- created AST nodes for each language construct
Automatically generating parsers
- Instead of writing by hand
- Specify language grammar
- Generate C code automatically
- https://www.gnu.org/software/bison/manual/html_node/
labs introduced grammars
The bison
tool
- Specify grammar of each construct
- Provide code snippet for action to take on each construct
Toy compiler's parser.y
bison specification in detail
Interface between flex and bison
Revisiting Makefiles
- https://www.gnu.org/software/make/manual/make.html#Introduction
- How Makefiles work
- Find dependency graph
- Execute each target in order
- Separate compilation
- Compile each software component separately
- Linker matches function calls with definitions
objdump -t file.o
Example: Makefile from toy compiler 2
Generate lexer and parser first
- Build dependency graph
- (Diagram)
See the dependency tree defined by the Makefile See how the files are generated in order to ensure dependencies are met
Toy compiler with flex and bison
- Update the parser and lexer
Toy compiler with a generated lexer and parser
Source code
https://github.com/cop3402/toy2
Compare and constrast the lexer, parser, and code generator
Lexer and parser are generated from specification files (no need to program them in C)
Code generator works the same way, following the parse tree (which is identically created in both versions of the compiler)
(Break)
SimpleC
- Example program
- Informal description
- Formalizing the description
Example
int x; f(a, b) : function(pointer<int>, char) -> char { return *a + (int) b; } main { int x; pointer<int> y; char c; y = &x; c = 't'; return 'c'; }
Language constructs
- Expressions
- Statements
- Declarations
- Function definitions
Going to assume you know C for this. The project will help learn more about those constructs when we implement them.
Expressions
- A subset of C
- Arithmetic operators: + - * /
- Pointer reference and deference: & * []
- Recall that arrays are just pointers in C
- Relational operators: < > <= >=
=!
- Boolean operators: && ||
Statements
- A subset of C
- if
- while
- assignment
Assignment is actually an expression in C, e.g., a = b = c;
Declarations
- Subset of C's types
- Different syntax for types
- Two kinds of types
- Primitive types: int, char
- Compound types: pointer, array, function
IMHO SimpleC makes type declarations more explicit, less esoteric
Declaration examples
int x; pointer<int> y; array<5, int> z; pointer<pointer<char>> pointer_to_string;
Function definitions
- New syntax
- Makes function type a little more explicit
Function types
(param_type1, param_type2) -> return_type
- Must have return statement at the end
- No void functions
Function definition examples
square(a) : function(int) -> int { return a * a; }
Function definition examples
multiply(a, b) : function(int, int) -> int { return a * b; }
Function definition examples
increment_array(array, size, amount) : function(pointer<int>, int, int) -> pointer<int> { int i; i = 0; while (i < size) { array[i] = array[i] + amount; i = i - 1; } return array; }
main function
- SimpleC has a mandatory main function
- Differs from C, no separate compilation/linking
- Must be at end of source file
- No type or parameter specification
- These will be built-in
main is part of the C runtime library. Syntactically, main is just another function definition, while the runtime ensures that it is the first thing called after initialization. See lecture 02 on how source becomes an executable.
Example revisited
int x; f(a, b) : function(pointer<int>, char) -> char { return *a + (int) b; } main { int x; pointer<int> y; char c; y = &x; c = 't'; c = f(y, c); return 'c'; }