Type checking III
Lecture 9
Table of Contents
Review
Questions about the last class?
Quiz
How do we define a type?
Quiz Discussion
Handling declarations
- User may add new symbols to the language
- Type-checking needs to prove type safety of these as well
- Many languages require declarations
Some languages can infer types from operators used.
Type-checker records type information about each symbol
- Maps symbol name to type
- Declarations populate the symbol table
- Relevant to scoping rules
- Same name, different memory location
- Disambiguate by scope
Symbol tables
Maps symbol name to its type name

    int x;
    int y;
    string z;
    y = 1 + x;
    y = z;

symbol | type |
---|---|
x | int |
y | int |
z | string |
A symbol table is also used during code generation to track (relative) memory locations of symbols.
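The table above can be sketched in a few lines of Python. This is only an illustration, not the project's implementation: the helper names `declare`, `type_of`, and `check_assign` are hypothetical, and expressions are modeled as integer literals, variable names, and `('+', lhs, rhs)` tuples.

```python
# Hypothetical sketch: a flat symbol table mapping symbol name -> type name.
symbols = {}

def declare(type_name, name):
    """Record a declaration such as `int x;` in the symbol table."""
    if name in symbols:
        raise TypeError(f"redeclaration of {name}")
    symbols[name] = type_name

def type_of(expr):
    """Type of an int literal, a variable name, or a ('+', lhs, rhs) tuple."""
    if isinstance(expr, int):
        return "int"
    if isinstance(expr, str):
        if expr not in symbols:
            raise TypeError(f"undeclared symbol {expr}")
        return symbols[expr]
    op, lhs, rhs = expr
    if op == "+" and type_of(lhs) == "int" and type_of(rhs) == "int":
        return "int"
    raise TypeError(f"ill-typed expression {expr!r}")

def check_assign(name, expr):
    """True when the right-hand side has the declared type of `name`."""
    return type_of(name) == type_of(expr)

# Declarations populate the table.
declare("int", "x")
declare("int", "y")
declare("string", "z")

ok_first = check_assign("y", ("+", 1, "x"))  # y = 1 + x;
ok_second = check_assign("y", "z")           # y = z;
```

Under these assumptions `y = 1 + x` is accepted, while `y = z` is rejected because `z` was declared as a string and `y` as an int.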
How do we handle scoping rules?
The same symbol name can have different types and be accessible in different sections of the source code text.
Symbol table designs
- Single symbol table: add a field for the scope
- Chained symbol tables

    int globalvar;
    def f (arg1 : bool) -> int {
      int x;
      int y;
      string z;
      y = 1 + x;
      return y;
    }

Single symbol table
symbol | type | scope |
---|---|---|
globalvar | int | GLOBAL |
f | (bool) -> int | GLOBAL |
arg1 | bool | GLOBAL.f |
x | int | GLOBAL.f |
y | int | GLOBAL.f |
z | string | GLOBAL.f |
Note that f is in the global scope, but its parameters are in f's scope. Constructing these scopes requires care to make sure that f is added to the global scope before entering f's scope.
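A minimal sketch of the single-table design, assuming rows are stored as dictionaries and that scope names are dotted paths such as `GLOBAL.f`. The `add` and `lookup` helpers are hypothetical names chosen for this illustration.

```python
# Hypothetical sketch: one table whose rows are (symbol, type, scope).
table = []

def add(symbol, type_name, scope):
    table.append({"symbol": symbol, "type": type_name, "scope": scope})

def lookup(symbol, scope):
    """Search `scope`, then each enclosing scope (GLOBAL.f, then GLOBAL)."""
    while True:
        for row in table:
            if row["symbol"] == symbol and row["scope"] == scope:
                return row["type"]
        if "." not in scope:
            return None  # reached the outermost scope without a match
        scope = scope.rsplit(".", 1)[0]  # drop the innermost scope name

# f is added to GLOBAL before anything is added to f's own scope.
add("globalvar", "int", "GLOBAL")
add("f", "(bool) -> int", "GLOBAL")
add("arg1", "bool", "GLOBAL.f")
add("x", "int", "GLOBAL.f")
add("y", "int", "GLOBAL.f")
add("z", "string", "GLOBAL.f")
```

With this layout, `lookup("globalvar", "GLOBAL.f")` falls back to the global row, while `lookup("x", "GLOBAL")` finds nothing because `x` is only visible inside `f`.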
Chained symbol tables
GLOBAL (parent_scope=none)
symbol | type |
---|---|
globalvar | int |
f | (bool) -> int |
f (parent_scope=GLOBAL)
symbol | type |
---|---|
arg1 | bool |
x | int |
y | int |
z | string |
The project's skeleton provides a symbol table implementation using this chained technique.
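For intuition only, the chained design could look roughly like the sketch below. It is not the skeleton's code: the `Scope` class and its `declare` and `lookup` methods are hypothetical names invented for this example.

```python
class Scope:
    """Hypothetical chained symbol table: one table per scope plus a parent link."""

    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent
        self.symbols = {}

    def declare(self, symbol, type_name):
        if symbol in self.symbols:
            raise TypeError(f"{symbol} already declared in scope {self.name}")
        self.symbols[symbol] = type_name

    def lookup(self, symbol):
        """Check this scope first, then follow parent links outward."""
        scope = self
        while scope is not None:
            if symbol in scope.symbols:
                return scope.symbols[symbol]
            scope = scope.parent
        return None

global_scope = Scope("GLOBAL")
global_scope.declare("globalvar", "int")
global_scope.declare("f", "(bool) -> int")

f_scope = Scope("f", parent=global_scope)
for sym, ty in [("arg1", "bool"), ("x", "int"), ("y", "int"), ("z", "string")]:
    f_scope.declare(sym, ty)

# Same name, different scope: an inner declaration shadows the outer one.
block_scope = Scope("block", parent=f_scope)
block_scope.declare("x", "string")
```

Because each scope only stores its own symbols, leaving a scope amounts to dropping one table, and shadowing falls out of the lookup order.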
Specifying type checkers
Typing judgments
- Based on "proof rules"
- Systematic notation for deriving proofs from logical rules
- Popular notation in academia
- Sort of an esoteric description of an evaluator
- Described in Type Systems
2017 ACM PPoPP Keynote: It's Time for a New Old Language by Guy Steele
- A discussion of the history and issues with the formal notation for programming languages.
The Cool Reference Manual by Alex Aiken
- An example of defining type rules (and operational semantics) for an object-oriented language.
Syntax vs. semantics
- I use angle brackets, e.g., <expression>, to denote the semantic value of syntax
- The map is not the territory: ASCII numerals are not machine representations of numbers
- Example
- ASCII: 12+3
- Semantic value: <12+3>
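The distinction can be made concrete with a small Python sketch. It assumes Python integers stand in for the semantic values; the variable names are chosen only for this illustration.

```python
text = "12+3"                    # syntax: four ASCII characters
codes = [ord(c) for c in text]   # the character codes a lexer actually sees
lhs, rhs = text.split("+")       # crude "parsing" of this one expression
value = int(lhs) + int(rhs)      # semantic value <12+3> as a machine integer
```

The string and the integer are different objects: `codes` holds character codes, while `value` is the number the text denotes.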
Notation

    hypotheses
    ----------
    conclusion

Examples

    -----------
    <1> : int

    -----------
    <2> : int

    -----------
    <+> : (int, int) -> int

    <1> : int    <2> : int    <+> : (int, int) -> int
    ---------------------------------------------------
    <1 + 2> : int

    // using a metavariable (think nonterminal in syntax) to avoid writing
    // rules for each possible element of the language
    <n1> : int    <n2> : int    <op> : (int, int) -> int
    ------------------------------------------------------
    <n1 op n2> : int

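One way to read the metavariable rule is as a function from the types of the hypotheses to the type of the conclusion. The sketch below assumes a function type is encoded as a `(parameter types, result type)` pair; `apply_binary_rule` and `INT_BINOP` are hypothetical names.

```python
# Hypothetical encoding of the function type (int, int) -> int.
INT_BINOP = (("int", "int"), "int")

def apply_binary_rule(n1_type, n2_type, op_type):
    """Return the type of <n1 op n2> if the hypotheses hold, else None."""
    params, result = op_type
    if (n1_type, n2_type) == params:
        return result
    return None  # the rule does not apply, so no conclusion can be drawn

conclusion = apply_binary_rule("int", "int", INT_BINOP)
no_conclusion = apply_binary_rule("int", "string", INT_BINOP)
```

A single function covers every pair of operands, which is the same economy the metavariables `n1`, `n2`, and `op` give the written rule.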
Deduction via proof rules
Prove that "1 + 2 * 3" is well-typed

                     <2> : int    <3> : int    <*> : (int, int) -> int
                     ---------------------------------------------------
    <1> : int        <2 * 3> : int        <+> : (int, int) -> int
    ----------------------------------------------------------------
    <1 + 2 * 3> : int

Implied by this notation is an automated process that matches hypotheses with conclusions and matches metavariables to inputs (i.e., parsing).
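That automated process can be sketched as a recursive function over a parsed expression. The tuple encoding `(op, left, right)`, the `OPERATORS` table, and the `derive` and `show` helpers are assumptions made for this example; the list `steps` records each conclusion in the order it is established.

```python
# Hypothetical operator table: op -> ((parameter types), result type).
OPERATORS = {"+": (("int", "int"), "int"), "*": (("int", "int"), "int")}

def show(expr):
    """Render a tuple-encoded expression back to text."""
    if isinstance(expr, int):
        return str(expr)
    op, left, right = expr
    return f"{show(left)} {op} {show(right)}"

def derive(expr, steps):
    """Return the type of `expr`, appending one line per rule applied."""
    if isinstance(expr, int):              # axiom: <n> : int
        steps.append(f"<{expr}> : int")
        return "int"
    op, left, right = expr                 # matches the pattern n1 op n2
    (t1, t2), result = OPERATORS[op]
    if derive(left, steps) != t1 or derive(right, steps) != t2:
        raise TypeError(f"no rule applies to {show(expr)}")
    steps.append(f"<{show(expr)}> : {result}")
    return result

steps = []
# Parsing "1 + 2 * 3" with the usual precedence yields this tree.
result = derive(("+", 1, ("*", 2, 3)), steps)
```

The hypotheses are proved before the conclusion that depends on them, so `steps` lists the derivation from the leaves of the proof tree down to `<1 + 2 * 3> : int`.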
Type rules for SimpleC

    // abstract syntax: abstract syntax does not capture all of the concrete
    // syntax's language restrictions. the boolean or operator is written in
    // quotes ("||") to avoid conflict with the grammar's pipe symbol.
    program ::= (d | f)+                   // a program
    f       ::= def f(fs) -> t { d* st* }  // function definition
    fs      ::= (x : t (, x : t)*)?        // formal arguments
    d       ::= x : t;                     // a declaration
    t       ::= int                        // integer type
              | bool                       // boolean type
              | string                     // string type
              | (t+) -> t                  // function type
    st      ::= x = e;                     // assignment statement
              | while (e) st               // while statement
              | if (e) st else st          // if-then-else statement
              | if (e) st                  // if-then statement
              | return e ;                 // return statement
              | e;                         // expression statement
              | { st* }                    // compound statement
    e       ::= f(actuals)                 // function call expression
    actuals ::= (e (, e)*)?                // function args
    e       ::= e op e | op e | (e)        // arithmetic expressions
    op      ::= + | - | * | /              // numeric operators
    op      ::= && | "||" | !              // boolean operators
    op      ::= == | != | < | <= | > | >=  // relational operators
    e       ::= x                          // variable usage
    e       ::= n | b | s                  // literals
    n       ::= [0-9]+                     // numeric values
    b       ::= true | false               // boolean values
    s       ::= " characters "             // string values
    x       ::= [A-Za-z0-9]+               // identifiers

    // environment (symbol table): this holds the type information for the symbols in scope
    S, the environment, is an ordered list of (name, type) pairs
    S' = [(name, type)] + S  means S' is identical to S, but has the new mapping (name, type).
    unique(S)                means S has only unique names
    t = lookup(name, S)      means look up name's type in S, checking the parent as well if needed
    parent(S) = S'           means S has parent scope S'

    // notation
    <e>              refers to e's semantic value, e.g., ASCII numbers vs. Java integers.
    <e> : int        says that e's semantic value has the type int.
    S |- <e> : int   says that, given the environment (symbol table) S, e has type int.
    S |- <st> ~> S'  says that, given the environment (symbol table) S, st produces a new environment S'.

    hypotheses
    ----------
    conclusion
    means that if the hypotheses are true, we can conclude that the conclusion is true.

    // literals: the symbols mean their equivalent mathematical values for numbers and boolean true/false.

    --------------
    <n> : int

    --------------
    <b> : bool

    --------------
    <s> : string

    // operators: the symbols for arithmetic, boolean, and relational operators each have a function type

    S |- <e1> : int    S |- <e2> : int    op \in { "+", "-", "*", "/" }
    ---------------------------------------------------------------------
    S |- <e1 op e2> : int

    S |- <e1> : bool    S |- <e2> : bool    op \in { "&&", "||" }
    ----------------------------------------------------------------
    S |- <e1 op e2> : bool

    S |- <e1> : bool
    --------------------
    S |- <!e1> : bool

    S |- <e1> : int
    --------------------
    S |- <-e1> : int

    // variables: variable assignments evaluate their right-hand side at define-time
    // and are stored and looked up in a storage context.

    S' = [(x, t)] + S    unique(S')
    ------------------------------- [declaration]
    S |- <x : t;> ~> S'

    t = lookup(x, S)
    ---------------- [substitution]
    S |- <x> : t

    S |- <e> : t1    t2 = lookup(x, S)    t1 = t2
    ----------------------------------------------- [assignment]
    S |- <x = e;> ~> S

    // control-flow: conditionals and iteration are statements that update state but produce no value.

    S |- <e> : bool    S |- <st1> ~> S'    S |- <st2> ~> S''   // type-check both branches
    ------------------------------------------------------------
    S |- <if e st1 else st2> ~> S                              // nested scope doesn't affect parent scope

    S |- <e> : bool    S |- <st> ~> S'
    ------------------------------------
    S |- <while e st> ~> S

    // functions: functions are call-by-value, have a local storage context, and produce a return value

    S' = [(f, fs -> t)] + S       // add function to the current scope
    parent(S'') = S'              // define a new scope with the old scope as parent. note that the
                                  // parent scope has the function definition in it, so recursion is acceptable
    S'' = [(fs[i].name, fs[i].type)] for all fs[i] in fs    // add parameter types to symbol table
    S'' |- d* ~> S'''             // add declarations to the symbol table
    S''' |- st* ~> S'''           // evaluate statements under the updated symbol table. note that
                                  // statements do not update the symbol table, so it is S''' out as well
    ---------------------------------------- [definition]
    S |- <def f(fs) -> t { d* st* }> ~> S'

    (t1, ..., tN) -> t = lookup(f, S)    S |- <actuals[i]> : ti for all i = 1 to N
    -------------------------------------------------------------------------------- [call]
    S |- <f(actuals)> : t

    // lookup

    there exists (name, t) in S
    ---------------------------
    S |- lookup(name, S) ~> t

    S' = parent(S)    there exists (name, t) in S'
    ------------------------------------------------
    S |- lookup(name, S) ~> t

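The rules above translate fairly directly into a recursive checker. The following is a rough sketch, not the project's checker: the tuple encoding of expressions and statements, the `Env` class, and the `type_of` and `check` functions are all assumptions made for illustration. It updates the environment in place rather than building a fresh S', and it leaves out `return`, relational operators, and unary operators.

```python
class Env:
    """Environment S: (name, type) pairs for one scope plus a parent link."""
    def __init__(self, parent=None):
        self.pairs = {}
        self.parent = parent

    def extend(self, name, ty):
        # [declaration]: add (name, ty), keeping names unique within this scope
        if name in self.pairs:
            raise TypeError(f"duplicate declaration of {name}")
        self.pairs[name] = ty

    def lookup(self, name):
        # lookup: try this scope first, then the parent chain
        if name in self.pairs:
            return self.pairs[name]
        if self.parent is None:
            raise TypeError(f"undeclared name {name}")
        return self.parent.lookup(name)

ARITH = {"+", "-", "*", "/"}
LOGIC = {"&&", "||"}

def type_of(e, env):
    """S |- <e> : t for tuple-encoded expressions."""
    kind = e[0]
    if kind in ("int", "bool", "string"):      # literals, e.g. ("int", 1)
        return kind
    if kind == "var":                          # [substitution]: ("var", x)
        return env.lookup(e[1])
    if kind == "binop":                        # ("binop", op, e1, e2)
        _, op, e1, e2 = e
        operand = "int" if op in ARITH else "bool" if op in LOGIC else None
        if operand and type_of(e1, env) == operand and type_of(e2, env) == operand:
            return operand
        raise TypeError(f"bad operands for {op}")
    if kind == "call":                         # [call]: ("call", f, actuals)
        _, f, actuals = e
        f_type = env.lookup(f)
        if not isinstance(f_type, tuple):
            raise TypeError(f"{f} is not a function")
        params, result = f_type
        if tuple(type_of(a, env) for a in actuals) != params:
            raise TypeError(f"bad arguments to {f}")
        return result
    raise TypeError(f"unknown expression {e!r}")

def check(st, env):
    """S |- <st> ~> S', with S updated in place instead of copied."""
    kind = st[0]
    if kind == "decl":                         # ("decl", x, t)
        env.extend(st[1], st[2])
    elif kind == "assign":                     # [assignment]: ("assign", x, e)
        if env.lookup(st[1]) != type_of(st[2], env):
            raise TypeError(f"cannot assign to {st[1]}")
    elif kind == "if":                         # ("if", e, st1, st2)
        if type_of(st[1], env) != "bool":
            raise TypeError("condition must be bool")
        check(st[2], Env(env))                 # each branch gets a nested scope
        check(st[3], Env(env))
    elif kind == "while":                      # ("while", e, st)
        if type_of(st[1], env) != "bool":
            raise TypeError("condition must be bool")
        check(st[2], Env(env))
    elif kind == "def":                        # [definition]: ("def", f, fs, t, body)
        _, f, fs, t, body = st
        env.extend(f, (tuple(ty for _, ty in fs), t))  # f is visible for recursion
        inner = Env(env)
        for name, ty in fs:
            inner.extend(name, ty)
        for item in body:                      # declarations, then statements
            check(item, inner)
    else:
        raise TypeError(f"unknown statement {st!r}")

top = Env()
check(("decl", "globalvar", "int"), top)
check(("def", "f", [("arg1", "bool")], "int", [
    ("decl", "x", "int"), ("decl", "y", "int"), ("decl", "z", "string"),
    ("assign", "y", ("binop", "+", ("int", 1), ("var", "x"))),
    ("while", ("var", "arg1"), ("assign", "x", ("call", "f", [("var", "arg1")]))),
]), top)
```

Mutating the environment keeps the sketch short; a checker that follows the rules more literally would return a new environment from each declaration. A failed hypothesis surfaces here as a `TypeError`, for example when assigning a string literal to `globalvar`.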