Type checking III
Lecture 9

Review
Handling declarations
Specifying type checkers
Type rules for SimpleC

Review

Questions about the last class?

Quiz

How do we define a type?

Quiz Discussion

Handling declarations

Users may add new symbols to the language
Type checking needs to prove type safety for these as well
Many languages require declarations

Growing a Language

Some languages can infer types from the operators used.

Type-checker records type information about each symbol

Maps symbol name to type
Declarations populate the symbol table
Relevant to scoping rules
- Same name, different memory location
- Disambiguate by scope

Symbol tables

Maps symbol name to its type name

int x;
int y;
string z;

y = 1 + x;
y = z;

symbol	type
x	int
y	int
z	string

A symbol table is also used during code generation to track (relative) memory locations of symbols.
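To make the table concrete, here is a minimal sketch of a flat symbol table checking the two assignments above. The names (`symbols`, `declare`, `type_of`, `check_assign`) and the tuple encoding of `1 + x` are hypothetical, chosen for illustration only; the sketch accepts `y = 1 + x;` and rejects `y = z;`.

```python
# Hypothetical flat symbol table: one dict from symbol name to type name.
symbols = {}


def declare(name, type_name):
    """Record a declaration such as `int x;`."""
    if name in symbols:
        raise TypeError(f"{name} is already declared")
    symbols[name] = type_name


def type_of(expr):
    """Type of a tiny expression: an int literal, a variable name, or ("+", lhs, rhs)."""
    if isinstance(expr, int):
        return "int"
    if isinstance(expr, str):
        return symbols[expr]  # KeyError if the variable was never declared
    op, lhs, rhs = expr
    if op == "+" and type_of(lhs) == "int" and type_of(rhs) == "int":
        return "int"
    raise TypeError(f"bad operands for {op}")


def check_assign(target, expr):
    """An assignment is well-typed when both sides have the same type."""
    lhs_type, rhs_type = symbols[target], type_of(expr)
    if lhs_type != rhs_type:
        raise TypeError(f"cannot assign {rhs_type} to {target} : {lhs_type}")


declare("x", "int")
declare("y", "int")
declare("z", "string")
check_assign("y", ("+", 1, "x"))  # y = 1 + x;  accepted
try:
    check_assign("y", "z")  # y = z;  rejected
except TypeError as err:
    print(err)  # cannot assign string to y : int
```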

How do we handle scoping rules?

The same symbol can have different types and be accessible in different sections of the source code.

Symbol table designs

Add a field for the scope

Chained symbol tables

int globalvar;

def f (arg1 : bool) -> int {
  int x;
  int y;
  string z;

  y = 1 + x;
  return y;
}

Single symbol table

symbol	type	scope
globalvar	int	GLOBAL
f	(bool) -> int	GLOBAL
arg1	bool	GLOBAL.f
x	int	GLOBAL.f
y	int	GLOBAL.f
z	string	GLOBAL.f

Note that f is in the global scope, but its parameters are in f's scope. Constructing these scopes requires care to make sure that f is added to the global scope before entering f's scope.
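A minimal sketch of the single-table design, assuming scopes are named by dotted paths such as `GLOBAL.f` (as in the table) and that lookup falls back to the enclosing scope by dropping the last path component. The names `table`, `add`, and `lookup` are hypothetical.

```python
# Hypothetical single symbol table keyed by (scope, name).
table = {}


def add(scope, name, type_name):
    if (scope, name) in table:
        raise TypeError(f"{name} already declared in {scope}")
    table[(scope, name)] = type_name


def lookup(scope, name):
    """Search the current scope, then each enclosing scope (drop the last path component)."""
    while True:
        if (scope, name) in table:
            return table[(scope, name)]
        if "." not in scope:
            raise KeyError(f"{name} is not declared")
        scope = scope.rsplit(".", 1)[0]


add("GLOBAL", "globalvar", "int")
add("GLOBAL", "f", "(bool) -> int")  # f goes into GLOBAL before entering f's scope
add("GLOBAL.f", "arg1", "bool")
add("GLOBAL.f", "x", "int")
add("GLOBAL.f", "y", "int")
add("GLOBAL.f", "z", "string")

print(lookup("GLOBAL.f", "y"))          # int, found in f's scope
print(lookup("GLOBAL.f", "globalvar"))  # int, found in the enclosing GLOBAL scope
```

The scope field makes every entry carry its full path, so lookup has to reconstruct the enclosing scopes from the name; the chained design stores that relationship directly.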

Chained symbol tables

GLOBAL (parent_scope=none)

symbol	type
globalvar	int
f	(bool) -> int

f (parent_scope=GLOBAL)

symbol	type
arg1	bool
x	int
y	int
z	string

The project's skeleton provides a symbol table implementation using this chained technique.
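For intuition only, a minimal sketch of chained symbol tables. The `Scope` class and its methods are hypothetical and are not the skeleton's API; the sketch just shows the parent pointer and the outward lookup.

```python
from typing import Dict, Optional


class Scope:
    """One symbol table per scope, with a pointer to the enclosing scope."""

    def __init__(self, name: str, parent: Optional["Scope"] = None):
        self.name = name
        self.parent = parent
        self.symbols: Dict[str, str] = {}

    def declare(self, symbol: str, type_name: str) -> None:
        # Redeclaring in the same scope is an error; shadowing an outer name is allowed.
        if symbol in self.symbols:
            raise TypeError(f"{symbol} already declared in {self.name}")
        self.symbols[symbol] = type_name

    def lookup(self, symbol: str) -> str:
        # Walk outward through the parent chain until the symbol is found.
        scope: Optional[Scope] = self
        while scope is not None:
            if symbol in scope.symbols:
                return scope.symbols[symbol]
            scope = scope.parent
        raise KeyError(f"{symbol} is not declared")


global_scope = Scope("GLOBAL")
global_scope.declare("globalvar", "int")
global_scope.declare("f", "(bool) -> int")  # declared before f's body is checked
f_scope = Scope("f", parent=global_scope)
for sym, ty in [("arg1", "bool"), ("x", "int"), ("y", "int"), ("z", "string")]:
    f_scope.declare(sym, ty)

print(f_scope.lookup("x"))  # int
print(f_scope.lookup("f"))  # (bool) -> int, found via the parent
```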

Specifying type checkers

Typing judgments

Based on "proof rules"
- Systematic notation for deriving proofs from logical rules
Popular notation in academia
- Sort of an esoteric description of an evaluator
Described in Type Systems

2017 ACM PPoPP Keynote: It's Time for a New Old Language by Guy Steele

A discussion of the history and issues with the formal notation for programming languages.

The Cool Reference Manual by Alex Aiken

An example of defining type rules (and operational semantics) for an object-oriented language.

Syntax vs. semantics

I use angle brackets, e.g., <expression>, to denote the semantic value of syntax
- The map is not the territory; ASCII numerals are not machine representations of numbers
Example
- ASCII: 12+3
- Semantic value: <12+3>

Notation

hypotheses
----------
conclusion

Examples

-----------
<1> : int


-----------
<2> : int


-----------
<+> : (int, int) -> int


<1> : int     <2> : int     <+> : (int, int) -> int
---------------------------------------------------
<1 + 2> : int


// using a metavariable (think nonterminal in syntax) to avoid writing rules for each possible element of the language
<n1> : int    <n2> : int      op : (int, int) -> int
----------------------------------------------------
<n1 op n2> : int
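The metavariable rule can be read as one function that covers every instance of `<n1 op n2> : int`. A minimal sketch, assuming a hypothetical `OP_TYPES` table that records each operator's function type as (argument types, result type):

```python
# Hypothetical encoding of operator types: (argument types, result type).
OP_TYPES = {
    "+": (("int", "int"), "int"),
    "-": (("int", "int"), "int"),
    "*": (("int", "int"), "int"),
}


def check_int_binop(n1_type, op, n2_type):
    """One rule for every instance of <n1 op n2> : int.

    n1, n2 and op are metavariables, so this single function stands in for
    separate rules for <1 + 2>, <3 * 4>, and so on.
    """
    # Hypotheses: <n1> : int, <n2> : int, op : (int, int) -> int
    if n1_type == "int" and n2_type == "int" and OP_TYPES.get(op) == (("int", "int"), "int"):
        return "int"  # Conclusion: <n1 op n2> : int
    raise TypeError(f"the rule does not apply to {n1_type} {op} {n2_type}")


print(check_int_binop("int", "+", "int"))  # int
```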

Deduction via proof rules

Prove that "1 + 2 * 3" is well-typed

                 <2> : int   <3> : int   <*> : (int, int) -> int
                 -------------------------------------------------
<1> : int        <2 * 3> : int                                       <+> : (int, int) -> int
-------------------------------------------------------------------------------------------
<1 + 2 * 3> : int

Implied by this notation is an automated process that matches hypotheses with conclusions and binds metavariables to parts of the input (i.e., parsing).
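A minimal sketch of that automated process, assuming parsing has already produced the tree `("+", 1, ("*", 2, 3))` for "1 + 2 * 3". The names (`Proof`, `derive`, `print_proof`, `OP_TYPES`) are hypothetical. `derive` proves each hypothesis recursively and records the derivation tree, so the result mirrors the proof above.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

# Parsing is assumed done: "1 + 2 * 3" is the tree ("+", 1, ("*", 2, 3)).
OP_TYPES = {"+": (("int", "int"), "int"), "*": (("int", "int"), "int")}


@dataclass
class Proof:
    conclusion: str
    premises: List["Proof"] = field(default_factory=list)


def show(e) -> str:
    """Concrete text of an expression tree (enough for this example; no parentheses)."""
    if isinstance(e, int):
        return str(e)
    op, lhs, rhs = e
    return f"{show(lhs)} {op} {show(rhs)}"


def derive(e) -> Tuple[str, Proof]:
    """Return (type, proof tree) for e, or raise TypeError if no rule applies."""
    if isinstance(e, int):
        return "int", Proof(f"<{e}> : int")  # axiom: numerals have type int
    op, lhs, rhs = e
    arg_types, result = OP_TYPES[op]
    (t1, p1), (t2, p2) = derive(lhs), derive(rhs)  # prove the operand hypotheses first
    if (t1, t2) != arg_types:
        raise TypeError(f"{op} expects {arg_types}, got {(t1, t2)}")
    op_axiom = Proof(f"<{op}> : ({', '.join(arg_types)}) -> {result}")
    return result, Proof(f"<{show(e)}> : {result}", [p1, p2, op_axiom])


def print_proof(proof: Proof, depth: int = 0) -> None:
    """Print premises above their conclusion, indenting deeper sub-derivations."""
    for premise in proof.premises:
        print_proof(premise, depth + 1)
    print("  " * depth + proof.conclusion)


ty, proof = derive(("+", 1, ("*", 2, 3)))
print_proof(proof)
```

Each call to `derive` corresponds to one horizontal bar in the proof: the recursive calls establish the hypotheses and the returned `Proof` is the conclusion.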

Type rules for SimpleC

// abstract syntax: the abstract syntax does not capture all of the concrete syntax's restrictions. The boolean operators are &&, ||, and !; "||" is quoted below so it is not confused with the grammar's alternation symbol |.

program ::= (d | f)+                   // a program 

f       ::= def f(fs) -> t { d* st* }  // function definition
fs      ::= (x : t (, x : t)*)?        // formal arguments

d       ::= x : t;                     // a declaration

t       ::= int                        // integer type
          | bool                       // boolean type
          | string                     // string type
          | (t*) -> t                  // function type

st      ::= x = e;                     // assignment statement
          | while (e) st               // while statement
          | if (e) st else st          // if-then-else statement
          | if (e) st                  // if-then statement
          | return e;                  // return statement
          | e;                         // expression statement
          | { st* }                    // compound statement

e       ::= f(actuals)                 // function call expression
actuals ::= (e (, e)*)?                // function args
e       ::= e op e | op e | (e)        // arithmetic expressions
op      ::= + | - | * | /              // numeric operators
op      ::= && | "||" | !              // boolean operators
op      ::= == | != | < | <= | > | >=  // relational operators
e       ::= x                          // variable usage
e       ::= n | b | s                  // literals

n       ::= [0-9]+                     // numeric values
b       ::= true | false               // boolean values
s       ::= " characters "             // string values
x       ::= [A-Za-z0-9]+               // identifiers


// environment (symbol table): this holds the type information for the symbols in scope

S
the environment is an ordered list of (name, type) pairs

S' = [(name, type)] + S
means S' is identical to S, but has the new mapping (name, type).

unique(S)
means S has only unique names

t = lookup(name, S)
means look up name's type in S, checking the parent scope(s) as well if needed

parent(S) = S'
means S has parent scope S'

// notation

<e>
refers to e's semantic value, e.g., ASCII numbers vs. Java integers.

<e> : int
says that e's semantic value has type int.

S |- <e> : int
says that, given the environment (symbol table) S, e has type int.

S |- <st> ~> S'
says that given the environment (symbol table) S, st produces a new environment S'.


hypotheses
----------
conclusion

means that if the hypotheses are true we can conclude that the conclusion is true.


// literals: the symbols mean their equivalent mathematical values for numbers and boolean true/false.

--------------
<n> : int

--------------
<b> : bool

--------------
<s> : string


// operators: the symbols for arithmetic, boolean, and relational operators each have a function type

S |- <e1> : int     S |- <e2> : int     op \in { "+", "-", "*", "/" }
---------------------------------------------------------------------
S |- <e1 op e2> : int

S |- <e1> : bool     S |- <e2> : bool     op \in { "&&", "||" }
----------------------------------------------------------------
S |- <e1 op e2> : bool

S |- <e1> : bool
--------------------
S |- <!e1> : bool

S |- <e1> : int
--------------------
S |- <-e1> : int
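A minimal sketch of the literal and operator rules as a recursive checker, assuming a hypothetical tuple encoding of expressions. The environment S is omitted because none of these rules consults it. Relational operators are rejected here only because no rule for them is given above.

```python
# Hypothetical expression encoding: ("num", 3), ("bool", True), ("str", "hi"),
# ("bin", op, e1, e2) for binary operators, ("un", op, e1) for ! and unary -.
ARITH = {"+", "-", "*", "/"}
LOGIC = {"&&", "||"}


def type_of(e):
    tag = e[0]
    if tag == "num":
        return "int"     # <n> : int
    if tag == "bool":
        return "bool"    # <b> : bool
    if tag == "str":
        return "string"  # <s> : string
    if tag == "bin":
        _, op, e1, e2 = e
        t1, t2 = type_of(e1), type_of(e2)
        if op in ARITH and t1 == t2 == "int":
            return "int"
        if op in LOGIC and t1 == t2 == "bool":
            return "bool"
        raise TypeError(f"{op} cannot be applied to {t1} and {t2}")
    if tag == "un":
        _, op, e1 = e
        t1 = type_of(e1)
        if op == "!" and t1 == "bool":
            return "bool"
        if op == "-" and t1 == "int":
            return "int"
        raise TypeError(f"{op} cannot be applied to {t1}")
    raise ValueError(f"unknown expression {e!r}")


print(type_of(("bin", "+", ("num", 1), ("un", "-", ("num", 2)))))  # int
```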


// variables: variable assignments evaluate their right-hand side at define-time and are stored and looked up in a storage context.

S' = [(x, t)] + S   unique(S')
------------------------------  [declaration]
S |- <x : t;> ~> S'

t = lookup(x, S)
---------------    [substitution]
S |- <x> : t

S |- <e> : t1     t2 = lookup(x, S)    t1 = t2
---------------------------------------------  [assignment]
S |- <x = e;> ~> S
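A minimal sketch of the three variable rules, assuming S is an immutable tuple of (name, type) pairs with the newest pair first, matching "S' = [(name, type)] + S". All function names are hypothetical. The declaration check tests uniqueness on the extended environment, so declaring `x` twice in one scope is rejected.

```python
from typing import Tuple

# S as an immutable, ordered tuple of (name, type) pairs, newest first.
Env = Tuple[Tuple[str, str], ...]


def lookup(name: str, env: Env) -> str:
    for n, t in env:
        if n == name:
            return t
    raise KeyError(f"{name} is not declared")


def unique(env: Env) -> bool:
    names = [n for n, _ in env]
    return len(names) == len(set(names))


def check_decl(x: str, t: str, env: Env) -> Env:
    """[declaration]: extend S with (x, t); reject x if it is already declared."""
    extended = ((x, t),) + env
    if not unique(extended):
        raise TypeError(f"{x} is declared twice")
    return extended


def check_var(x: str, env: Env) -> str:
    """[substitution]: a variable has the type recorded for it in S."""
    return lookup(x, env)


def check_assign(x: str, e_type: str, env: Env) -> Env:
    """[assignment]: e_type (the already-derived type of e) must equal x's type.

    The environment is returned unchanged: assignments add no symbols.
    """
    if e_type != lookup(x, env):
        raise TypeError(f"cannot assign {e_type} to {x} : {lookup(x, env)}")
    return env


S: Env = ()
S = check_decl("x", "int", S)
S = check_decl("y", "int", S)
S = check_assign("y", check_var("x", S), S)  # y = x;  accepted
```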


// control-flow: conditionals and iteration are statements that update state but produce no value.

S |- <e> : bool     S |- <st1> ~> S'     S |- <st2> ~> S''   // type-check both branches
-------------------------------------------------------------
S |- <if e st1 else st2> ~> S       // nested scope doesn't affect parent scope

S |- <e> : bool     S |- <st> ~> S'
----------------------------------
S |- <while e st> ~> S
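A minimal sketch of the if-else and while rules, assuming a hypothetical tuple AST and S as a plain dict. Both branches are checked, but whatever environment they produce is discarded and the original S is returned. The `"block"` case (for `{ st* }`) follows the same discard-the-inner-scope idea; it is an assumption, since no rule for compound statements is given above.

```python
from typing import Dict

# Hypothetical AST: expressions ("num", n), ("bool", b), ("var", x);
# statements ("assign", x, e), ("if", e, st1, st2), ("while", e, st), ("block", [st, ...]).
Env = Dict[str, str]


def type_of(e, env: Env) -> str:
    tag = e[0]
    if tag == "num":
        return "int"
    if tag == "bool":
        return "bool"
    if tag == "var":
        return env[e[1]]
    raise ValueError(f"unknown expression {e!r}")


def require_bool(e, env: Env) -> None:
    if type_of(e, env) != "bool":
        raise TypeError("condition must have type bool")


def check_stmt(st, env: Env) -> Env:
    """Return the environment produced by st (always env itself for these forms)."""
    tag = st[0]
    if tag == "assign":
        _, x, e = st
        if type_of(e, env) != env[x]:
            raise TypeError(f"cannot assign {type_of(e, env)} to {x}")
        return env
    if tag == "if":
        _, cond, st1, st2 = st
        require_bool(cond, env)
        check_stmt(st1, env)  # both branches must type-check ...
        check_stmt(st2, env)
        return env  # ... but whatever they produce is discarded
    if tag == "while":
        _, cond, body = st
        require_bool(cond, env)
        check_stmt(body, env)
        return env
    if tag == "block":
        inner = env
        for s in st[1]:
            inner = check_stmt(s, inner)
        return env  # the block's environment does not leak out
    raise ValueError(f"unknown statement {st!r}")


S = {"done": "bool", "n": "int"}
loop = ("while", ("var", "done"), ("block", [("assign", "n", ("num", 1))]))
check_stmt(loop, S)  # accepted
```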


// functions: functions are call-by-value, have a local storage context, and produce a return value


S' = [(f, fs -> t)] + S  // add function to the current scope
parent(S'') = S'         // define a new scope whose parent is the old scope; note that the
                         // parent scope S' already contains f, so recursion is acceptable
S'' = [(fs[i].name, fs[i].type)] for all fs[i] in fs // add parameter types to symbol table
S'' |- d* ~> S'''        // add declarations to the symbol table
S''' |- st* ~> S'''      // type-check statements under the updated symbol table. Note that
                         // statements do not update the symbol table, so S''' is also the output
----------------------------------  [definition]
S |- <def f(fs) -> t { d* st* }> ~> S'

(t1, ..., tN) -> t = lookup(f, S)     |actuals| = N
S |- <actuals[i]> : ti for all i = 1 to N
---------------------------------------------  [call]
S |- <f(actuals)> : t
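A minimal sketch of the [definition] and [call] rules, under several simplifying assumptions: the encodings and names (`Env`, `type_of`, `check_def`) are hypothetical, the function body is reduced to a list of assignments, and `check_def` adds f to the enclosing scope in place rather than building a new S'. Because f is added before its body is checked, the recursive call in the example type-checks.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical encodings: a function type is ("fn", (param types...), result type);
# expressions are ("num", n), ("var", x) or ("call", f, [actuals]).


class Env:
    """A scope with a parent pointer (the chained design)."""

    def __init__(self, parent: Optional["Env"] = None):
        self.names: Dict[str, object] = {}
        self.parent = parent

    def add(self, name: str, t: object) -> None:
        if name in self.names:
            raise TypeError(f"{name} is declared twice in this scope")
        self.names[name] = t

    def lookup(self, name: str) -> object:
        env: Optional[Env] = self
        while env is not None:
            if name in env.names:
                return env.names[name]
            env = env.parent
        raise KeyError(f"{name} is not declared")


def type_of(e, env: Env) -> object:
    tag = e[0]
    if tag == "num":
        return "int"
    if tag == "var":
        return env.lookup(e[1])
    if tag == "call":  # [call]
        _, f, actuals = e
        ftype = env.lookup(f)
        if not (isinstance(ftype, tuple) and ftype[0] == "fn"):
            raise TypeError(f"{f} is not a function")
        _, params, result = ftype
        if len(actuals) != len(params):
            raise TypeError(f"{f} expects {len(params)} arguments, got {len(actuals)}")
        for arg, param_type in zip(actuals, params):
            if type_of(arg, env) != param_type:
                raise TypeError(f"argument to {f} should have type {param_type}")
        return result
    raise ValueError(f"unknown expression {e!r}")


def check_def(name: str, formals: List[Tuple[str, str]], result: str,
              decls: List[Tuple[str, str]], assigns: List[Tuple[str, tuple]],
              env: Env) -> Env:
    """[definition]; the body is simplified to a list of assignments x = e."""
    env.add(name, ("fn", tuple(t for _, t in formals), result))  # S': f is visible first
    body = Env(parent=env)                                         # S'': parent(S'') = S'
    for x, t in formals + decls:                                   # parameters, then d*
        body.add(x, t)
    for x, e in assigns:                                           # check st* in S'''
        if type_of(e, body) != body.lookup(x):
            raise TypeError(f"cannot assign to {x}")
    return env


G = Env()
# def fact(n : int) -> int { r : int; r = fact(n); }  (the recursive call type-checks)
check_def("fact", [("n", "int")], "int", [("r", "int")],
          [("r", ("call", "fact", [("var", "n")]))], G)
```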


// lookup

there exists (name, t) in S
------------
S |- lookup(name, S) ~> t

there is no (name, _) in S     S' = parent(S)     S' |- lookup(name, S') ~> t
----------------------------------------------------------------------------
S |- lookup(name, S) ~> t
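A minimal sketch of the lookup rules, assuming each environment is an ordered tuple of (name, type) pairs (newest first) plus an optional parent. `Env` and `lookup` are hypothetical names. The first rule is the scan of S itself; the second rule applies only when the name is absent from S, and it is applied repeatedly so that every enclosing scope is reached. Taking the first match gives inner declarations priority over outer ones (shadowing).

```python
from typing import NamedTuple, Optional, Tuple


class Env(NamedTuple):
    pairs: Tuple[Tuple[str, str], ...]  # this scope's (name, type) pairs, newest first
    parent: Optional["Env"] = None      # enclosing scope, or None for the global scope


def lookup(name: str, env: Env) -> str:
    # Rule 1: (name, t) is in S itself; the most recent binding wins.
    for n, t in env.pairs:
        if n == name:
            return t
    # Rule 2: name is not in S, so look in parent(S), recursively.
    if env.parent is None:
        raise KeyError(f"{name} is not declared")
    return lookup(name, env.parent)


global_env = Env(pairs=(("f", "(bool) -> int"), ("globalvar", "int")))
f_env = Env(pairs=(("x", "int"), ("arg1", "bool")), parent=global_env)
print(lookup("arg1", f_env))       # bool, by rule 1
print(lookup("globalvar", f_env))  # int, by rule 2
```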
