Type-Checking
Lecture 8
Table of Contents
- Overview
- Why use types?
- Typed vs. untyped languages
- Type-safe vs. unsafe
- When do we check types?
- Execution errors and well-behaved programs
- Demo: C vs. Python
- Static type checking
- Function types
- Safety guarantees
- Proving type soundness
- Implementation
- Symbol table
- Demo: statically checking a tree
- Project
Overview
- What are types?
- Why have them?
- How to implement them?
Why use types?
One use: to prevent errors during runtime
How do you feel about types? Do you like having the protection? Do compiler errors bother you?
Typed vs. untyped languages
- A type is
- a set of values
- and operations on those values
Examples
- int
- the set of integers and the arithmetic operations
- bool
- { true, false } and the logic operators (and, or, not)
Typed languages
Restrict a variable's range of possible values (Python, C, Java, etc.)
Untyped languages
Do not restrict variable values (Lisp, assembly)
Type-safe vs. unsafe
- Runtime errors are either trapped or untrapped
- Trapped
- Machine catches and terminates program, e.g., NULL-pointer error, divide-by-zero
- Untrapped
- Program continues, e.g., writing past array bounds, integer arithmetic on a floating-point number
Untrapped errors are nefarious because you may not notice them until the program misbehaves on some input.
A safe language
Prevents untrapped errors (and some trapped errors).
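To make the distinction concrete, here is a minimal Python sketch. Python traps a divide-by-zero by raising an exception. Python does not allow integer arithmetic on the bits of a float, so the second function uses the standard `struct` module to simulate the result an unsafe language could silently produce. The function names are illustrative and not part of the lecture.

```python
import struct

def trapped_example():
    """Divide-by-zero is trapped: the runtime stops the operation and raises."""
    try:
        return 1 // 0
    except ZeroDivisionError as err:
        return f"trapped: {err}"

def untrapped_simulation(value: float) -> int:
    """Simulate an untrapped error: read the 8 bytes of a double
    as a 64-bit signed integer, as an unsafe language could do silently."""
    raw = struct.pack("<d", value)        # bytes of the IEEE 754 double
    return struct.unpack("<q", raw)[0]    # same bytes, read as an integer

print(trapped_example())
# The addition "works", but the result has nothing to do with 1.0 + 1.
print(untrapped_simulation(1.0) + 1)
```

The trapped case announces itself immediately, while the simulated untrapped case keeps running with a meaningless number.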
When do we check types?
- Compile-time (static): C, Java
- Run-time (dynamic): Python, Java(?)
Statically-checked vs dynamically checked?
Java employs static and dynamic checks, e.g.,
- checking whether a symbol is an array: static check
- checking whether an array access is out-of-bounds: runtime check
Java reflection: check type at run-time (instanceof)
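As a rough illustration of dynamic checking, the Python sketch below contains a type error in one branch. Because Python checks types at run-time, the error only surfaces when that branch executes. The function `describe` is a made-up example.

```python
def describe(n, verbose):
    if verbose:
        # Type error: str + int. Python only notices when this line executes.
        return "value: " + n
    return "ok"

# The faulty branch is never reached, so no error is reported.
print(describe(3, verbose=False))

try:
    describe(3, verbose=True)
except TypeError as err:
    print("dynamic check failed:", err)
```

A static checker would reject the function before it ever runs, regardless of which branch is taken.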
Why not always do static checks?
What else can we define in the type system?
- Memory-safety: pointer is never dereferenced when NULL
- Information flow security: does a secret value ever get printed out?
Execution errors and well-behaved programs
- Forbidden errors: all untrapped errors (and some trapped)
- Good behavior: a program has no forbidden errors
Weak vs. strong checking
- Strongly-checked: all legal programs have good behavior
- Weakly-checked: some programs violate safety
Why would we want weak typing?
What is the trade-off between strong/weak typing and decisions about what to check?
Source: http://lucacardelli.name/Papers/TypeSystems.pdf
Cardelli considers Lisp untyped, because it does not restrict variables to a range of values. Untyped languages "do not have types or, equivalently, have a single universal type that contains all values. In these languages, operations may be applied to inappropriate arguments: the result may be a fixed arbitrary value, a fault, an exception, or an unspecified effect."
Demo: C vs. Python
Static type checking
- Record types of identifiers in symbol table
- Post-order tree traversal
- Check identifiers used in
- Arithmetic operators, function calls, assignments
- Look up type in symbol table
- Constants have a fixed type
- 3 is an int
- 5.2 is a float
- True is bool (though C had no built-in bool before C99)
Function types
- Scalar values have primitive type
- int, char, long, etc.
If symbol "x" has type "int", we can write
x : int
- Function types describe parameters, return values
E.g., f takes two integers and returns a bool
f : (int, int) -> bool
What is the type of arithmetic multiplication (*)?
* : (int, int) -> int
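One possible way to represent function types in a checker is sketched below. The `FuncType` class and the `apply` helper are hypothetical names chosen for illustration, and types are represented as plain strings to keep the example small.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class FuncType:
    params: tuple   # parameter types, e.g. ("int", "int")
    ret: str        # return type, e.g. "bool"

    def __str__(self):
        return f"({', '.join(self.params)}) -> {self.ret}"

def apply(func_type, arg_types):
    """Return the result type of a call, or raise TypeError on a mismatch."""
    if tuple(arg_types) != func_type.params:
        raise TypeError(f"expected {func_type.params}, got {tuple(arg_types)}")
    return func_type.ret

f = FuncType(("int", "int"), "bool")      # f : (int, int) -> bool
times = FuncType(("int", "int"), "int")   # * : (int, int) -> int

print(f"f : {f}")
print(apply(times, ["int", "int"]))
```

Treating operators like `*` as ordinary function types means one rule can check both calls and arithmetic.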
Safety guarantees
- If the type checker accepts a program, is it actually safe?
- Type soundness: if the checker says a program is safe, then the program is safe
- Example: array out of bounds access
- unsound: C type checker accepts the program
- sound: Java rejects the out-of-bounds access (with a runtime check)
Proving type soundness
- Goal: well-typed programs are safe programs
- Need to define semantics first
- Define type rules that "run" over the semantics
Formal soundness: each provable sentence (well-typed program) is valid with respect to the semantics (safe program)
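As a hedged illustration of what a type rule can look like, assume a typing environment $\Gamma$ (playing the role of the symbol table) and a judgment $\Gamma \vdash e : \tau$, read as "under $\Gamma$, expression $e$ has type $\tau$". This notation is conventional and is not taken from the lecture. A rule for integer addition, followed by one common informal way of stating soundness, might be written as:

```latex
\[
\frac{\Gamma \vdash e_1 : \mathtt{int} \qquad \Gamma \vdash e_2 : \mathtt{int}}
     {\Gamma \vdash e_1 + e_2 : \mathtt{int}}
\]

\[
\text{If } \Gamma \vdash e : \tau, \text{ then evaluating } e
\text{ causes no forbidden error, and any resulting value } v
\text{ satisfies } v \in \tau .
\]
```

The premises above the line are what the checker must establish for the subexpressions. The conclusion below the line is the type it may then assign to the whole expression.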
Implementation
Symbol table
Mapping variables to types and memory locations
Example symbol table
main { char x; int res; x = 'a'; res = 5; return 0; }
| symbol | type |
|---|---|
| x | char |
| res | int |
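A symbol table like the one above could be built with a dictionary, as in the sketch below. The list of declarations is written by hand here rather than parsed. The byte sizes and the offset scheme (sequential, ignoring alignment) are assumptions made only to show how a memory location could sit next to the type.

```python
# Assumed sizes in bytes, for illustration only.
SIZES = {"char": 1, "int": 4}

# Declarations from the example program, as (type, name) pairs.
declarations = [("char", "x"), ("int", "res")]

def build_symbol_table(decls):
    """Map each declared name to its type and a memory offset."""
    table = {}
    offset = 0
    for type_name, name in decls:
        if name in table:
            raise NameError(f"redeclaration of {name!r}")
        table[name] = {"type": type_name, "offset": offset}
        offset += SIZES[type_name]   # next variable goes right after this one
    return table

symbols = build_symbol_table(declarations)
for name, entry in symbols.items():
    print(name, entry["type"], entry["offset"])
```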
Demo: statically checking a tree
int x; int y; return x+y;
int x; char c; return x + c;
int x; int y; int z; return x + y * z;
- Traverse the tree
- Apply a different rule for each kind of node
- Declaration nodes: add to symbol table
- Developer annotations prime the compiler with the type operations to expect
- Expression and assignment nodes: check against types in the symbol table
- Constants have predefined types
- Operators have predefined (function) types
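The traversal described above can be sketched in Python as follows. The node classes and the hand-built trees are hypothetical stand-ins for a parser's output. The sketch assumes a strict rule where both operands of `+` and `*` must be `int`, so the second program is rejected. C itself would promote the `char` to `int` and accept it.

```python
from dataclasses import dataclass

@dataclass
class Decl:            # declaration node, e.g. "int x;"
    type_name: str
    name: str

@dataclass
class Var:             # use of an identifier
    name: str

@dataclass
class Const:           # literal constant
    value: object

@dataclass
class BinOp:           # binary operator node
    op: str
    left: object
    right: object

@dataclass
class Return:          # return statement
    expr: object

# Constants and operators have predefined types.
CONSTANT_TYPES = {int: "int", float: "float", bool: "bool"}
OPERATOR_TYPES = {"+": (("int", "int"), "int"), "*": (("int", "int"), "int")}

def check_expr(node, symbols):
    """Post-order traversal: type the children first, then the node itself."""
    if isinstance(node, Const):
        return CONSTANT_TYPES[type(node.value)]
    if isinstance(node, Var):
        if node.name not in symbols:
            raise TypeError(f"undeclared identifier {node.name!r}")
        return symbols[node.name]
    if isinstance(node, BinOp):
        left = check_expr(node.left, symbols)
        right = check_expr(node.right, symbols)
        params, result = OPERATOR_TYPES[node.op]
        if (left, right) != params:
            raise TypeError(f"{node.op} expects {params}, got {(left, right)}")
        return result
    raise TypeError(f"unknown node {node!r}")

def check_program(statements):
    """Declarations fill the symbol table; the return expression is checked."""
    symbols = {}
    result = None
    for stmt in statements:
        if isinstance(stmt, Decl):
            symbols[stmt.name] = stmt.type_name
        elif isinstance(stmt, Return):
            result = check_expr(stmt.expr, symbols)
    return result

# int x; int y; return x+y;
prog1 = [Decl("int", "x"), Decl("int", "y"),
         Return(BinOp("+", Var("x"), Var("y")))]
# int x; char c; return x + c;
prog2 = [Decl("int", "x"), Decl("char", "c"),
         Return(BinOp("+", Var("x"), Var("c")))]
# int x; int y; int z; return x + y * z;
prog3 = [Decl("int", "x"), Decl("int", "y"), Decl("int", "z"),
         Return(BinOp("+", Var("x"), BinOp("*", Var("y"), Var("z"))))]

for prog in (prog1, prog2, prog3):
    try:
        print("accepted, returns", check_program(prog))
    except TypeError as err:
        print("rejected:", err)
```

Relaxing the strict rule, for example by adding a `("int", "char")` entry or an implicit promotion step, is a design decision about how strong the checker should be.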