COP 3402 meeting -*- Outline -*-

* Overview of Symbol Tables and Declaration Checking

** declarations vs. uses

   This is a key distinction for checking.

------------------------------------------
        DECLARATIONS VS. USES

Def: a *declaration* of an identifier


Def: a *use* of an identifier


------------------------------------------
   ... introduces that identifier into the program
       and states what its attributes are
       E.g., const, var, and proc declarations in SPL
       Attributes may be stated implicitly.
       E.g., the name of a constant or variable,
             or the offset of a variable from the start of a block
             or the type in SPL (which is always integer)

   ... changes or refers to the identifier's value
       (for a variable, the value stored in the location
        the variable denotes)
       E.g., assignment to a variable
             reference to a constant or variable in an expression
             call of a proc

   Q: Are declarations of variables also uses?
      No! These are mutually exclusive

------------------------------------------
     WHICH ARE DECLARATIONS? USES?

% file decl-use.spl
begin
  var x, y;         % 1 and 2
  proc p            % 3
  begin
    const y = 4;    % 4
    call p          % 5
  end;
  x := x + y;       % 6, 7, and 8
  call p            % 9
end.
------------------------------------------
   Q: Which is a declaration?
      1, 2, 3, 4
   Q: Which is a use?
      5, 6, 7, 8, 9

** Goals

------------------------------------------
    GOALS OF DECLARATION CHECKING

Check that:

  1. Each declaration has a


  2. Each use of an identifier is


------------------------------------------
   ... unique name in a potential scope
       (don't allow duplicate declarations)
   ... for a name that has already been declared
       (can't use undeclared identifiers)
       (so we can know what its attributes are, such as location, type)

   Q: What should be done if the programming language
      doesn't have declarations?
      (Many languages, like Python, Smalltalk, and ML,
       don't declare types of identifiers, but still have
       declaration sites, e.g., function parameters or
       names introduced in let-expressions, that must be used
       to declare names before they are used)
      What should be done is to record the attributes of a name
      when it is first used, and accumulate information after that,
      checking for consistent usage, and at the end check that
      each used name has been used in a way that defined/declared it.
      (However, some languages do not even require such declarations
       before uses: Unix bash shell, Wolfram Language for Mathematica,
       APL... these either create variables on first use
       or use dynamic scoping or both!)

** What is a symbol table

*** definition and data structures

------------------------------------------
           SYMBOL TABLE

def: a *symbol table* is


Data Structures:


------------------------------------------
   ... a mapping from identifiers to their attributes
       (the attributes are typically gathered from their declaration)

       For simple languages (like assembly languages)
       without any nesting of potential scopes,
       this is one single mapping that is built for the entire program,
       but for more complex languages
       (like SPL, Pascal, Algol, C, C++, Java, etc.),
       there will be a mapping for each point of the program,
       that is built up in each potential scope.

   Q: What kind of data structures/algorithms could be used
      for a symbol table?
      A hash table (with hashed lookup),
      an array or list (with linear search),
      or a binary search tree (with binary search).

   Q: Which is easiest to implement?
      (an array or linked list with linear search...)

   Q: When used to check a program, which is more common:
      adding/inserting a new mapping (name, attributes)
      or looking up the attributes of a name?
      Lookups are more common in real programs,
      so they should be optimized to make the compiler run faster,
      hence a hash table would be ideal.
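   The hash table option from the Q&A above can be sketched in C as
   follows. This is only a minimal illustration for a single scope,
   not the course's symtab module: the names attrs_t, entry_t,
   table_lookup, table_insert, and NUM_BUCKETS are all hypothetical,
   and the attribute record is a stand-in for the provided id_attrs type.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

#define NUM_BUCKETS 64  /* assumed fixed size; a real table might resize */

/* Hypothetical attribute record (a stand-in, not the course's id_attrs). */
typedef struct {
    int kind;         /* e.g., 0 = constant, 1 = variable, 2 = procedure */
    unsigned offset;  /* offset from the start of the block */
} attrs_t;

typedef struct entry {
    char *name;
    attrs_t attrs;
    struct entry *next;  /* chain of entries that hash to the same bucket */
} entry_t;

static entry_t *buckets[NUM_BUCKETS];  /* static storage, so initially all NULL */

/* A djb2-style string hash, reduced to a bucket index. */
static unsigned hash(const char *s)
{
    unsigned long h = 5381;
    while (*s != '\0') {
        h = h * 33 + (unsigned char)*s++;
    }
    return (unsigned)(h % NUM_BUCKETS);
}

/* Return a pointer to name's attributes, or NULL if name is not present. */
attrs_t *table_lookup(const char *name)
{
    assert(name != NULL);
    for (entry_t *e = buckets[hash(name)]; e != NULL; e = e->next) {
        if (strcmp(e->name, name) == 0) {
            return &e->attrs;
        }
    }
    return NULL;
}

/* Add the mapping (name, attrs).
   Return false if name is already present or memory runs out. */
bool table_insert(const char *name, attrs_t attrs)
{
    if (table_lookup(name) != NULL) {
        return false;  /* duplicate declaration */
    }
    entry_t *e = malloc(sizeof *e);
    if (e == NULL) {
        return false;
    }
    e->name = malloc(strlen(name) + 1);
    if (e->name == NULL) {
        free(e);
        return false;
    }
    strcpy(e->name, name);
    e->attrs = attrs;
    unsigned b = hash(name);
    e->next = buckets[b];  /* push onto the front of the bucket's chain */
    buckets[b] = e;
    return true;
}
```

   A false result from table_insert corresponds to goal 1 (no duplicate
   declarations), and a NULL result from table_lookup corresponds to
   goal 2 (no uses of undeclared identifiers).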
*** Attributes

------------------------------------------
             ATTRIBUTES

def: an *attribute* is


Examples of attributes:


------------------------------------------
   ... information kept about a declaration (its name) in a compiler

   ... source code location of declaration (for error messages),
       kind of name (constant, variable, procedure, ...)
       type (for use in type checking),
       size (for use in allocation, may be derived from the type),
       offset (an add on to a base address, for use in code generation,
               may depend on type/size of what is declared before it)

**** operations

------------------------------------------
    OPERATIONS FOR SYMBOL TABLES

void initialize();
int size();
bool empty();
bool full();
unsigned int loc_count();
unsigned int current_nesting_level();
bool defined(name);
id_use *lookup(name);
bool defined_in_current_scope(name);
void insert(name, id_attrs);
void enter_scope();
void leave_scope();
------------------------------------------
   Notes: in the homework this would go in symtab.h
          (but note that in hw3, students write this)
          (and then one would make the names all start with symtab_ )
          the name parameters are of type const char * (i.e., strings)
          the type id_attrs is a type for attributes (provided)
          the type id_use is a type (provided) that records information
            about an identifier's use in the program
            (including the attributes and lexical address)

*** interaction with nested scopes

**** A program can have multiple potential scopes

------------------------------------------
  SYMBOL TABLES AND POTENTIAL SCOPES

def: a declaration's *scope* is


def: a *potential scope* is


Grammar of a SPL program
  ::= .
  ::= begin end
  ::= |
  ::=
  ::= const ;
  ::= { }
  ::= var ;
  ::= { }
  ::= proc ;

Example:

begin
  proc p
  begin
    var x;
    x := 2
  end;
  x := 1;    % is this use of x declared?
  call p
end.

% Example: What's the final value of x?
begin
  var x;
  proc p
  begin
    var x;
    x := 2
  end;
  proc q
  begin
    x := 3
  end;
  x := 1;
  call q;
  call p
end.
------------------------------------------
   ...
       the area of the program's text
       where that declaration is effective

   ... an area of a program's text determined by the nonterminal
       containing its declaration (such as a block in SPL)

   Q: Is a block a potential scope?
      Yes, blocks are the potential scopes in SPL

   Q: What is the final value of the global x
      in the second SPL program?
      3, because p assigns to its local variable x,
      but q assigns to the global x

**** strategies for nested scopes

------------------------------------------
     STRATEGIES FOR NESTED SCOPES

Needs:



Approaches:

  - [Active Deletion]

  - [Stack-Based]

  - [Functional]

  - [Decorations in AST]

------------------------------------------
   - start a new potential scope,
     so that declarations of (shadowing) names don't generate errors
   - leave a scope, to make searching happen in the outer scope

   ... Only 1 table;
       when finishing a scope, remove its declarations
       (can rewalk AST to remove them)

   ... Stack with one table per scope
       push when entering a scope, pop when leaving a scope
       lookup checks lower entries if needed

   ... (can omit)
       Pass symbol table down the AST
       new entries replace ones in scopes if needed,
       then drop entire symbol table when leaving a scope

   ... (can omit)
       Use the AST as the symbol table
       lookup follows the tree structure (upwards)
       AST nodes for declarations are searched
       when leaving a scope, some declarations are no longer used
       (as they are no longer parents in the AST)
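
   The [Stack-Based] approach can be sketched in C as follows.
   This is a minimal illustration under stated assumptions, not the
   homework's symtab module: the fixed limits MAX_SCOPES and
   MAX_PER_SCOPE, the decl_t record (a stand-in for the provided
   id_attrs and id_use types), the int kind parameter, and the
   levels_out parameter are all hypothetical. Only the operation names
   follow the symtab_ convention from the notes above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define MAX_SCOPES 16     /* assumed limit on nesting depth */
#define MAX_PER_SCOPE 32  /* assumed limit on declarations per scope */

/* Hypothetical record for one declaration. */
typedef struct {
    const char *name;  /* not copied; the caller keeps the string alive */
    int kind;          /* e.g., 0 = constant, 1 = variable, 2 = procedure */
} decl_t;

/* One table per potential scope. */
typedef struct {
    decl_t decls[MAX_PER_SCOPE];
    int count;
} scope_t;

static scope_t scopes[MAX_SCOPES];  /* the stack of tables */
static int depth = 0;               /* number of scopes currently open */

/* Push an empty table; return false if nesting is too deep. */
bool symtab_enter_scope(void)
{
    if (depth >= MAX_SCOPES) {
        return false;
    }
    scopes[depth].count = 0;
    depth++;
    return true;
}

/* Pop the innermost table, so later lookups search the outer scopes. */
void symtab_leave_scope(void)
{
    assert(depth > 0);
    depth--;
}

/* Linear search within a single scope's table. */
static decl_t *find_in(scope_t *s, const char *name)
{
    for (int i = 0; i < s->count; i++) {
        if (strcmp(s->decls[i].name, name) == 0) {
            return &s->decls[i];
        }
    }
    return NULL;
}

bool symtab_defined_in_current_scope(const char *name)
{
    return depth > 0 && find_in(&scopes[depth - 1], name) != NULL;
}

/* Declare name in the innermost scope. Return false if no scope is open,
   name is already declared in that scope, or the scope's table is full.
   Shadowing a name from an outer scope is allowed. */
bool symtab_insert(const char *name, int kind)
{
    if (depth == 0 || symtab_defined_in_current_scope(name)) {
        return false;
    }
    scope_t *s = &scopes[depth - 1];
    if (s->count >= MAX_PER_SCOPE) {
        return false;
    }
    s->decls[s->count].name = name;
    s->decls[s->count].kind = kind;
    s->count++;
    return true;
}

/* Search from the innermost scope outward. If found and levels_out is
   not NULL, store how many scopes outward the declaration was found. */
decl_t *symtab_lookup(const char *name, int *levels_out)
{
    for (int d = depth - 1; d >= 0; d--) {
        decl_t *found = find_in(&scopes[d], name);
        if (found != NULL) {
            if (levels_out != NULL) {
                *levels_out = (depth - 1) - d;
            }
            return found;
        }
    }
    return NULL;
}
```

   In the second SPL example above, a checker built this way would call
   symtab_enter_scope at each begin and symtab_leave_scope at each end.
   Inside p, looking up x finds p's own declaration (0 levels outward);
   inside q, looking up x finds the global declaration (1 level outward).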