COP 3402 meeting -*- Outline -*-

* Overview of Symbol Tables and Declaration Checking

** Goals

------------------------------------------
     GOALS OF DECLARATION CHECKING

Check that:

 1. Each declaration has a

 2. Each use of an identifier is

------------------------------------------
        ... unique name in a potential scope
            (don't allow duplicate declarations)
        ... for a name that has already been declared
            (can't use undeclared identifiers)
            (so we can know what its attributes are, such as location, type)

Q: What should be done if the programming language doesn't have declarations?

   (Many languages, like Python, Smalltalk, and ML, don't declare
    types of identifiers, but still have declaration sites,
    e.g., function parameters or names introduced in let-expressions,
    that must be used to declare names before they are used,
    but there are some languages without even that:)
   (E.g., Unix bash shell, Wolfram Language for Mathematica, APL)

   What should be done is to record the attributes of a name when it is
   first used, and accumulate information after that,
   checking for consistent usage.

** What is a symbol table

*** definition and data structures

------------------------------------------
            SYMBOL TABLE

def: a *symbol table* is

Data Structures:

------------------------------------------
        ... a mapping from identifiers to their attributes

   For simple languages (like assemblers) without any nesting of
   potential scopes, this is one mapping that is built for the entire
   program, but for more complex languages
   (like PL/0, Pascal, Algol, C, C++, Java, etc.),
   there will be a mapping for each point of the program,
   which is built up in each potential scope.

Q: What kind of data structures/algorithms could be used for a symbol table?

   A hash table (with hashed lookup),
   an array or list (with linear search),
   or a binary search tree (with tree search).

Q: Which is easiest to implement?

   (an array or linked list with linear search...)
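
   A minimal sketch of that easiest option (an array with linear search)
   for a language with no nested scopes. This is only an illustration:
   the names flat_attrs, flat_entry, flat_lookup, flat_insert, and the
   capacity FLAT_MAX are hypothetical and are not the course's types or
   functions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define FLAT_MAX 64 /* hypothetical capacity */

/* Hypothetical attribute record (not the course's id_attrs type). */
typedef struct {
    int line;  /* source line of the declaration, for error messages */
    char kind; /* 'c' constant, 'v' variable, 'p' procedure */
} flat_attrs;

typedef struct {
    const char *name; /* not copied: must outlive the table */
    flat_attrs attrs;
} flat_entry;

flat_entry flat_table[FLAT_MAX];
int flat_count = 0;

/* Linear search: return the entry for name, or NULL if it is absent. */
const flat_entry *flat_lookup(const char *name)
{
    for (int i = 0; i < flat_count; i++) {
        if (strcmp(flat_table[i].name, name) == 0) {
            return &flat_table[i];
        }
    }
    return NULL;
}

/* Add a mapping; return false (and change nothing) on a duplicate name. */
bool flat_insert(const char *name, flat_attrs attrs)
{
    assert(flat_count < FLAT_MAX); /* the table must not be full */
    if (flat_lookup(name) != NULL) {
        return false;
    }
    flat_table[flat_count].name = name;
    flat_table[flat_count].attrs = attrs;
    flat_count++;
    return true;
}
```

   Both operations are linear in the number of names, which is fine for
   small programs; swapping in a hash table would change only the bodies
   of these two functions.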

Q: When used to check a program, which is more common:
   adding/inserting a new mapping (name, attributes) or
   looking up the attributes of a name?

   Lookups are more common in real programs, so they should be
   optimized to make the compiler run faster,
   hence a hash table would be ideal.

*** Attributes

------------------------------------------
             ATTRIBUTES

def: an *attribute* is

Examples of attributes:

------------------------------------------
        ... information kept about a declaration (name) in a compiler
        ... source code location of declaration (for error messages),
            kind of name (constant, variable, procedure, ...)
            type (for use in type checking),
            size (for use in allocation, may be derived from the type),
            offset (for use in code generation, may depend on type/size)

**** operations

------------------------------------------
     OPERATIONS FOR SYMBOL TABLES

void initialize();
int size();
bool full();
bool defined(name);
id_use *lookup(name);
bool defined_in_current_scope(name);
void insert(name, id_attrs);
void enter_scope();
void leave_scope();
------------------------------------------

   Note: in the homework this would go in symtab.h
   (students write this)
   (and I would make the names all start with symtab_ )

   the name parameters are of type const char * (i.e., strings)
   the type id_attrs is a type for attributes (provided)
   the type id_use is a type (provided)
   that records information about an identifier use

*** interaction with nested scopes

**** A program can have multiple potential scopes

------------------------------------------
   SYMBOL TABLES AND POTENTIAL SCOPES

def: a declaration's *scope* is

def: a *potential scope* is

Grammar of a PL/0 program
   ::= .
   ::= { }
   ::= procedure ; ;

Example:
   procedure p;
     var x;
     x := 2;
   begin
     x := 1;  # is this use declared?
     call p
   end.

Example: What's the final value of x?
   var x;
   procedure p;
     var x;
     x := 2;
   procedure q;
     x := 3;
   begin
     x := 1;
     call q;
     call p;
   end.
------------------------------------------
        ...
            the area of the program's text where that declaration is effective
        ... an area of a program's text determined by a nonterminal
            such as a block

Q: Is a block a potential scope?

   Yes, blocks are the potential scopes in PL/0.

Q: What is the final value of the global x in this program?

   3, because p assigns to its local variable x,
   but q assigns to the global x

**** strategies for nested scopes

------------------------------------------
      STRATEGIES FOR NESTED SCOPES

Needs:

Approaches:

 - [Active Deletion]

 - [Stack-Based]

 - [Functional]

 - [Decorations in AST]

------------------------------------------
        Needs:
        - start a new potential scope, so that declarations of
          (shadowing) names don't generate errors
        - leave a scope, to make searching happen in the outer scope

        ... Only 1 table; when finishing a scope, remove its declarations
            (can rewalk the AST to remove them)
        ... Stack with one table per scope:
            push when entering a scope, pop when leaving a scope;
            lookup checks lower entries if needed
        ... (can omit) Pass the symbol table down the AST;
            new entries replace ones from outer scopes if needed,
            then drop the entire symbol table when leaving a scope
        ... (can omit) Use the AST as the symbol table;
            lookup follows the tree structure (upwards),
            searching the AST nodes for declarations;
            when leaving a scope, some declarations are no longer used
            (as they are no longer parents in the AST)
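
   A minimal sketch of the [Stack-Based] approach: one small table per
   open scope, with lookup searching from the innermost scope outwards.
   It is an illustration under stated assumptions, not the homework's
   solution: attrs_t stands in for the provided id_attrs type, lookup
   returns a pointer to attributes rather than an id_use, insert returns
   a bool instead of void, and the size limits are made up.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define MAX_SCOPES 16          /* hypothetical limits */
#define MAX_NAMES_PER_SCOPE 32

/* Hypothetical stand-in for the provided id_attrs type. */
typedef struct {
    char kind;  /* 'c' constant, 'v' variable, 'p' procedure */
    int offset; /* offset for use in code generation */
} attrs_t;

typedef struct {
    const char *name; /* not copied: must outlive the table */
    attrs_t attrs;
} entry_t;

typedef struct {
    entry_t entries[MAX_NAMES_PER_SCOPE];
    int count;
} scope_t;

scope_t scopes[MAX_SCOPES]; /* the stack of per-scope tables */
int depth = 0;              /* number of currently open scopes */

void symtab_initialize(void) { depth = 0; }

/* Push an empty table for a new potential scope. */
void symtab_enter_scope(void)
{
    assert(depth < MAX_SCOPES);
    scopes[depth].count = 0;
    depth++;
}

/* Pop the innermost table, forgetting its declarations. */
void symtab_leave_scope(void)
{
    assert(depth > 0);
    depth--;
}

/* Linear search within a single scope's table. */
static const entry_t *find_in(const scope_t *s, const char *name)
{
    for (int i = 0; i < s->count; i++) {
        if (strcmp(s->entries[i].name, name) == 0) {
            return &s->entries[i];
        }
    }
    return NULL;
}

bool symtab_defined_in_current_scope(const char *name)
{
    return depth > 0 && find_in(&scopes[depth - 1], name) != NULL;
}

/* Search the innermost scope first, then the enclosing ones. */
const attrs_t *symtab_lookup(const char *name)
{
    for (int level = depth - 1; level >= 0; level--) {
        const entry_t *e = find_in(&scopes[level], name);
        if (e != NULL) {
            return &e->attrs;
        }
    }
    return NULL;
}

/* Declare name in the current scope; false means duplicate declaration. */
bool symtab_insert(const char *name, attrs_t attrs)
{
    assert(depth > 0);
    if (symtab_defined_in_current_scope(name)) {
        return false;
    }
    scope_t *s = &scopes[depth - 1];
    assert(s->count < MAX_NAMES_PER_SCOPE);
    s->entries[s->count].name = name;
    s->entries[s->count].attrs = attrs;
    s->count++;
    return true;
}
```

   For the second example program above, a checker would enter a scope
   for the outer block and insert x, then enter a scope for p's block
   and insert a second x (allowed, since it shadows rather than
   duplicates). After leaving p's scope, a lookup of x inside q finds
   the global x again.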