COP 3402 meeting -*- Outline -*-

* Overview of Symbol Tables and Declaration Checking

** declarations vs. uses
   This is a key distinction for checking.
------------------------------------------
          DECLARATIONS VS. USES

Def: a *declaration* of an identifier


Def: a *use* of an identifier


------------------------------------------
  ... introduces that identifier into the program
      and states what its attributes are

      E.g., const, var, and proc declarations in SPL

      Attributes may be stated explicitly or implicitly.
       E.g., the name of a constant or variable is stated explicitly,
             but the offset of a variable from the start of a block
             and the type in SPL (which is always integer) are implicit

  ... changes or refers to the identifier's value
      (for a variable, the value stored in the location the variable denotes)
      E.g., assignment to a variable
            reference to a constant or variable in an expression
            call of a proc

      Q: Are declarations of variables also uses?
         No! These are mutually exclusive

------------------------------------------
      WHICH ARE DECLARATIONS? USES?

% file decl-use.spl
begin
   var x, y;   % 1 and 2
   proc p      % 3
   begin
     const y = 4;  % 4
     call p    % 5
   end;
   x := x + y; % 6, 7, and 8
   call p      % 9
end.
------------------------------------------
    Q: Which are declarations?
        1, 2, 3, 4
    Q: Which are uses?
        5, 6, 7, 8, 9

** Goals
------------------------------------------
        GOALS OF DECLARATION CHECKING

Check that:

  1. Each declaration has a 


  2. Each use of an identifier is
   

------------------------------------------
      ... unique name in a potential scope
          (don't allow duplicate declarations)

      ... for a name that has already been declared
          (can't use undeclared identifiers)
          (so we can know what its attributes are, such as location, type)
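
      The two checks can be sketched in C for a single scope as follows.
      This is only an illustration, not the course's code: the names
      check_reset, check_declaration, and check_use, and the limit
      MAX_NAMES, are hypothetical, and the table stores the caller's
      string pointers without copying them.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define MAX_NAMES 64   /* hypothetical capacity for this sketch */

/* Names declared so far in the (single) scope being checked.
   Only the pointers are stored; the strings must outlive the table. */
static const char *declared[MAX_NAMES];
static int num_declared = 0;

/* Forget all declarations (start checking a new scope). */
void check_reset(void) { num_declared = 0; }

/* Has name already been declared? (linear search) */
bool check_is_declared(const char *name)
{
    for (int i = 0; i < num_declared; i++) {
        if (strcmp(declared[i], name) == 0) {
            return true;
        }
    }
    return false;
}

/* Goal 1: each declaration has a unique name in its scope.
   Returns false (an error) for a duplicate declaration. */
bool check_declaration(const char *name)
{
    if (check_is_declared(name)) {
        return false;
    }
    assert(num_declared < MAX_NAMES);
    declared[num_declared++] = name;
    return true;
}

/* Goal 2: each use is for a name that has already been declared.
   Returns false (an error) for an undeclared identifier. */
bool check_use(const char *name)
{
    return check_is_declared(name);
}
```

      A real checker would report an error message (with a source
      location) instead of just returning false.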

      Q: What should be done if the programming language
         doesn't have declarations?

         (Many languages, like Python, Smalltalk, and ML,
         don't declare the types of identifiers, but still
         have declaration sites, e.g., function parameters or names
         introduced in let-expressions, where names must be
         declared before they are used)

         What should be done is to record the attributes of a name
         when it is first used, and accumulate information after that,
         checking for consistent usage, and at the end check that each
         used name has been used in a way that defined/declared it.

         (However, some languages lack even such declarations
          before uses: the Unix bash shell, Wolfram Language for
          Mathematica, APL... these either create variables on
          first use or use dynamic scoping or both!)

** What is a symbol table
*** definition and data structures
------------------------------------------
          SYMBOL TABLE

def: a *symbol table* is


Data Structures:


------------------------------------------
   ...  a mapping from identifiers to their attributes
           (the attributes are typically gathered from their declaration)

        For simple languages (like assemblers) without any nesting of
        potential scopes, this is one single mapping that is built for the
        entire program,
        but for more complex languages (like SPL, Pascal, Algol, C, C++,
        Java, etc.), there will be a mapping for each point of the
        program, that is built up in each potential scope.

        Q: What kind of data structures/algorithms could be used
           for a symbol table?
        A hash table (with hashed lookup),
        an array or list (with linear search), or
        a binary search tree (with binary search).

        Q: Which is easiest to implement?
        (an array or linked list with linear search...)

        Q: When used to check a program, which is more common:
        adding/inserting a new mapping (name, attributes) or
        looking up the attributes of a name?

        Lookups are more common in real programs,
        so they should be optimized to make the compiler run faster,
        hence a hash table would be ideal.
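
        A hash table with chaining might look like the following sketch.
        Everything here is an assumption made for illustration: the
        names ht_entry, ht_hash, ht_lookup, and ht_insert, the bucket
        count, and the use of a plain int in place of real attributes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

#define NUM_BUCKETS 31   /* hypothetical table size */

/* One (name, attributes) mapping; attrs is simplified to an int. */
typedef struct ht_entry {
    const char *name;
    int attrs;
    struct ht_entry *next;   /* next entry in the same bucket */
} ht_entry;

static ht_entry *buckets[NUM_BUCKETS];   /* all NULL initially */

/* A simple multiplicative string hash, reduced to a bucket index. */
static unsigned int ht_hash(const char *s)
{
    unsigned int h = 5381;
    while (*s != '\0') {
        h = h * 33 + (unsigned char)*s++;
    }
    return h % NUM_BUCKETS;
}

/* Return the entry for name, or NULL if there is none.
   Only the entries in one bucket are searched. */
ht_entry *ht_lookup(const char *name)
{
    for (ht_entry *e = buckets[ht_hash(name)]; e != NULL; e = e->next) {
        if (strcmp(e->name, name) == 0) {
            return e;
        }
    }
    return NULL;
}

/* Add a mapping from name to attrs at the front of its bucket.
   The caller is expected to have checked for duplicates. */
void ht_insert(const char *name, int attrs)
{
    ht_entry *e = malloc(sizeof(ht_entry));
    assert(e != NULL);
    unsigned int b = ht_hash(name);
    e->name = name;
    e->attrs = attrs;
    e->next = buckets[b];
    buckets[b] = e;
}
```

        With a reasonable hash function and table size, each lookup
        examines only a few entries, whereas linear search over an
        array or list examines about half of all entries on average.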

*** Attributes
------------------------------------------
           ATTRIBUTES

def: an *attribute* is


Examples of attributes:


------------------------------------------
        ... information kept about a declaration (its name) in a compiler

        ... source code location of declaration (for error messages),
            kind of name (constant, variable, procedure, ...)
            type (for use in type checking),
            size (for use in allocation, may be derived from the type),
            offset (an add on to a base address, for use in code generation,
                    may depend on type/size of what is declared before it)
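
        As an illustration only, attributes like these could be kept in
        a C struct. The type and function names below (sketch_kind,
        sketch_attrs, sketch_start_block, sketch_make_attrs) are
        hypothetical and are not the provided id_attrs type; sizes are
        assumed to be counted in words, and a name that needs no
        storage is assumed to be given size 0 by the caller.

```c
#include <assert.h>

/* The kind of name that was declared. */
typedef enum { constant_kind, variable_kind, procedure_kind } sketch_kind;

/* Information recorded about one declaration. */
typedef struct {
    unsigned int line;     /* source location, for error messages */
    unsigned int column;
    sketch_kind kind;
    unsigned int size;     /* storage needed, in words */
    unsigned int offset;   /* distance from the start of the block's storage */
} sketch_attrs;

/* Offset that the next declaration in the current block will get. */
static unsigned int next_offset = 0;

/* Begin a new block: its first declaration gets offset 0. */
void sketch_start_block(void) { next_offset = 0; }

/* Build the attributes for a declaration. Its offset depends on
   the sizes of everything declared before it in the block. */
sketch_attrs sketch_make_attrs(unsigned int line, unsigned int column,
                               sketch_kind kind, unsigned int size)
{
    assert(kind == constant_kind || kind == variable_kind
           || kind == procedure_kind);
    sketch_attrs a = { line, column, kind, size, next_offset };
    next_offset += size;
    return a;
}
```

        For example, for "var x, y;" with one word per variable, x
        would get offset 0 and y would get offset 1.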

**** operations
------------------------------------------
       OPERATIONS FOR SYMBOL TABLES


void initialize();


int size();


bool empty();


bool full();


unsigned int loc_count();


unsigned int current_nesting_level();


bool defined(name);


id_use *lookup(name);


bool defined_in_current_scope(name);


void insert(name, id_attrs);


void enter_scope();


void leave_scope();

------------------------------------------
        Notes:
         in the homework this would go in symtab.h
          (but note that in hw3, students write this)
          (and then one would make the names all start with symtab_ )
         the name parameters are of type const char * (i.e., strings)
         the type id_attrs is a type for attributes (provided)
         the type id_use is a type (provided) that records information
             about an identifier's use in the program
             (including the attributes and lexical address)

*** interaction with nested scopes
**** A program can have multiple potential scopes
------------------------------------------
   SYMBOL TABLES AND POTENTIAL SCOPES

def: a declaration's *scope* is


def: a *potential scope* is


Grammar of an SPL program

<program> ::= <block> .
<block> ::= begin
              <const-decls> <var-decls>
              <proc-decls> <stmts>
            end
<const-decls> ::= <empty>
              | <const-decls> <const-decl>
<empty> ::= 
<const-decl> ::= const <const-def-list> ;

<var-decls> ::= { <var-decl> }
<var-decl> ::= var <ident-list> ;

<proc-decls> ::= { <proc-decl> }
<proc-decl> ::= proc <ident> <block> ;


Example:

begin
  proc p
  begin
    var x;
    x := 2
  end;
  x := 1; % is this use of x declared?
  call p
end.


% Example: What's the final value of x?
begin
  var x;
  proc p
  begin
    var x;
    x := 2
  end;
  proc q
  begin
    x := 3
  end;
  x := 1;
  call q;
  call p
end.
 
------------------------------------------
        ... the area of the program's text where that declaration is effective

        ... an area of a program's text determined by the nonterminal
            containing its declaration (such as a block in SPL)
            
        Q: Is a block a potential scope?
        Yes, in SPL each block is a potential scope

        Q: What is the final value of the global x in the second SPL program?
           3, because p assigns to its local variable x,
           but q assigns to the global x
        
**** strategies for nested scopes
------------------------------------------
      STRATEGIES FOR NESTED SCOPES

Needs:


Approaches:
 - [Active Deletion]


 - [Stack-Based]


 - [Functional]


 - [Decorations in AST]


------------------------------------------
        - start a new potential scope,
          so that declarations
          of (shadowing) names don't generate errors

        - leave a scope,
          to make searching happen
          in the outer scope

        ... Only 1 table;
            when finishing scope remove its declarations
             (can rewalk AST to remove them)

        ... Stack with one table per scope
              push when enter scope,
              pop when leave scope
              lookup checks lower entries if needed

        ... (can omit)
            Pass symbol table down the AST
              new entries replace ones from outer scopes if needed,
              then drop the extended symbol table when leaving a scope
        
        ... (can omit)
            Use the AST as the symbol table
              lookup follows the tree structure (upwards)
              AST nodes for declarations are searched
              when leaving a scope, its declarations are no longer found
                 (as they are no longer ancestors in the AST)
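
        The stack-based approach above can be sketched in C as follows.
        This is a minimal illustration under stated assumptions, not
        the homework's symtab module: the st_ names, the fixed limits,
        the int used in place of real attributes, and the signature of
        st_lookup (which reports how many scopes outward the
        declaration was found) are all hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define MAX_SCOPES 16      /* hypothetical limits for this sketch */
#define MAX_PER_SCOPE 32

/* The table for one potential scope. */
typedef struct {
    const char *names[MAX_PER_SCOPE];
    int attrs[MAX_PER_SCOPE];   /* stand-in for real attributes */
    int count;
} scope_table;

static scope_table scopes[MAX_SCOPES];   /* the stack of tables */
static int depth = 0;                    /* number of open scopes */

void st_initialize(void) { depth = 0; }

/* Push an empty table for a new potential scope. */
void st_enter_scope(void)
{
    assert(depth < MAX_SCOPES);
    scopes[depth].count = 0;
    depth++;
}

/* Pop the current scope's table; lookups resume in the outer scope. */
void st_leave_scope(void)
{
    assert(depth > 0);
    depth--;
}

/* Index of name in table s, or -1 if it is not there. */
static int find_in(const scope_table *s, const char *name)
{
    for (int i = 0; i < s->count; i++) {
        if (strcmp(s->names[i], name) == 0) {
            return i;
        }
    }
    return -1;
}

/* Used to detect duplicate declarations; shadowing is not an error. */
bool st_defined_in_current_scope(const char *name)
{
    return depth > 0 && find_in(&scopes[depth - 1], name) >= 0;
}

/* Add a declaration to the current (innermost) scope. */
void st_insert(const char *name, int attrs)
{
    assert(depth > 0);
    scope_table *s = &scopes[depth - 1];
    assert(s->count < MAX_PER_SCOPE);
    s->names[s->count] = name;
    s->attrs[s->count] = attrs;
    s->count++;
}

/* Search from the innermost scope outward. On success, store the
   attributes and the number of scopes outward, and return true. */
bool st_lookup(const char *name, int *attrs, int *levels_out)
{
    for (int lvl = depth - 1; lvl >= 0; lvl--) {
        int i = find_in(&scopes[lvl], name);
        if (i >= 0) {
            *attrs = scopes[lvl].attrs[i];
            *levels_out = (depth - 1) - lvl;
            return true;
        }
    }
    return false;
}
```

        For the second example program above: inside p, a lookup of x
        finds p's own declaration (0 levels outward), but inside q it
        finds the global declaration (1 level outward), which is why q
        assigns to the global x.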