COP 3402 meeting -*- Outline -*-

* Design for statically-scoped languages

** what is static scoping?

------------------------------------------
STATIC SCOPING

def: In *static scoping*, each identifier x

def: In *dynamic scoping*, each identifier x

------------------------------------------
... denotes the location for x declared by the closest
    textually surrounding declaration of x

    This is also known as *lexical scoping*

Q: Is there another way that identifiers could be found in a program?

   Yes, it's called dynamic scoping, where each identifier x

   ... denotes the location for x declared by the most recent
       declaration for x that is still active.

Q: What kind of scoping is in C, C++, and Java? The Unix shell?

   C, C++, and Java all have static scoping
   for variables and function names

   The Unix shell uses dynamic scoping for environment variables!

   Java uses dynamic scoping for exception handlers

** motivation for static scoping

------------------------------------------
MOTIVATION FOR STATIC SCOPING

int incr = 1;

int addOne(int y) { return y+incr; }

// what does addOne do?

int client() {
    int incr = 2;
    int z = addOne(3);
    // what is the value of z here?
    return z;
}
------------------------------------------
Q: What should addOne do? What should the value of z be?

   Does addOne mean what it meant when it was written,
   or when it was called?

   With static scoping it means what it meant when it was written,
   so it adds 1

   With dynamic scoping, each name (such as incr) means what it
   means at the time of the call, so it adds 2

Q: Do we want to be able to check programs when we write them?

   Yes, usually

   This is even more important in modern (functional) languages
   where we want to return functions/closures

** block structure

------------------------------------------
BLOCK STRUCTURE

def: A *block* is

Usual Grammar:               Example in C:

                             { int next = 3*x+1;
                               next = next / 2;
                               return next;
                             }
------------------------------------------
... a sequence of local declarations and statements

    <stmt> ::= ... | <block>
    <block> ::= { <decls> <stmts> }
    <decls> ::= <empty> | <decls> <decl>
    <type> ::= int | ...
    <decl> ::= <type> <idents>

------------------------------------------
ADVANTAGES OF BLOCK STRUCTURE

- Local storage

- Control of names

- Easier to extract procedures

------------------------------------------
... can declare temporary variables whose space is reclaimed
    when the block is finished

    e.g., { int huge_array[HUGE]; /* ... */ }

... can pick the best names for variables without worrying that
    they conflict with other code (look locally first),
    which helps independent development

... within blocks (and thus functions),
    a block can be more easily seen as a procedure,
    with the body being the block
    (and the free variables as parameters)

    (this makes extracting a procedure from code easier)

    It may help code be easier to read than in a language
    without block structure, since the declarations are
    closer to their uses

*** motivation for recursion

Q: Do you think recursion is hard to use and understand?
   (get a show of hands)

   I find it very useful, especially for compilers,
   but it's also useful for other kinds of programs...

The files below are in
https://www.cs.ucf.edu/~leavens/COP3402/example-code/trees

------------------------------------------
RECURSIVE DATA ==> RECURSIVE PROGRAMS

A good rule of design:

// file btree.h
#include "Tdef.h"
typedef struct treeNode {
    T value;
    struct treeNode *left, *right;
} tree;

// helper to compute maximum of its args
int max(int a, int b) { return (a >= b) ? a : b; }
------------------------------------------
... organize the program's structure like the data's structure

    i.e., program ~ data (in terms of structure)

Q: How should we write a program to find the depth of a tree?

   ...

   // Return the depth of t
   int depth(tree *t) {
       if (t == NULL) {
           return 0;
       } else {
           return 1 + max(depth(t->left),
                          depth(t->right));
       }
   }

Q: Is that better than using while loops and an explicit stack?

   Yes, it's way clearer

   See the example code page for a non-recursive version...

------------------------------------------
RECURSIVE GRAMMARS

Example grammar for statements (not SPL):

<stmt> ::= ...
         | while (<expr>) <stmt>

Structure of (recursive descent) parser:

// typedef /* ... */ stmtTree;

stmtTree *parseStatement() {
    /* ... */
    parseWhileLoop();
    /* ... */
}

stmtTree *parseWhileLoop() {
    /* ... */
    parseStatement();
    /* ... */
}
------------------------------------------
Q: How do we know that this process will terminate?

   Because each time we recurse, we do so on a smaller text;
   consider

       while x != 3 do x := x-1

Q: Is it easy to follow what these routines are doing?

   Yes, and it's much harder to implement these without recursion

Q: Why are natural languages structured recursively?

   It seems to be more powerful; it's how our brains work...

Other examples:

- list manipulation (recursively structured data)
- expression evaluation (their grammar is also recursive)
- searching directories (which may contain directories)
- interpreting or displaying web pages (XML is recursive data)

** Addressing for nested routines

*** Problem of addressing locals in block-structured languages

------------------------------------------
HOW TO ADDRESS LOCAL VARIABLES?

% file static_scoping.spl:
begin
  proc p0
    begin % body of p0
      var x;
      proc p1
        begin % body of p1
          var y;
          var z;
          proc p2
            begin % body of p2
              var a;
              begin
                a := a+x*y+z
              end
            end; % of p2
          call p2;
          % ...
          call p0
          % ...
        end; % of p1
      % ...
      call p1;
      % ...
      call p0
      % ...
    end; % of p0
  call p0
end.
------------------------------------------
Q: How many calls to p0 and p1 will be on the stack when p2 returns?

   We can't tell...

Q: In the body of p2, how can the compiler find the locations
   of x and y?

   This is the problem...
For a less artificial example, see quicksort in Pascal:
http://sandbox.mc.edu/~bennet/cs404/doc/qsort_pas.html

------------------------------------------
THE PROBLEM

Programming language features:

- subroutines, blocks
- nesting of subroutines, blocks
- static scoping

==> absolute addresses of variables are hard to predict statically

------------------------------------------
Q: Can we tell from the text of a program where a routine's ARs
   will be on the runtime stack?

   No; if new routines are added, then any calculations will be off...

Q: If we don't know where the AR will be on the runtime stack,
   how can local variables in an AR be addressed?

   Dynamic prediction of where the AR will be is hard
   and would not be modular, so we don't address local variables
   using absolute addresses.

   However, we can know where a variable will be *within* an AR
   (i.e., its offset), as that is determined by the text
   of the program (statically)

   So, we can use an offset from the latest AR of the
   surrounding scope, if we can find that.

*** Compiler response to solve the problem

------------------------------------------
COMPILER-BASED SOLUTION

What can the compiler know statically about local variable locations?

What would we need to find the exact location of a local variable?

When can an AR be created that needs to know the base
of the surrounding AR?

Could we pass the base of the AR for the surrounding scope in a call?

If each AR stores a (static) link to the (address of the) AR
of the surrounding scope, how can we address 2 layers out? 3?

What information is needed to address a local variable
in a surrounding scope?

------------------------------------------
... the offset from the base of the AR, since the compiler can
    count allocations from the start of the routine

... the base for the offset into the AR

... when the surrounding scope either enters a nested block
    or calls down to a routine nested within that scope

    e.g.,

    begin
      proc p1
        begin
          var y;
          var z;
          proc p2
            begin
              % ...
            end;
          call p2
        end;
      call p1
    end.
Q: Are all calls like that?

   It depends on the language: e.g., in Haskell a routine can call
   another routine in the same or a surrounding scope

... Yes, this is the *static link* needed to address locals
    in a surrounding scope

    So the solution is to pass the address of the base of the AR
    of the surrounding scope when calling a routine,
    so it can address those variables

    This is the "static link"

... follow the static link to the surrounding scope's AR,
    then follow that scope's link to its surrounding scope's AR...

... the number of levels, and the offset

**** Summary: two-part addresses

------------------------------------------
SUMMARY

Compilers use two-part addresses, called *lexical addresses*,
that consist of:

------------------------------------------
1. The number of levels of surrounding scopes to go outwards
   (i.e., the number of static links to follow)

2. The offset from that scope's base

(The order matters in such a pair!)

------------------------------------------
HOW TO ADDRESS LOCAL VARIABLES?

% file addressing.spl
begin
  proc p0
    begin % body of p0
      var x;
      proc f1
        begin % body of f1
          var y;
          var z;
          proc p2
            begin % body of p2
              var a;
              proc f3
                begin % body of f3
                  call f1;
                  % after here ============
                  x := a + (x*y) + z
                end; % of f3
              call f3
            end; % of p2
          call p2
        end; % of f1
      call f1
    end; % of p0
  call p0
end.
------------------------------------------
Q: Following the comment, what is the lexical address of x?

   (3, 0)

Q: What is the lexical address of z?

   (2, 1)

Q: What is the lexical address of a?

   (1, 0)

**** determining the static link to pass

------------------------------------------
WHAT STATIC LINK TO PASS?

If block A executes nested block B, what static link to use?

If routine R calls E, what static link is passed?

------------------------------------------
... the current AR's base address, i.e., $fp, since that is
    A's base address and that is the base for the block
    surrounding B

...
    if E has a lexical address of the form (L, -), then it is:

    - the current AR's base address, if L == 0
      (this is a call to a procedure at the same level as R,
      so E must be nested within R)

    - the base address of the AR L levels out, if L > 0
      (this is a call to a surrounding procedure)

Q: Can a routine R call E if E is defined in a surrounding scope
   (surrounding R) but E itself does not surround R?

   Yes, but the base for the AR surrounding E is still
   the one that surrounds R.

   This would look like (in SPL):

   begin
     % ...
     proc E
       begin
         % ...
       end;
     % ...
     proc R
       begin
         % ...
         call E
         % ...
       end
     % ...
   end.