COP 3402 meeting -*- Outline -*-

* Design for statically-scoped languages

** what is static scoping?

------------------------------------------
STATIC SCOPING

def: In *static scoping*, each identifier x

def: In *dynamic scoping*, each identifier x

------------------------------------------
... denotes the location for x declared by the closest
    textually surrounding declaration of x

    This is also known as *lexical scoping*

Q: Is there another way that identifiers could be found in a program?

   Yes, it's called dynamic scoping, where each identifier x

   ... denotes the location for x declared by the most recent
       declaration for x that is still active.

Q: What kind of scoping is in C, C++, and Java? The Unix shell?

   C, C++, and Java all have static scoping
   for variables and function names

   The Unix shell uses dynamic scoping for environment variables!

   Java uses dynamic scoping for exception handlers

** motivation for static scoping

------------------------------------------
MOTIVATION FOR STATIC SCOPING

int incr = 1;

int addOne(int y) { return y+incr; }

// what does addOne do?

int client() {
    int incr = 2;
    int z = addOne(3);
    // what is the value of z here?
    return z;
}
------------------------------------------
Q: What should addOne do? What should the value of z be?

   Does addOne mean what it meant when it was written,
   or when it was called?

   With static scoping it means what it meant when it was written,
   so it adds 1

   With dynamic scoping, each name (such as incr) means what it
   means at the time of the call, so it adds 2

Q: Do we want to be able to check programs when we write them?

   Yes, usually

   This is even more important in modern (functional) languages
   where we want to return functions/closures

** block structure

------------------------------------------
BLOCK STRUCTURE

def: A *block* is

Usual Grammar:               Example in C:

                             { int next = 3*x+1;
                               next = next / 2;
                               return next;
                             }
------------------------------------------
... a sequence of local declarations and statements

    <stmt> ::= ... | <block>
    <block> ::= { <decls> <stmts> }
    <decls> ::= <empty> | <decls> <decl>
    <type> ::= int | ...
    <decl> ::= <type> <idents>

------------------------------------------
ADVANTAGES OF BLOCK STRUCTURE

- Local storage

- Control of names

- Easier to extract procedures

------------------------------------------
... can declare temporary variables whose space is reclaimed
    when the block is finished

    e.g., { int huge_array[HUGE]; /* ... */ }

... can pick the best names for variables without worrying that
    they conflict with other code (look locally first),
    which helps independent development

... within blocks (and thus functions),
    a block can be more easily seen as a procedure,
    with the body being the block
    (and the free variables as parameters)

    (this makes extracting a procedure from code easier)

    It may help code be easier to read than in a language
    without block structure, since the declarations are
    closer to their uses

*** motivation for recursion

Q: Do you think recursion is hard to use and understand?
   (get a show of hands)

   I find it very useful, especially for compilers,
   but it's also useful for other kinds of programs...

The files below are in
https://www.cs.ucf.edu/~leavens/COP3402/example-code/trees

------------------------------------------
RECURSIVE DATA ==> RECURSIVE PROGRAMS

A good rule of design:

// file btree.h
#include "Tdef.h"
typedef struct treeNode {
    T value;
    struct treeNode *left, *right;
} tree;

// helper to compute maximum of its args
int max(int a, int b) { return (a >= b) ? a : b; }
------------------------------------------
... organize the program's structure like the data's structure

    i.e., program ~ data (in terms of structure)

Q: How should we write a program to find the depth of a tree?

   ...

   // Return the depth of t
   int depth(tree *t) {
       if (t == NULL) {
           return 0;
       } else {
           return 1 + max(depth(t->left),
                          depth(t->right));
       }
   }

Q: Is that better than using while loops and an explicit stack?

   Yes, it's way clearer

   See the example code page for a non-recursive version...

------------------------------------------
RECURSIVE GRAMMARS

Example grammar for statements (not SPL):

<stmt> ::= ...
         | while (<expr>) <stmt>

Structure of (recursive descent) parser:

// typedef /* ... */ stmtTree;

stmtTree *parseStatement() {
    /* ... */
    parseWhileLoop();
    /* ... */
}

stmtTree *parseWhileLoop() {
    /* ... */
    parseStatement();
    /* ... */
}
------------------------------------------
Q: How do we know that this process will terminate?

   Because each time we recurse, we do so on a smaller text;
   consider

       while x != 3 do x := x-1

Q: Is it easy to follow what these routines are doing?

   Yes, and it's much harder to implement these without recursion

Q: Why are natural languages structured recursively?

   It seems to be more powerful; it's how our brains work...

Other examples:

- list manipulation (recursively structured data)
- expression evaluation (their grammar is also recursive)
- searching directories (which may contain directories)
- interpreting or displaying web pages (XML is recursive data)

** Addressing for nested routines

*** Problem of addressing locals in block-structured languages

------------------------------------------
HOW TO ADDRESS LOCAL VARIABLES?

% file static_scoping.spl:
begin
  proc p0
    begin % body of p0
      var x;
      proc p1
        begin % body of p1
          var y;
          var z;
          proc p2
            begin % body of p2
              var a;
              begin
                a := a+x*y+z
              end
            end; % of p2
          call p2;
          % ...
          call p0
          % ...
        end; % of p1
      % ...
      call p1;
      % ...
      call p0
      % ...
    end; % of p0
  call p0
end.
------------------------------------------
Q: How many calls to p0 and p1 will be on the stack when p2 returns?

   We can't tell...

Q: In the body of p2, how can the compiler find the locations
   of x and y?

   This is the problem...
For a less artificial example, see quicksort in Pascal:
http://sandbox.mc.edu/~bennet/cs404/doc/qsort_pas.html

------------------------------------------
THE PROBLEM

Programming language features:

- subroutines, blocks
- nesting of subroutines, blocks
- static scoping

==> absolute addresses of variables are hard to predict statically

------------------------------------------
Q: Can we tell from the text of a program where a routine's ARs
   will be on the runtime stack?

   No; if new routines are added, then any calculations will be off...

Q: If we don't know where the AR will be on the runtime stack,
   how can local variables in an AR be addressed?

   Dynamic prediction of where the AR will be is hard
   and would not be modular, so we don't address local variables
   using absolute addresses.

   However, we can know where a variable will be *within* an AR
   (i.e., its offset), as that is determined by the text
   of the program (statically)

   So, we can use an offset from the latest AR of the
   surrounding scope, if we can find that.

*** Compiler response to solve the problem

------------------------------------------
COMPILER-BASED SOLUTION

What can the compiler know statically about local variable locations?

What would we need to find the exact location of a local variable?

When can an AR be created that needs to know the base
of the surrounding AR?

Could we pass the base of the AR for the surrounding scope in a call?

If each AR stores a (static) link to the (address of the) AR
of the surrounding scope, how can we address 2 layers out? 3?

What information is needed to address a local variable
in a surrounding scope?

------------------------------------------
... the offset from the base of the AR, since the compiler can
    count allocations from the start of the routine

... the base for the offset into the AR

... when the surrounding scope either enters a nested block
    or calls down to a routine nested within that scope

    e.g.,

    begin
      proc p1
        begin
          var y;
          var z;
          proc p2
            begin
              % ...
            end;
          call p2
        end;
      call p1
    end.
Q: Are all calls like that?

   It depends on the language: e.g., in Haskell a routine can call
   another routine in the same or a surrounding scope

... Yes, this is the *static link* needed to address locals
    in a surrounding scope

    So the solution is to pass the address of the base of the AR
    of the surrounding scope when calling a routine,
    so it can address those variables

    This is the "static link"

... follow the static link to the surrounding scope's AR,
    then follow that scope's link to its surrounding scope's AR...

... the number of levels, and the offset

**** Summary: two-part addresses

------------------------------------------
SUMMARY

Compilers use two-part addresses, called *lexical addresses*,
that consist of:

------------------------------------------
1. The number of levels of surrounding scopes to go outwards
   (i.e., the number of static links to follow)

2. The offset from that scope's base

(The order matters in such a pair!)

------------------------------------------
HOW TO ADDRESS LOCAL VARIABLES?

% file addressing.spl
begin
  proc p0
    begin % body of p0
      var x;
      proc f1
        begin % body of f1
          var y;
          var z;
          proc p2
            begin % body of p2
              var a;
              proc f3
                begin % body of f3
                  call f1;
                  % after here ============
                  x := a + (x*y) + z
                end; % of f3
              call f3
            end; % of p2
          call p2
        end; % of f1
      call f1
    end; % of p0
  call p0
end.
------------------------------------------
Q: Following the comment, what is the lexical address of x?

   (3, 0)

Q: What is the lexical address of z?

   (2, 1)

Q: What is the lexical address of a?

   (1, 0)

**** determining the static link to pass

------------------------------------------
WHAT STATIC LINK TO PASS?

If block A executes nested block B, what static link to use?

If routine R calls E, what static link is passed?

------------------------------------------
... the current AR's base address, i.e., $fp, since that is
    A's base address and that is the base for the block
    surrounding B

...
    if E has a lexical address of the form (L, -), then it is:

    - the current AR's base address, if L == 0
      (this is a call to a procedure at the same level as R,
      so E must be nested within R)

    - the base address of the AR L levels out, if L > 0
      (this is a call to a surrounding procedure)

Q: Can a routine R call E if E is defined in a surrounding scope
   (surrounding R) but E itself does not surround R?

   Yes, but the base for the AR surrounding E is still
   the one that surrounds R.

   This would look like (in SPL):

   begin
     % ...
     proc E
       begin
         % ...
       end;
     % ...
     proc R
       begin
         % ...
         call E
         % ...
       end
     % ...
   end.