GOALS FOR SUBROUTINES

subroutine = function or procedure
             (abstraction of expressions
             or commands)

Want:
    - independent development
    - information hiding
    - maximal reuse
    - efficient execution


    WHAT A SUBROUTINE CALL DOES

What are the steps in a subroutine call
    like:   x = f(E1,E2)
    where E1 and E2 are expressions?

   - evaluate actual arguments E1 and E2
      - put values someplace
        (where f can find them)
   - save the address of the next statement
      (called the return address)
   - jump to the address of f
      (the jump and link instruction
        also saves the return address)
   - allocate memory to run f
     (local variables, including parameters)
   - run code of f
   - put the result value somewhere
      (where the caller can find it)
   - (restore registers, if needed)
   - jump to return address (jr instruction)

   - assign the value returned to x
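To make these steps concrete, here is a small C sketch that plays both roles with an explicit, hypothetical register array: the argument values travel in $a0/$a1 and the result in $v0. C's own call and return stand in for the jump-and-link and jr steps. The function f and its body are made up for illustration.

```c
#include <assert.h>

// Hypothetical register file, using the MIPS register numbers
// for $v0, $a0 and $a1.
enum { V0 = 2, A0 = 4, A1 = 5 };
static int GPR[32];

// Callee f: finds its arguments in $a0 and $a1 and leaves its
// result in $v0. (Here f computes a0 - a1, just for illustration.)
static void f(void) {
    int diff = GPR[A0] - GPR[A1];   // a local, living only while f runs
    GPR[V0] = diff;                 // put the result where the caller looks
}

// The caller's side of  x = f(E1, E2).
// C's own call and return stand in for the jump-and-link and jr steps.
static int call_f(int e1, int e2) {
    GPR[A0] = e1;                   // value of E1, where f can find it
    GPR[A1] = e2;                   // value of E2, likewise
    f();                            // jump to f; its return comes back here
    int x = GPR[V0];                // assign the returned value to x
    return x;
}
```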


        SRM = SIMPLIFIED RISC MACHINE

RISC = Reduced Instruction Set Computer
  (vs. CISC = Complex Instruction Set ...)

SRM (used in the homework) is based on the MIPS processor

Register-based instructions:

  ADD s,t,d  is  GPR[d] <- GPR[s] + GPR[t]

 ADDI s,t,i  is  GPR[t] <- GPR[s] + i

Byte-addressable memory (LBU = load byte unsigned):

  LBU b,t,o  is GPR[t] <- memory[GPR[b]+o]
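A minimal C sketch of these three instructions acting on a register array and a byte array. The names, the memory size, and the signed 16-bit immediate (as in MIPS) are assumptions for illustration, not the SRM's actual implementation.

```c
#include <assert.h>
#include <stdint.h>

// Hypothetical machine state; names and sizes are illustrative.
static int32_t GPR[32];
static uint8_t memory[1024];        // byte-addressable memory

// Register 0 always reads as 0, so writes to it are ignored.
static void set_reg(int r, int32_t v) {
    if (r != 0) {
        GPR[r] = v;
    }
}

// ADD s,t,d   is  GPR[d] <- GPR[s] + GPR[t]
static void add(int s, int t, int d) {
    set_reg(d, GPR[s] + GPR[t]);
}

// ADDI s,t,i  is  GPR[t] <- GPR[s] + i   (i: signed 16-bit, assumed)
static void addi(int s, int t, int16_t i) {
    set_reg(t, GPR[s] + i);
}

// LBU b,t,o   is  GPR[t] <- memory[GPR[b]+o]
// ("unsigned": the byte is zero-extended to a full word)
static void lbu(int b, int t, int16_t o) {
    int32_t addr = GPR[b] + o;
    assert(0 <= addr && addr < (int32_t) sizeof memory);
    set_reg(t, memory[addr]);
}
```

For example, addi(0, 8, 5) puts 5 in register 8, since GPR[0] is always 0.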


       32 REGISTERS IN THE SRM

Num Notes
0   always 0 (can't write this!)
1   assembler's temporary
2   function result                ($v0)    
3   function result                ($v1)
4   function argument              ($a0)
5   function argument              ($a1)
6   function argument              ($a2)
7   function argument              ($a3)
8-15 temporary             ($t0,...,$t7)
16-23 saved temporary     ($s0,...,$s7)
24-25 temporary               ($t8, $t9)
26  (reserved for OS)
27  (reserved for OS)
28  global, static data            ($gp)
29  stack pointer                  ($sp)
30  frame pointer                  ($fp)
31  return address                 ($ra)

       CALLS AND REGISTERS

Call done by "Jump and Link" instruction

    LW $a0, x      # load argument x
    LW $a1, y      # load argument y
    (save any registers necessary,
    at least $ra = register 31,
    since JAL overwrites it)
    JAL f
       $ra <- PC   # save return addr
       PC <- f     # jump to f

    ...

    JR $ra         # jump to return addr
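The effect of JAL and JR on the PC can be sketched in C as follows. This assumes 4-byte instructions and that the PC has already been advanced past the current instruction when JAL executes, so $ra receives the address of the next instruction (there is no branch delay slot here, unlike real MIPS). All names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

enum { RA = 31 };                   // $ra is register 31
static uint32_t GPR[32];
static uint32_t PC;                 // byte address of the next instruction

// Fetch step (simplified): once an instruction is fetched, PC is
// advanced past it, so during execution PC holds the next address.
static void advance_pc(void) { PC += 4; }

// JAL target:  $ra <- PC (the return address); PC <- target
static void jal(uint32_t target) {
    GPR[RA] = PC;
    PC = target;
}

// JR r:  PC <- GPR[r]
static void jr(int r) {
    PC = GPR[r];
}
```

Usage: with the JAL at address 100, advance_pc() then jal(400) leaves $ra = 104 and PC = 400; a later jr(RA) jumps back to 104.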
    

     WHO SAVES AND RESTORES REGISTERS?

Problem: limited number of registers
      - caller might want some preserved
      - callee might change some

So if registers are saved by the:

 Callee, then it might need to save
        some registers that it modifies,
        but whose values the caller
        doesn't need (wasted work)

 Caller, then it might need to save
        some registers holding values
        it needs after the call,
        but that the callee doesn't change
        (also wasted)

 So, use a hybrid approach:
    - caller saves some registers,
       if it needs them
    - callee saves some registers,
       if it modifies them


          CALLING CONVENTION

Agreement between all callers and callees

 Callee saves & restores:
    return address,
    stack/frame pointers (always),
    and the s- registers,
    if it could modify them

 Caller saves & restores:
    argument registers,
    temporaries (t- registers) needed
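A small C sketch of this hybrid scheme, using a made-up register array and save area: the callee saves and restores $s0 because it overwrites it, while the caller saves $t0 around the call because it still needs that value afterwards. The save()/restore() helpers and the stack array are hypothetical.

```c
#include <assert.h>

enum { T0 = 8, S0 = 16 };        // MIPS register numbers
static int GPR[32];
static int stack[64];            // hypothetical save area
static int top = 0;

static void save(int r)    { assert(top < 64); stack[top++] = GPR[r]; }
static void restore(int r) { assert(top > 0);  GPR[r] = stack[--top]; }

// Callee: modifies $s0 and $t0, but only $s0 is its responsibility.
static void callee(void) {
    save(S0);                    // callee-saved: s0 is modified below
    GPR[S0] = 111;
    GPR[T0] = 222;               // t0 may be clobbered freely
    restore(S0);
}

// Caller: needs both $t0 and $s0 after the call.
static void caller(void) {
    GPR[S0] = 1;
    GPR[T0] = 2;
    save(T0);                    // caller-saved: t0 is still needed
    callee();
    restore(T0);
}
```

After caller() returns, both $s0 and $t0 hold their original values, yet each side saved only the registers it was responsible for.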
    

        IT'S A CONVENTION / AGREEMENT

Saving and restoring are not enforced
by the machine: every routine must
follow the convention


    VM FEATURES TO SUPPORT SUBROUTINES

- call and return instructions
   need to save the return address somewhere
     (in SRM it is saved in register 31)
- support for recursive subroutines
    - different calls (activations)
      need their own space (for locals)
    - different activations may coexist
      at runtime

// example of a recursive function:

// Requires: 0 <= n < 20
//   (assuming long is 64 bits)
// Return the factorial of n
long fact(int n) {
   long nm1res;
   if (n > 0) {
      nm1res = fact(n-1);
      return n * nm1res;
   } else {
      return 1;
   }
}

    - the local storage will include:
        - the formal parameter(s)
        - locally declared variables

- support for static scoping,
    so that routines can be understood
    where they are written
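To see that activations of fact coexist, here is an instrumented, hypothetical variant that counts how many activations are alive at once; each activation also has its own copy of nm1res. For fact_traced(5), six activations (n = 5 down to 0) are on the stack at the deepest point.

```c
#include <assert.h>

static int live = 0;      // activations currently on the stack
static int max_live = 0;  // most activations alive at the same time

// Requires: 0 <= n < 20
// Same computation as fact, plus bookkeeping of live activations.
long fact_traced(int n) {
    assert(0 <= n && n < 20);
    long nm1res;          // this activation's own copy
    long result;
    live++;
    if (live > max_live) {
        max_live = live;
    }
    if (n > 0) {
        nm1res = fact_traced(n - 1);
        result = n * nm1res;
    } else {
        result = 1;
    }
    live--;               // this activation is about to end
    return result;
}
```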


          STATIC SCOPING

def: In *static scoping*,
     each identifier x
     denotes the location for x declared
     in the closest textually surrounding
     area of code (scope)

def: In *dynamic scoping*,
     each identifier x
     denotes the location for x declared
     by the most recent activation
     (of a routine declaring x)
     that is still active

def: A *scope* is an area of program text
     where a declaration is effective


       MOTIVATION FOR STATIC SCOPING

 int incr = 1;
 int addOne(int y) { return y+incr; }

 int client() {
    int incr = 2;
    int z = addOne(3);
    // what is the value of z here?
    return z;
 }
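In C, which is statically scoped, addOne always uses the global incr, so z is 4. The sketch below contrasts this with a hypothetical simulation of dynamic scoping, where names are looked up in a stack of active bindings, most recent first; there the lookup finds client's incr, so the result is 5. The env array and the *_dynamic functions are illustrative only.

```c
#include <assert.h>
#include <string.h>

// C uses static scoping: addOne always sees the global incr.
int incr = 1;
int addOne(int y) { return y + incr; }
int client(void) {
    int incr = 2;              // a different variable, invisible to addOne
    (void) incr;
    return addOne(3);
}

// Hypothetical simulation of dynamic scoping: a stack of active
// bindings, searched from the most recent one.
struct binding { const char *name; int value; };
static struct binding env[16] = { { "incr", 1 } };   // the global binding
static int env_top = 1;

static int lookup_dynamic(const char *name) {
    for (int i = env_top - 1; i >= 0; i--) {
        if (strcmp(env[i].name, name) == 0) {
            return env[i].value;
        }
    }
    assert(0 && "unbound name");
    return 0;
}

static int addOne_dynamic(int y) { return y + lookup_dynamic("incr"); }

static int client_dynamic(void) {
    assert(env_top < 16);
    env[env_top++] = (struct binding){ "incr", 2 };  // client's local incr
    int z = addOne_dynamic(3);
    env_top--;                                       // client's binding ends
    return z;
}
```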


       BLOCK STRUCTURE


def: A *block* is a sequence of
    statements and declarations


Usual Grammar:

  <stmt> ::=  ... | <block>
  <block> ::= { <decls> <stmts> }
  <decls> ::= <empty>
            | <declaration> <decls>
  <declaration> ::= int <name> ; | ...
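  <stmts> ::= <empty>
            | <stmt> <stmts>

A declaration with an initializer
in mid-block (as C99 allows)
can be rewritten into this form:
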
    { y = 3;
      int x = y+1;
      z = x+y;
    }
 ==>
    { int x;
      y = 3;
      x = y+1;
      z = x+y;
    }
      
Example in C

   {
      int next = 3*x+1;
      next = next / 2;
      return next;
   }

       ADVANTAGES OF BLOCK STRUCTURE

 - Local storage
   can declare temporary variables
   and their space is reclaimed when
   the block is done

    { int huge_array[HUGE]; /* ... */ }

 - Control of names
     can pick the names for variables
     without worry that they conflict
     with other code
     ==> allows independent development
     ==> easier to read code,
          as declarations can be
          close to their use(s)

 - Easier to extract procedures
    a block can be seen as a procedure body
    and its free variables can be parameters

   { // block A
      { // block B
         { // block C
         }
      }
   }
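As an illustration of extracting a procedure: the block in "Example in C" above has one free variable, x, so it becomes a function whose parameter is x. The name next_value is hypothetical.

```c
#include <assert.h>

// The block's body, unchanged; its free variable x is now a parameter.
int next_value(int x) {
    int next = 3*x+1;
    next = next / 2;
    return next;
}
```

For example, next_value(3) computes (3*3+1)/2 = 5.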


   RECURSIVE DATA ==> RECURSIVE PROGRAMS

A good rule of design:
    organize the program's structure
    like the data's structure

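// file btree.h
#ifndef _BTREE_H
#define _BTREE_H
#include "Tdef.h"   // assumed to define the element type T

typedef struct treeNode {
    T value;
    struct treeNode *left, *right;
} tree;
#endif

// file btree.c
#include <stddef.h>
#include "btree.h"

// helper to compute maximum of its args
static int max(int a, int b) {
    return (a >= b) ? a : b;
}

// Return the depth of t
// (the empty tree, t == NULL, has depth 0)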
int depth(tree *t) {
  if (t == NULL) {
     return 0;
  } else {
     return 1 + max(depth(t->left),
                    depth(t->right));
  }
}
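A self-contained version of the tree code, assuming T is int (standing in for "Tdef.h"). The second function, count_nodes, is a hypothetical addition that follows the same two cases of the data type (empty tree or node), which is the point of the design rule.

```c
#include <assert.h>
#include <stddef.h>

typedef int T;                    // assumption: int elements

typedef struct treeNode {
    T value;
    struct treeNode *left, *right;
} tree;

static int max(int a, int b) { return (a >= b) ? a : b; }

// One case per alternative of the data type.
int depth(tree *t) {
    if (t == NULL) {
        return 0;
    }
    return 1 + max(depth(t->left), depth(t->right));
}

// Same shape, different question: how many nodes are in t?
int count_nodes(tree *t) {
    if (t == NULL) {
        return 0;
    }
    return 1 + count_nodes(t->left) + count_nodes(t->right);
}
```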


         RECURSIVE GRAMMARS

Grammar for statements:

<stmt> ::= ...
     | while (<expr>) <stmt>


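Structure of a parser
  (mutually recursive, like the grammar):
// typedef /* ... */ stmtTree;

// forward declaration, since the two
// functions call each other
stmtTree *parseWhileLoop(void);

stmtTree *parseStatement(void) {
     /* ... */
     parseWhileLoop();
     /* ... */
}

stmtTree *parseWhileLoop(void) {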
    /* ... */
    parseStatement();
    /* ... */
}
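A toy, runnable instance of the same structure, for an assumed mini-grammar <stmt> ::= skip ; | while ( <ident> ) <stmt>. Instead of building a stmtTree it returns the loop-nesting depth (or -1 on a syntax error), which keeps the sketch short; the mutual recursion between parseStatement and parseWhileLoop mirrors the recursion in the grammar. The global cursor input and parse_depth are hypothetical.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

static const char *input;          // remaining, unparsed text

static void skip_spaces(void) {
    while (isspace((unsigned char) *input)) input++;
}

// If s comes next, consume it and return 1; else return 0.
static int accept(const char *s) {
    skip_spaces();
    size_t n = strlen(s);
    if (strncmp(input, s, n) == 0) { input += n; return 1; }
    return 0;
}

static int parseWhileLoop(void);   // forward declaration

// <stmt> ::= skip ; | <while>
static int parseStatement(void) {
    if (accept("skip")) {
        return accept(";") ? 0 : -1;
    }
    return parseWhileLoop();
}

// <while> ::= while ( <ident> ) <stmt>
static int parseWhileLoop(void) {
    if (!accept("while") || !accept("(")) return -1;
    skip_spaces();
    if (!isalpha((unsigned char) *input)) return -1;
    while (isalnum((unsigned char) *input)) input++;   // the identifier
    if (!accept(")")) return -1;
    int body = parseStatement();   // recursion, as in the grammar
    return body < 0 ? -1 : body + 1;
}

// Nesting depth of the whole text, or -1 if it is not one <stmt>.
int parse_depth(const char *text) {
    input = text;
    int d = parseStatement();
    skip_spaces();
    return (d >= 0 && *input == '\0') ? d : -1;
}
```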

        VM DESIGN FOR SUBROUTINES

For each call:
  - need storage for the subroutine's variables
    (local storage);
    this storage is organized as a stack

  - the storage for a single call is called
    an Activation Record (AR):

def: An *activation record* (AR)
     is a part of the runtime stack
     that holds local storage
     for one call of a subroutine

Memory Organization

        stack   (higher addresses)
         |
         v

         ^
         |
        heap
   global static storage (data section)
       text (code of program)
                 (lower addresses)


            STACK ORGANIZATION

In the code: P calls Q, Q calls R

Initially:

        [        0 ]

Call of subroutine P:

        [ AR for P ]


After P calls Q:

        [ AR for P ]
        [ AR for Q ]


After Q calls R:

        [ AR for P ]
        [ AR for Q ]
        [ AR for R ]

After R returns:

        [ AR for P ]
        [ AR for Q ]

After Q returns:

        [ AR for P ]

After P returns (back to the initial state):

        [        0 ]


           STACK IMPLEMENTATION

AR delimited by two indexes:

   - fp: frame pointer
         holds the byte address of the first
         (bottom, highest-addressed) element of the AR


   - sp: stack pointer
         holds the byte address of the top
         (lowest-addressed) element of the AR

Notes:
   this assumes the stack is byte-addressed
   and grows towards lower addresses (down)
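A minimal sketch of how fp and sp could delimit ARs for the "P calls Q, Q calls R" sequence in STACK ORGANIZATION, assuming each new AR begins with the caller's saved fp (one simple choice). Addresses are byte addresses in a tiny downward-growing stack; enter() and leave() are hypothetical names.

```c
#include <assert.h>

#define BYTES_PER_WORD 4
#define MAX_STACK_HEIGHT 64            // tiny stack, for illustration
#define STACK_LEN (MAX_STACK_HEIGHT / BYTES_PER_WORD)

static int memory[STACK_LEN];
// byte addresses; the stack grows down from the top word
static int fp = MAX_STACK_HEIGHT - BYTES_PER_WORD;
static int sp = MAX_STACK_HEIGHT - BYTES_PER_WORD;

static void push(int v) {
    assert(sp >= BYTES_PER_WORD);      // room for one more word
    sp -= BYTES_PER_WORD;
    memory[sp / BYTES_PER_WORD] = v;
}

static int pop(void) {
    assert(sp < MAX_STACK_HEIGHT - BYTES_PER_WORD);   // not empty
    int v = memory[sp / BYTES_PER_WORD];
    sp += BYTES_PER_WORD;
    return v;
}

// New AR: save the caller's fp, start the AR at that word,
// then reserve locals_bytes of space for locals.
static void enter(int locals_bytes) {
    assert(locals_bytes % BYTES_PER_WORD == 0);
    push(fp);
    fp = sp;
    assert(sp - locals_bytes >= 0);
    sp -= locals_bytes;
}

// Drop the current AR and restore the caller's fp.
static void leave(void) {
    sp = fp;
    fp = pop();
}
```

Each leave() undoes exactly one enter(), so after P, Q and R all return, fp and sp are back where they started.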


     STACK IMPLEMENTATION: HEADER FILES

Assume:
  - stack is byte addressed
  - stack grows down
    towards lower addresses

// File: machine_types.h
#ifndef _MACHINE_TYPES_H
#define _MACHINE_TYPES_H

// type of addresses
typedef unsigned int address_type;

// type of machine bytes
typedef unsigned char byte_type;

// type of machine words
typedef int word_type;

#define BYTES_PER_WORD 4

#endif

// File: stack.h
#ifndef _STACK_H
#define _STACK_H
#include <stdbool.h>
#include <stdio.h>
#include "machine_types.h"

// MAX_STACK_HEIGHT is in bytes and must be
// evenly divisible by BYTES_PER_WORD
#define MAX_STACK_HEIGHT 2048

// Initialize the stack data structure
extern void stack_initialize(void);

/* ... other extern declarations ... */
#endif

      STACK IMPLEMENTATION: STACK.C FILE

/* $Id: stack.c,v 1.3 2023/09/08 ... */
#include <assert.h>
#include <stdlib.h>
#include <stdio.h>
#include "utilities.h"
#include "stack.h"

// size of the stack in words
#define STACK_LEN (MAX_STACK_HEIGHT/BYTES_PER_WORD)

// the stack's storage
static word_type memory[STACK_LEN];

// first index of current AR, in bytes
static int fp;

// index of top element, in bytes
static int sp;

// the stack's invariant
void stack_okay()
{
    assert(fp >= sp);
    assert(0 <= fp);
    assert(fp < MAX_STACK_HEIGHT);  // fp is a byte address
    assert(sp >= 0);
    assert(fp % BYTES_PER_WORD == 0);
    assert(sp % BYTES_PER_WORD == 0);
}

// Initialize the stack data structure
void stack_initialize()
{
   fp = MAX_STACK_HEIGHT - BYTES_PER_WORD;
   sp = fp;
   stack_okay();
   // initialize the storage
   for (int i = fp/BYTES_PER_WORD;
        0 <= i;
        i--)
   {
      memory[i] = (word_type) 0;
   }
}

// Return the stack's num. of bytes
int stack_size() {
    return (MAX_STACK_HEIGHT
	    - BYTES_PER_WORD)
	    - sp;
}

// Return the current AR's num. of bytes
int stack_AR_size() { return fp - sp; }

// Return the address of the base
// of the current AR (fp value)
address_type stack_AR_base() {
    return fp;
}

// Is the stack empty?
bool stack_empty() {
    return stack_size() == 0;
}

// Is the stack full?
bool stack_full() { return sp <= 0; }

// Requires: BYTES_PER_WORD > j >= 0;
// get the jth byte of the word v
// (numbered from the right)
static byte_type fetchByteFromWord(
                  word_type v, int j) {
    return (v >> (j*8)) & 0x000000FF;
}

// Requires: !stack_full()
// push a word on the stack
// with sp becoming old(sp) - BYTES_PER_WORD
void stack_push_word(word_type val) {
    stack_okay();
    if (sp - BYTES_PER_WORD < 0) {
        // report an error: stack overflow
        fprintf(stderr, "stack overflow in stack_push_word\n");
        exit(2);
    }
    sp = sp - BYTES_PER_WORD;
    memory[sp/BYTES_PER_WORD] = val;
    stack_okay();
}


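// Requires: n is evenly divisible
//           by BYTES_PER_WORD
// Requires: n <= sp, i.e.,
//   (stack_size() + n)
//     <= (MAX_STACK_HEIGHT - BYTES_PER_WORD)
// Increase the size of the stack by n bytes
void stack_allocate_bytes(unsigned int n)
{
    assert(n % BYTES_PER_WORD == 0);
    assert((int) n <= sp);
    sp = sp - (int) n;
    stack_okay();
}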


// Requires: !stack_empty()
// pop the stack and return the top word.
// The size of the stack is
// reduced by BYTES_PER_WORD.
word_type stack_pop_word()
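{
    stack_okay();
    assert(!stack_empty());
    word_type top = memory[sp/BYTES_PER_WORD];
    sp = sp + BYTES_PER_WORD;
    stack_okay();
    return top;
}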

// Requires: n is evenly divisible
//           by BYTES_PER_WORD
// Requires: (stack_size() - n) >= 0
// Decrease the size of the stack by n bytes
void stack_deallocate_bytes(unsigned int n)
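{
    assert(n % BYTES_PER_WORD == 0);
    assert(stack_size() - (int) n >= 0);
    sp = sp + (int) n;
    stack_okay();
}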

// Requires: !stack_empty()
// return the top word without popping
word_type stack_top_word()
{
    assert(!stack_empty());
    return memory[sp/BYTES_PER_WORD];
}

// translate a byte address to
// a word address (put in word_addr)
// and a byte offset (put in byte_offset)
void stack_byte2word_address(
	      int addr, int *word_addr,
	      int *byte_offset)
{
    *word_addr = addr / BYTES_PER_WORD;
    *byte_offset = addr % BYTES_PER_WORD;
}

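// Requires: BYTES_PER_WORD > j >= 0;
// return x with its jth byte (numbered from the right)
// set to the given value bv
static word_type setByteInWord(word_type x, int j, byte_type bv) {
    // unsigned arithmetic avoids shifting into the sign bit
    unsigned int shift = (unsigned int) j * 8;
    unsigned int w = (unsigned int) x;
    w = (w & ~(0xFFu << shift)) | ((unsigned int) bv << shift);
    return (word_type) w;
}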

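// Requires: sp/BYTES_PER_WORD - 1
//                   <= word_addr
// Requires: word_addr < STACK_LEN
// Requires: 0 <= byte_offset
// Requires: byte_offset < BYTES_PER_WORD
// set the byte_offset'th byte of
// the stack's storage at word_addr to bv
void stack_set_byte_at_word_offset(
            int word_addr,
            int byte_offset,
            byte_type bv)
{
    assert(0 <= word_addr && word_addr < STACK_LEN);
    assert(0 <= byte_offset && byte_offset < BYTES_PER_WORD);
    memory[word_addr] = setByteInWord(memory[word_addr],
                                      byte_offset, bv);
}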

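// Requires: sp - BYTES_PER_WORD <= addr
// Requires: addr < MAX_STACK_HEIGHT;
// Set the byte at addr to bv
void stack_set_byte(address_type addr,
                    byte_type bv)
{
    int word_addr, byte_offset;
    stack_byte2word_address((int) addr, &word_addr, &byte_offset);
    stack_set_byte_at_word_offset(word_addr, byte_offset, bv);
}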

// Requires: 0 <= word_addr
// Requires: word_addr < STACK_LEN
// Requires: 0 <= byte_offset
// Requires: byte_offset < BYTES_PER_WORD
// Return the byte_offset'th byte of
// the stack's storage at word_addr
byte_type stack_get_byte_at_word_offset(
	    int word_addr,
	    int byte_offset)
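{
    assert(0 <= word_addr && word_addr < STACK_LEN);
    assert(0 <= byte_offset && byte_offset < BYTES_PER_WORD);
    return fetchByteFromWord(memory[word_addr], byte_offset);
}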

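// Requires: sp - BYTES_PER_WORD <= addr
// Requires: addr < MAX_STACK_HEIGHT;
// Return the byte at addr
byte_type stack_get_byte(address_type addr)
{
    int word_addr, byte_offset;
    stack_byte2word_address((int) addr, &word_addr, &byte_offset);
    return stack_get_byte_at_word_offset(word_addr, byte_offset);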
}


     HOW TO ADDRESS LOCAL VARIABLES?

proc p0() {
   var int x;
   fun p1() {
     var int y;
     var int z;
     proc p2() {
        var int a;

        return a+x*y+z;
     }
    /* ... */ p2();
    /* ... */ p0(); /* ... */
   }
   /* ... */ p1();
   /*... */  p0(); /* ... */
}
     

       THE PROBLEM

Programming language features
    - subroutines
    - nesting of subroutines (Pascal, JS)
    - static scoping

  ==> the absolute address of a variable
      can't be predicted at compile time
      (it depends on which ARs are on the stack)


       COMPILER-BASED SOLUTION

What can the compiler know statically
     about local variable locations?


What would we need to find exact location
     of a local variable?


When can an AR be created that needs to
     know the base of the surrounding AR?


Could we pass the base of the AR for the
    surrounding scope in a call?


If each AR stores a (static) link to the
   AR of the surrounding scope,
   how can we address two layers out? 3?


What information is needed to address
  a local variable in a surrounding scope?


          SUMMARY

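Compilers use two-part addresses,
     called *lexical addresses*,
     that consist of:
       - levels outward: how many static links
         to follow from the current AR
         (the difference in nesting depth)
       - offset: the variable's position
         within the AR reached that way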

     HOW TO ADDRESS LOCAL VARIABLES?

procedure p0;
  var x;
  procedure f1;
    var y;
    var z;
    procedure p2;
      var a;
      procedure f3;
        begin
          # here --v
          call f1;
          call f3; 
          x := a+(x*y)+z
        end;
      call f3;
    call p2;
  call f1;
call p0.
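A sketch of lexical addressing for this example, using a simplified, hypothetical AR that holds only a static link and a small array of locals. From the body of f3, a is 1 level out (in p2), y and z are 2 levels out (in f1), and x is 3 levels out (in p0); the offsets assume locals are numbered in declaration order.

```c
#include <assert.h>
#include <stddef.h>

// Hypothetical, simplified AR: a static link and up to two locals.
typedef struct AR {
    struct AR *static_link;   // AR of the textually enclosing procedure
    int locals[2];            // locals, indexed by offset
} AR;

// A lexical address: how many static links to follow, then which slot.
typedef struct { int levels_out; int offset; } lexical_address;

static int *locate(AR *current, lexical_address la) {
    AR *ar = current;
    for (int i = 0; i < la.levels_out; i++) {
        assert(ar->static_link != NULL);
        ar = ar->static_link;
    }
    return &ar->locals[la.offset];
}

// x := a+(x*y)+z, as it might be compiled for the body of f3.
static void f3_body(AR *f3) {
    const lexical_address X = {3, 0}, Y = {2, 0}, Z = {2, 1}, A = {1, 0};
    *locate(f3, X) = *locate(f3, A)
                     + (*locate(f3, X) * *locate(f3, Y))
                     + *locate(f3, Z);
}
```

With x = 2, y = 3, z = 4 and a = 5, the assignment stores 5 + 2*3 + 4 = 15 into p0's x.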
     

       INFORMATION STORED IN ARs

What information is needed in an AR?


         WORKING WITH ARs

// Requires: the stack has enough room
//            to allocate a new AR
void stack_call_prep(
       address_type static_link)
{


}

// Requires: restored fp and sp values
//              satisfy stack's invariant
// return given value from a subroutine
extern void stack_restore_for_return()
{


}

       PICTURE OF CALL OPERATION

Execution of stack_call(PC, SL):

 PC: 16   fp: 464   sp: 448

 468 |                 |
 464 |                 | <- fp
 460 |                 |
 456 |                 |
 452 |                 |
 448 |      arg5       | <- sp
 444 |                 |
 440 |                 |
 436 |                 |
 432 |                 |
 428 |                 |
 424 |                 |
 420 |                 |
 416 |                 |
 412 |                 |


       PICTURE OF RETURN OPERATION

Execution of stack_return_value(&pc, v):

 PC: 200   fp: 444      sp: 388

 468 |                 |
 464 |                 |
 460 |                 |
 456 |                 |
 452 |                 |
 448 |      arg5       |
 444 |    saved a0     | <- fp
 440 |    saved a1     |
 436 |    saved a2     |
 432 |    saved a3     |
 428 |    saved s0     |
 424 |    saved s1     |
 420 |    saved s2     |
 416 |    saved s3     |
 412 |    saved s4     |
 408 |    saved s5     |
 404 |    saved s6     |
 400 |    saved s7     |
 396 | old static link |
 392 |    old fp       |
 388 |    old ra       | <- sp
 384 |                 |
     |                 |
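A hedged sketch of the call preparation and return steps, assuming the AR layout in the return picture: saved a0-a3, then s0-s7, the old static link, old fp and old ra, with the new fp at the word just below the caller's sp. Keeping the current static link in a separate variable is an assumption, as are the names call_prep and restore_for_return. Starting from fp = 464 and sp = 448 as in the call picture, call_prep leaves fp = 444 and sp = 448 - 15*4 = 388.

```c
#include <assert.h>

#define BYTES_PER_WORD 4
#define MAX_STACK_HEIGHT 512
#define STACK_LEN (MAX_STACK_HEIGHT / BYTES_PER_WORD)
enum { A0 = 4, S0 = 16, RA = 31 };   // MIPS register numbers

static int memory[STACK_LEN];
static int GPR[32];
static int fp, sp;            // byte addresses; stack grows down
static int static_link;       // assumption: current static link kept here

static void push(int v) {
    assert(sp >= BYTES_PER_WORD);
    sp -= BYTES_PER_WORD;
    memory[sp / BYTES_PER_WORD] = v;
}

static int pop(void) {
    assert(sp < MAX_STACK_HEIGHT);
    int v = memory[sp / BYTES_PER_WORD];
    sp += BYTES_PER_WORD;
    return v;
}

// Save a0-a3, s0-s7, the old static link, old fp and old ra,
// in the order of the picture; fp ends up at the saved a0.
static void call_prep(int new_static_link) {
    int new_fp = sp - BYTES_PER_WORD;
    for (int r = A0; r <= A0 + 3; r++) push(GPR[r]);
    for (int r = S0; r <= S0 + 7; r++) push(GPR[r]);
    push(static_link);
    push(fp);
    push(GPR[RA]);
    fp = new_fp;
    static_link = new_static_link;
}

// Requires: sp points at the saved ra (the callee's locals are gone).
// Pops in reverse order, restoring the caller's registers and fp.
static void restore_for_return(void) {
    GPR[RA] = pop();
    int old_fp = pop();
    static_link = pop();
    for (int r = S0 + 7; r >= S0; r--) GPR[r] = pop();
    for (int r = A0 + 3; r >= A0; r--) GPR[r] = pop();
    fp = old_fp;
}
```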


         WHAT STATIC LINK TO PASS?

If routine R calls E,
   what static link is passed?