COP 3402 meeting -*- Outline -*-

* Supporting Subroutines in the ISA

   Q: What is a subroutine?
      It's a rule, abstract algorithm
      
      It's code that can be parameterized and used over and over
          - so it can be used to avoid repeating code
      In C it's a function, in Java/C# a method, ...

      It's an abstraction of an expression (computes a value)
      or a command (changes the program's state)
      in a way that depends on its parameters

      It's a key modularity feature in programming
         - clients can call it,
         - implementation may change code as long as it does what it should

** feature design to support subroutines
   The goal is to explain the features of an ISA that are needed to:
        1. make writing subroutines easier, and
        2. have them execute faster

------------------------------------------
       GOALS FOR SUBROUTINES

subroutine = function or procedure
             (abstraction of expressions
             or commands)

Want:


------------------------------------------
      ...
      - independent development
      - information hiding (decisions are local and independent)
      - maximal reuse (caller need not know details of decisions)
      - efficient execution

------------------------------------------
    WHAT A SUBROUTINE CALL DOES

What are the steps in a subroutine call
    like:   x = f(E1,E2)
    where E1 and E2 are expressions?


------------------------------------------
    This is with "call by value", which is used in C and in all OO languages
    and thus most modern programming languages.

    ...
      - caller evaluates argument expressions E1,E2 to get argument values
      -  pass the argument values to the subroutine
      - callee does computation
      -  pass the result, if any, back
      - caller continues right after the call

   Q: What happens in the machine to pass arguments?
       they (their values in call by value) are stored somewhere
       where the callee can find them
       
*** SRM, a simplified RISC machine
------------------------------------------
        SRM = SIMPLIFIED RISC MACHINE

RISC = Reduced Instruction Set Computer
  (vs. CISC = Complex Instruction Set ...)

SRM, for HW, based on MIPS processor

Register-based instructions:

  ADD s,t,d  is  GPR[d] <- GPR[s] + GPR[t]

 ADDI s,t,i  is  GPR[t] <- GPR[s] + i

Byte-addressible:

  LBU b,t,o  is GPR[t] <- memory[GPR[b]+o]

------------------------------------------
    Q: Is the x86 a RISC or CISC design?
       definitely CISC
       
    MIPS = Microprocessor without
          Interlocked Pipelined Stages

    In MIPS (and SRM) each instruction only does a small amount of work
    programs put those together...

------------------------------------------
       32 REGISTERS IN THE SRM

Num Notes
0   always 0 (can't write this!)
1   assembler's temporary
2   function result                ($v0)     
3   function result                ($v1)
4   function argument              ($a0)
5   function argument              ($a1)
6   function argument              ($a2)
7   function argument              ($a3)
8-15 temporary             ($t0,...,$t7)
16-23 temporary                ($s0,...)
24-25 temporary               ($t8, $t9)
26  (reserved for OS)
27  (reserved for OS)
28  global, static data            ($gp)
29  stack pointer                  ($sp)
30  frame pointer                  ($fp)
31  return address                 ($ra)
------------------------------------------
        The return address, stack pointer, and frame pointer
        are involved in subroutines and will be important today...

------------------------------------------
       CALLS AND REGISTERS

Call done by "Jump and Link" instruction

    LW $a0, x      # load argument x
    LW $a1, y      # load argument y

    JAL f
       $ra <- PC   # save return addr
       PC <- f     # jump to f

    ...

    JR $ra         # jump to return addr
    
------------------------------------------
    Note: LW loads a word into a register

    Q: What complications arise for calls in a VM with registers?
       - the registers are shared between the caller and callee,
       - in a recursion, registers are shared among routine activations

------------------------------------------
     WHO SAVES AND RESTORES REGISTERS?

Problem: limited number of registers
      - caller might want some preserved
      - callee might change some

So if registers are saved by:

 Callee, then might need to save


 Caller, then might need to save 


 So, use a


------------------------------------------
     ... more than the caller needs to use after
         (useless work)

     ... more than the callee needs to use during call
         (useless work)


     ... hybrid scheme, where caller saves some registers (if needs them)
                        callee saves others (if it uses them)

------------------------------------------
          CALLING CONVENTION

Agreement between all callers and callees

 Callee saves & restores:


 Caller saves & restores:


------------------------------------------
   ... return address, stack/frame pointers, and some registers ($s0-$s7)

   ... arguments, temporaries needed


   Registers $ra, $sp, $fp and $s0-$s7 are preserved
        (saved by callee if need be, and restored when routine returns)
   The rest are not preserved by callee,
        so if the caller cares, then caller must save them
                                     and restore them after return

------------------------------------------
        IT'S A CONVENTION / AGREEMENT

Saving and restoring not enforced


------------------------------------------
      ... by hardware
          - but it's useful to have everyone know what to do
              Why? Allows modularity/plug compatability


------------------------------------------
    VM FEATURES TO SUPPORT SUBROUTINES


------------------------------------------
     ... (starting with fundamental decisions in the ISA)
         - call and return instructions
               - must save the return address (PC) somewhere
                  (in MIPS and SRM it's saved in reg. 31)
                - arguments, and
                - result(s)
         - support for recursive subroutines
             ==> different calls (activations) need different space
             ==> may coexist at runtime
         - help to manage local storage for subroutines
                - formal parameter(s)
                - local variables (helps subroutine independence and reuse)
         - support for static scoping
                want to understand subroutine where it is written
                   (names refer to surrounding declarations in text
                    not to dynamically recent declarations)
         - way to return long-lived data
             - heap/global storage
         + more modern features (optional):
              + exception handling
                       (instructions for throwing exceptions,
                        which will unwind stack looking for handler)
              + passing and returning subroutines as values
                       (heap allocation of ARs, closures)
                       (doesn't need new capabilities from VM?)
              + objects and inheritance
                       (heap allocation,
                        virtual function tables/call instructions as in JVM)
              + support for multi-threading
                       (multiple runtime stacks and thread switching,
                        coroutines? locking/atomicity?)


** design for statically-scoped subroutines

*** what is static scoping?
------------------------------------------
          STATIC SCOPING

def: In *static scoping*,
     each identifier x


def: In *dynamic scoping*,
     each identifier x


------------------------------------------
     ... denotes the location for x declared by the closest textually
         surrounding declaration of x

         This is also known as *lexical scoping*

         Q: Is there another way that identifiers could be found in a
         program?
         Yes, it's called dynamic scoping, where each identifier x,
     ... denotes the location for x declared by the most recent
         declaration for x that is still active.

     Q: What kind of scoping is in C, C++, and Java?  The Unix shell?
     C, C++, Java all have static scoping for variables and function
     names
     The Unix shell uses dynamic scoping for environment variables!
     Java uses dynamic scoping for exception handlers

*** motivation for static scoping
------------------------------------------
       MOTIVATION FOR STATIC SCOPING

 int incr = 1;
 int addOne(int y) { return y+incr; }

 int client() {
    int incr = 2;
    int z = addOne(3);
    // what is the value of z here?
    return z;
 }

------------------------------------------

     Q: What should addOne do? What should the value of z be?
        does it mean what it meant when it was written
        or when it was called?

     Q: Do we want to be able to check programs when we write them?
     Yes, usually

     This is even more important in modern (functional) languages
     where we want to return functions/closures

*** block structure
------------------------------------------
       BLOCK STRUCTURE


def: A *block* is


Usual Grammar:


Example in C

   {
      int next = 3*x+1;
      next = next / 2;
      return next;
   }
------------------------------------------
        ... a sequence of local declarations
        and statements

        <stmt> ::= ... | <block>
        <block> ::= { <decls> <stmts> }
        <decls> ::= <empty>
                    | <declaration> <decls>
        <declaration> ::= int <name> | ...
        <stmts> ::= <stmt> <stmts>

------------------------------------------
       ADVANTAGES OF BLOCK STRUCTURE

 - Local storage


 - Control of names


 - Easier to extract procedures


------------------------------------------

        ... can declare temporary variables whose space is reclaimed
            when the block is finished

        e.g., { int huge_array[HUGE]; /* ... */ }

        ... can pick the best names for variables
            without worry that they conflict with other code
            (look locally first), so independent development

        ... a block can be more easily seen as a procedure,
            with the body being the block
            (and the free variables as parameters)

        It may help code be easier to read
        than a language without block structure,
        since the declarations are closer to their use

*** motivation for recursion
    Q: Do you think recursion is hard to use and understand?
    (get a show of hands)
    I find it is very useful, especially for compilers,
    but it's also useful for other kinds of programs...

------------------------------------------
   RECURSIVE DATA ==> RECURSIVE PROGRAMS

A good rule of design:


// file btree.h
#include "Tdef.h"

typedef struct treeNode {
    T value;
    struct treeNode *left, *right;
} tree;

// helper to compute maximum of its args
int max(int a, int b) {
    return (a >= b) ? a : b;
}


------------------------------------------
   ...
   organize the program's structure
   like the
   data's structure

   i.e., program ~ data (in terms of structure)

    Q: How should we write a program to find the depth of a tree?
        ...

// Requires: t != NULL
// Return the depth of t
int depth (tree *t) {
    if (t == NULL) {
	return 0;
    } else {
	return 1 + max(
	   depth(t->left),
	   depth(t->right));
    }
}

        Q: Is that better than using while loops and an explicit stack?
        Yes, it's way clearer
        See the example code page for a non-recursive version...
        
------------------------------------------
         RECURSIVE GRAMMARS

Grammar for statements:

<stmt> ::= ...
     | while (<expr>) <stmt>


Structure of a parser:
// typedef /* ... */ stmtTree;

stmtTree *parseStatement() {
     /* ... */
     parseWhileLoop();
     /* ... */
}

stmtTree *parseWhileLoop() {
    /* ... */
    parseStatement();
    /* ... */
}
------------------------------------------
        Q: Is it easy to follow what these routines are doing?
        Yes, it's much harder to implement these without recursion

        Q: Why are natural languages structured recursively?
        Seems to be more powerful, it's how our brains work...

        Other examples:
          - list manipulation (recursively structured data)
          - expression evaluation (their grammar is also recursive)
          - searching directories (which many contain directories)
          - interpreting or displaying web pages (recursive data)

*** design for subroutines
------------------------------------------
        VM DESIGN FOR SUBROUTINES

For each call:
  - storage for a subroutine's variables
    (local storage)
    organized as

  - storage for a single call is called
    an AR:

def: An *activation record* (AR)


------------------------------------------
        ... a stack (called the runtime stack)
        Q: Why use a stack?
        Because if client calls subroutine P,
          then P must return before client can call another subroutine
          thus last-called is first to finish (LIFO)

        ... *activation record* (AR)
        is a portion of the runtime stack
        that holds local storage
        for one call of a subroutine

------------------------------------------
            STACK ORGANIZATION

In the code: P calls Q, Q calls R

Initially:

        [        0 ]

Call of subroutine P:

        [ AR for P ]


After P calls Q:

        [ AR for P ]
        [ AR for Q ]


After Q calls R:

        [ AR for P ]
        [ AR for Q ]
        [ AR for R ]

After R returns:

        [ AR for P ]
        [ AR for Q ]

After Q returns:

        [ AR for P ]

After P returns:

------------------------------------------
        Point out pushes and pops


*** stack implementation
    Q: Since a computer's memory is like a big (1D) array,
       how should we implement the runtime stack?
       tracking indexes of each AR in the array

------------------------------------------
           STACK IMPLEMENTATION

AR delimited by two indexes:

   - fp:


   - sp:


Notes,
   assuming stack is word-addressed,
   and grows towards lower addresses (down)


------------------------------------------
        frame pointer, byte address of the first element in AR
        stack pointer, byte address of the top (last allocated) element in AR

        Notes:
        We are assuming that the stack grows downwards,
                     towards lower addresses
                     and that the stack is byte addressed
        There are other choices that could be made,
              but on Unix stacks grow downward
                  and on the SRM and x86 machines memory is byte-addressed
              On some machines, ARs must be aligned,
                 e.g., on MIPS ARs must start on a double-word boundary
       
        ... $fp >= $sp
        ... AR is stored in memory from $fp to $sp (inclusive)

        Q: Does the storage for a subroutine vary dynamically?
           Yes, supports block structure in programming languages

------------------------------------------
     STACK IMPLEMENTATION: HEADER FILES

Assume:
  - stack is byte addressed
  - stack grows down
    towards lower addresses

// File: machine_types.h
#ifndef _MACHINE_TYPES_H
#define _MACHINE_TYPES_H

// type of addresses
typedef unsigned int address_type;

// type of machine bytes
typedef unsigned char byte_type;

// type of machine words
typedef int word_type;

#define BYTES_PER_WORD 4

#endif

// File: stack.h
#ifndef _STACK_H
#define _STACK_H
#include <stdbool.h>
#include <stdio.h>
#include "machine_types.h"

// The MAX_STACK_HEIGHT must be
// evenly divisible by BYTES_PER_WORD
#define MAX_STACK_HEIGHT 2048

// Initialize the stack data structure
extern void stack_initialize();

/* ... other extern declarations ... */
#endif
------------------------------------------
        Note that words are ints in C
        (we assume 32 bit integers)

------------------------------------------
      STACK IMPLEMENTATION: STACK.C FILE

/* $Id: stack.c,v 1.3 2023/09/08 ... */
#include <assert.h>
#include <stdlib.h>
#include <stdio.h>
#include "utilities.h"
#include "stack.h"

// size of the stack in words
#define STACK_LEN (MAX_STACK_HEIGHT / BYTES_PER_WORD)

// maximum index of a byte in the stack.
// Note: this equals BYTES_PER_WORD * (STACK_LEN - 1)
#define MAX_BYTE_INDEX (MAX_STACK_HEIGHT - BYTES_PER_WORD)

// the stack's storage
static word_type memory[STACK_LEN];

// first index of current AR, in bytes
static int fp;

// index of top element in current AR, in bytes
static int sp;

// the stack's invariant
void stack_okay()
{


}

// Initialize the stack data structure
void stack_initialize()
{


}

// Return the stack's total size, in bytes
int stack_size() {               }

// Return the current AR's num. of bytes
int stack_AR_size() { return fp - sp; }

// Return the address of the base
// of the current AR (fp value)
address_type stack_AR_base() {

}

// Return the address of the top word
// element in the current AR (sp value),
// as a byte address
address_type stack_top_AR_address() {

}

// Return the address of the top word
// element in the current AR (sp value),
// as a word address
address_type stack_top_AR_word_addr() {

}

// Is the stack empty?
bool stack_empty() {

}

// Is the stack full?
bool stack_full() {

}

// Requires: BYTES_PER_WORD > j >= 0;
// get the jth byte of the word v
// (numbered from the right)
static byte_type fetchByteFromWord(
                  word_type v, int j) {
    return (v >> (j*8)) & 0x000000FF;
}

// Requires: !stack_full()
// push a word on the stack
// with sp becoming old(sp) - BYTES_PER_WORD
void stack_push_word(word_type val) {


}

// Requires: n is evenly divisible
//           by BYTES_PER_WORD
// Requires: (stack_size() + n)
//             < MAX_STACK_HEIGHT
// Increase the size of the stack by n
void stack_allocate_bytes(unsigned int n)
{


}


// Requires: !stack_empty()
// pop the stack and return the top word.
// The size of the stack is
// reduced by BYTES_PER_WORD.
word_type stack_pop_word()
{


}

// Requires: n is evenly divisible
//           by BYTES_PER_WORD
// Requires: (stack_size() - n) >= 0
// Decrease the size of the stack by n bytes
void stack_deallocate_bytes(unsigned int n)
{


}

// Requires: !stack_empty()
// pop the stack and return the top word.
// The size of the stack is
// reduced by BYTES_PER_WORD.
word_type stack_pop_word()
{


}

// Requires: !stack_empty()
// return the top word without popping
word_type stack_top_word()
{


}

// translate a byte address to
// a word address (put in word_addr)
// and a byte offset (put in byte_offset)
void stack_byte2word_address(
	      int addr, int *word_addr,
	      int *byte_offset)
{


}

// Requires: BYTES_PER_WORD > j >= 0;
// set the jth byte of the word v (numbered from the right)
// to the given value bv
static word_type setByteInWord(word_type x, int j, byte_type bv) {


}

// Requires: stack_top_AR_word_addr()
//                   <= word_addr
// Requires: word_addr < STACK_LEN
// Requires: 0 <= byte_offset
// Requires: byte_offset < BYTES_PER_WORD
// set the byte_offset'th byte of
// the stack's storage at word_addr to bv
void stack_set_byte_at_word_offset(
	    int word_addr,
	    int byte_offset,
	    byte_type bv)
{


}

// Requires: stack_top_AR_address()
//           - BYTES_PER_WORD <= addr
// Requires: addr < STACK_MAX_HEIGHT;
// Set the byte at addr to bv
void stack_set_byte(address_type addr,
		    byte_type bv)
{


}

// Requires: 0 <= word_addr
// Requires: word_addr < STACK_LEN
// Requires: 0 <= byte_offset
// Requires: byte_offset < BYTES_PER_WORD
// Return the byte_offset'th byte of
// the stack's storage at word_addr
byte_type stack_get_byte_at_word_offset(
	    int word_addr,
	    int byte_offset)
{


}

// Requires: sp - BYTES_PER_WORD <= addr
// Requires: addr < STACK_MAX_HEIGHT;
// Return the byte at addr
byte_type stack_get_byte(address_type addr)
{


}

------------------------------------------
...
    assert(0 <= sp);
    assert(sp <= fp);
    assert(fp <= MAX_BYTE_INDEX);
    assert(fp % BYTES_PER_WORD == 0);
    assert(sp % BYTES_PER_WORD == 0);

    ...
    fp = MAX_BYTE_INDEX;
    sp = fp;
    stack_okay();
    int wdi; // word index for the loop
    for (wdi = fp/BYTES_PER_WORD;
	 0 <= wdi;
	 wdi--) {
	memory[wdi] = (word_type) 0;
    }

    ...
    return MAX_BYTE_INDEX - sp;

    ... return fp - sp; 

    ...
    return fp;
    
    ... return stack_size() <= 0;
    ...
    return stack_size() >= MAX_BYTE_INDEX;

    ...
    stack_okay();
    assert(n % BYTES_PER_WORD == 0);
    if ((stack_size() + n) >= MAX_STACK_HEIGHT) {
	bail_with_error("Can't allocate %u bytes", n);
    }
    sp = sp - n;
    stack_okay();

    ...
    stack_okay();
    stack_allocate_bytes(BYTES_PER_WORD);
    memory[sp/BYTES_PER_WORD] = val;
    stack_okay();

    ...
    stack_okay();
    assert(n % BYTES_PER_WORD == 0);
    if (sp + n >= MAX_STACK_HEIGHT) {
	bail_with_error("Can't deallocate %u bytes", n);
    }
    sp = sp + n;
    stack_okay();

    ...
    stack_okay();
    assert(n % BYTES_PER_WORD == 0);
    if (!(sp + n <= fp)) {
	bail_with_error("Can't deallocate %u bytes", n);
    }
    sp = sp + n;
    stack_okay();

    ...
    stack_okay();
    word_type ret;
    if (stack_empty()) {
	bail_with_error("Can't pop word off empty stack!");
    } else {
	ret = memory[sp/BYTES_PER_WORD];
    }
    stack_deallocate_bytes(BYTES_PER_WORD);
    stack_okay();
    return ret;

    ...
    stack_okay();
    word_type ret;
    if (stack_empty()) {
	bail_with_error("Can't get top word of empty stack!");
    } else {
	ret = memory[sp/BYTES_PER_WORD];
    }
    // stack_okay(); // no need to check, as nothing changed
    return ret;

    ...
    *word_addr = addr / BYTES_PER_WORD;
    *byte_offset = addr % BYTES_PER_WORD;
    assert(addr == (*word_addr) * BYTES_PER_WORD + (*byte_offset));

    ...
    // clear the bits to be set, preserve the others
    unsigned int ret = (unsigned int) x & (~0xFFU << j*8);
    ret |= ((bv & 0xFFU) << j*8);
    return ret;

    ...
    stack_okay();
    assert(0 <= word_addr && word_addr < STACK_LEN);
    assert(0 <= byte_offset && byte_offset < BYTES_PER_WORD);
    memory[word_addr] = setByteInWord(memory[word_addr], byte_offset, bv);

    ...
    int wa;
    int bo;
    stack_byte2word_address(addr, &wa,
			    &bo);
    stack_set_byte_at_word_offset(wa,
				  bo,
				  bv);


    ...
    stack_okay();
    assert(0 <= word_addr && word_addr < STACK_LEN);
    assert(0 <= byte_offset && byte_offset < BYTES_PER_WORD);
    return fetchByteFromWord(memory[word_addr], byte_offset);

    ...
    int wa;
    int bo;
    stack_byte2word_address(addr, &wa,
			    &bo);
    return 
	stack_get_byte_at_word_offset(wa,
				      bo);


** Addressing for nested routines
*** Problem of addressing locals
------------------------------------------
     HOW TO ADDRESS LOCAL VARIABLES?

procedure p0();
   var int x;
   procedure p1();
     var int y;
     var int z;
     procedure p2();
        var int a;
       begin
         a := a+x*y+z;
       end;
     begin # body of p1
       # ...
       call p2;
       # ...
       call p0
       # ...
     end;
   begin # body of p0
     # ...
     call p1;
     # ...
     call p0
     # ...
   end.
------------------------------------------
        Q: How many calls to p0 and p1 will be on the stack
           when the return from p2 is evaluated?
        We can't tell...

        Q: At the return, how can the compiler find
           the locations of x and y?
        This is the problem...

        For a less artificial example, see quicksort in Pascal:
        http://sandbox.mc.edu/~bennet/cs404/doc/qsort_pas.html

------------------------------------------
       THE PROBLEM

Programming language features
    - subroutines
    - nesting of subroutines (Pascal, JS)
    - static scoping

  ==> absolute address of variables
      is hard to predict

------------------------------------------
        Q: Can we tell from the text of a program where
           a routine's ARs will be on the runtime stack?
        No, if new routines are added,
            then any calculations will be off...
        Q: If we don't know where the AR will be
           on the runtime stack
           how can local variables in an AR be addressed?
        Dynamic prediction of where the AR will be is hard
           and would not be modular,
           so we don't address local variables using absolute
           addresses.
        However, we can know where a variable
        will be within an AR
           (its offset) as that is determined by the text of the
           program (statically)
        So, we can use an offset from the latest AR
        of the surrounding scope, if we can find that.
        
*** Compiler response to solve the problem
------------------------------------------
       COMPILER-BASED SOLUTION

What can compiler know statically
     about local variable locations?


What would we need to find exact location
     of a local variable?


When can an AR be created that needs to
     know the base of the surrounding AR?


Could we pass the base of the AR for the
    surrounding scope in a call?


If each AR stores a (static) link to the
   AR of the surrounding scope,
   how can we address two layers out? 3?


What information is needed to address
  a local variable in a surrounding scope?


------------------------------------------
      ... offset from base of AR,
          since can count allocations from start of routine

      ... the base for the offset


      ... when the surrounding scope calls down
          to a routine nested within that scope
          e.g.,
              fun p1() {
                var int y;
                var int z;
                 proc p2() {
                    /* ... */
                 }
                 p1();
              }

      Q: Are all calls like that?
         Depends on the language:
           can call another routine in same
           or surrounding scope

      ... Yes, this is the *static link* needed
          to address locals in a surrounding scope

      So the solution is to pass the address of the
          base of the AR
          of the surrounding scope when calling a routine,
          so it can address those variables
          This is the "static link"

      ... follow the static link to the surrounding scope's AR,
          then follow that scope's link to its
          surrounding scope's AR...

      ... the number of levels, and
          the offset
      
**** Summary: two-part addresses
------------------------------------------
          SUMMARY

Compilers use two-part addresses,
     called *lexical addresses*
     that consist of:
   

------------------------------------------

        1. Number of levels of surrounding scopes
           to go outwards
            (i.e., number or static links to follow)
        2. Offset from that scope's base

------------------------------------------
     HOW TO ADDRESS LOCAL VARIABLES?

procedure p0;
  var x;
  procedure f1;
    var y;
    var z;
    procedure p2;
      var a;
      procedure f3;
        begin # body of f3
          # here --v
          call f1;
          call f3; 
          x := a+(x*y)+z
        end;
      call f3; # body of p2
    call p2; # body of p1
  call f1; # body of f1
call p0. #body of p0
     
------------------------------------------

        Q: Following the comment,
           what is the lexical address of x?
           (3, 0)
        Q: What is the lexical address of z?
           (2, 1)
        Q: What is the lexical address of a?
           (1, 0)

*** What is stored in an AR
------------------------------------------
       INFORMATION STORED IN ARs

What information is needed in an AR?


------------------------------------------
   ... locations for:
        - any registers need to save
        - old PC value (return address, in $ra)
        - old sp value ($sp)
        - old fp value (dynamic link, $fp)
        - base of AR of surrounding scope (static link)

*** VM stack operations for call/return
------------------------------------------
         WORKING WITH ARs

// Requires: the stack has enough room
//            to allocate a new AR
void stack_call_prep(
       address_type static_link)
{


}

// Requires: restored fp and sp values
//              satisfy stack's invariant
// return given value from a subroutine
extern void stack_restore_for_return()
{


}
------------------------------------------

   ... // stack_call
   // assume parameters are in registers and on stack
   int old_fp = fp;
   int old_sp = sp;

   int a0=4,a1=5,a2=6,a3=7;
   int s0=16,s1=17,s2=18,s3=16,s4=17;
   int s5=18,s6=19,s7=20,ra=31;
   stack_push(GPR[a0]);  // save arguments
   stack_push(GPR[a1]);  // save arguments
   stack_push(GPR[a2]);  // save arguments
   stack_push(GPR[a3]);  // save arguments
   stack_push(GPR[s0]);  // save s regs
   stack_push(GPR[s1]);  // save s regs
   stack_push(GPR[s2]);  // save s regs
   stack_push(GPR[s3]);  // save s regs
   stack_push(GPR[s4]);  // save s regs
   stack_push(GPR[s5]);  // save s regs
   stack_push(GPR[s6]);  // save s regs
   stack_push(GPR[s7]);  // save s regs
   stack_push(static_link);
   stack_push(old_fp);  // dynamic link
   stack_push(ra);   // return address

   // base of this new AR
   fp = old_sp + BYTES_PER_WORD;

   // sp already adjusted by the pushes
   //     == old_sp + 15 * BYTES_PER_WORD

   Q: What is sp now?
    It is old_sp+ 15 * BYTES_PER_WORD

   Q: Does the callee really need to save all of this?
      No, only if it uses a register should it save it and restore it.
      For example, if callee doesn't make calls, no need to save a0-a3 or ra

    ... // stack return
   // assume result is in register v0
   int s0=16,s1=17,s2=18,s3=16,s4=17;
   int s5=18,s6=19,s7=20,ra=31;
   // restore return address
   GPR[ra] = stack_pop();
   // restore the fp
   fp = stack_pop();
   stack_pop(); // toss static link
   // restore the s registers
   GPR[s7] = stack_pop();
   GPR[s6] = stack_pop();
   GPR[s5] = stack_pop();
   GPR[s4] = stack_pop();
   GPR[s3] = stack_pop();
   GPR[s2] = stack_pop();
   GPR[s1] = stack_pop();
   GPR[s0] = stack_pop();
   do_jr(ra); // sets PC to contents of ra register

**** pictures

        For a picture, recall that the stack grows down
        (towards lower addresses) and is byte addressed
        
------------------------------------------
       PICTURE OF CALL OPERATION

Execution of stack_call(SL):

 PC: 16   fP: 464   sp: 448

 468 |                 |
 464 |                 | <- fp
 460 |                 |
 456 |                 |
 452 |                 |
 448 |      arg5       | <- sp
 444 |                 |
 440 |                 |
 436 |                 |
 432 |                 |
 428 |                 |
 424 |                 |
 420 |                 |
 416 |                 |
 412 |                 |
 408 |                 |
 404 |                 |
 400 |                 |
 396 |                 |
 392 |                 |
 388 |                 |

------------------------------------------

        For the return, we want to go back to
        the picture before the call...
------------------------------------------
       PICTURE OF RETURN OPERATION


 PC: 200   fp: 444      sp: 384

 468 |                 |
 464 |                 |
 460 |                 |
 456 |                 |
 452 |                 |
 448 |      arg5       |
 444 |    saved a0     | <- fp
 440 |    saved a1     |
 436 |    saved a2     |
 432 |    saved a3     |
 428 |    saved s0     |
 424 |    saved s1     |
 420 |    saved s2     |
 416 |    saved s3     |
 412 |    saved s4     |
 408 |    saved s5     |
 404 |    saved s6     |
 400 |    saved s7     |
 396 |    saved s7     |
 392 | old static link |
 388 |    old fp       |
 384 |    old ra       | <- sp
 380 |                 |
     |                 |


------------------------------------------


**** determining the static link to pass
------------------------------------------
         WHAT STATIC LINK TO PASS?

If routine R calls E,
   what static link is passed?


------------------------------------------
       ... if E has lexical address of form
          (L,-) then: it is:

            - The current AR's base address, if L == 0
                (this is a call down to a nested procedure)
            - The base address of the AR L levels out, if L > 0
                (this is a call to a surrounding procedure)