GOALS FOR SUBROUTINES subroutine = function or procedure (abstraction of expressions or commands) Want: WHAT A SUBROUTINE CALL DOES What are the steps in a subroutine call like: x = f(E1,E2) where E1 and E2 are expressions? SRM = SIMPLIFIED RISC MACHINE RISC = Reduced Instruction Set Computer (vs. CISC = Complex Instruction Set ...) SRM, for HW, based on MIPS processor Register-based instructions: ADD s,t,d is GPR[d] <- GPR[s] + GPR[t] ADDI s,t,i is GPR[t] <- GPR[s] + i Byte-addressible: LBU b,t,o is GPR[t] <- memory[GPR[b]+o] 32 REGISTERS IN THE SRM Num Notes 0 always 0 (can't write this!) 1 assembler's temporary 2 function result ($v0) 3 function result ($v1) 4 function argument ($a0) 5 function argument ($a1) 6 function argument ($a2) 7 function argument ($a3) 8-15 temporary ($t0,...,$t7) 16-23 temporary ($s0,...) 24-25 temporary ($t8, $t9) 26 (reserved for OS) 27 (reserved for OS) 28 global, static data ($gp) 29 stack pointer ($sp) 30 frame pointer ($fp) 31 return address ($ra) CALLS AND REGISTERS Call done by "Jump and Link" instruction LW $a0, x # load argument x LW $a1, y # load argument y JAL f $ra <- PC # save return addr PC <- f # jump to f ... JR $ra # jump to return addr WHO SAVES AND RESTORES REGISTERS? Problem: limited number of registers - caller might want some preserved - callee might change some So if registers are saved by: Callee, then might need to save Caller, then might need to save So, use a CALLING CONVENTION Agreement between all callers and callees Callee saves & restores: Caller saves & restores: IT'S A CONVENTION / AGREEMENT Saving and restoring not enforced VM FEATURES TO SUPPORT SUBROUTINES STATIC SCOPING def: In *static scoping*, each identifier x def: In *dynamic scoping*, each identifier x MOTIVATION FOR STATIC SCOPING int incr = 1; int addOne(int y) { return y+incr; } int client() { int incr = 2; int z = addOne(3); // what is the value of z here? return z; } BLOCK STRUCTURE def: A *block* is Usual Grammar: Example in C { int next = 3*x+1; next = next / 2; return next; } ADVANTAGES OF BLOCK STRUCTURE - Local storage - Control of names - Easier to extract procedures RECURSIVE DATA ==> RECURSIVE PROGRAMS A good rule of design: // file btree.h #include "Tdef.h" typedef struct treeNode { T value; struct treeNode *left, *right; } tree; // helper to compute maximum of its args int max(int a, int b) { return (a >= b) ? a : b; } RECURSIVE GRAMMARS Grammar for statements: ::= ... | while () Structure of a parser: // typedef /* ... */ stmtTree; stmtTree *parseStatement() { /* ... */ parseWhileLoop(); /* ... */ } stmtTree *parseWhileLoop() { /* ... */ parseStatement(); /* ... */ } VM DESIGN FOR SUBROUTINES For each call: - storage for a subroutine's variables (local storage) organized as - storage for a single call is called an AR: def: An *activation record* (AR) STACK ORGANIZATION In the code: P calls Q, Q calls R Initially: [ 0 ] Call of subroutine P: [ AR for P ] After P calls Q: [ AR for P ] [ AR for Q ] After Q calls R: [ AR for P ] [ AR for Q ] [ AR for R ] After R returns: [ AR for P ] [ AR for Q ] After Q returns: [ AR for P ] After P returns: STACK IMPLEMENTATION AR delimited by two indexes: - fp: - sp: Notes, assuming stack is word-addressed, and grows towards lower addresses (down) STACK IMPLEMENTATION: HEADER FILES Assume: - stack is byte addressed - stack grows down towards lower addresses // File: machine_types.h #ifndef _MACHINE_TYPES_H #define _MACHINE_TYPES_H // type of addresses typedef unsigned int address_type; // type of machine bytes typedef unsigned char byte_type; // type of machine words typedef int word_type; #define BYTES_PER_WORD 4 #endif // File: stack.h #ifndef _STACK_H #define _STACK_H #include #include #include "machine_types.h" // The MAX_STACK_HEIGHT must be // evenly divisible by BYTES_PER_WORD #define MAX_STACK_HEIGHT 2048 // Initialize the stack data structure extern void stack_initialize(); /* ... other extern declarations ... */ #endif STACK IMPLEMENTATION: STACK.C FILE /* $Id: stack.c,v 1.3 2023/09/08 ... */ #include #include #include #include "utilities.h" #include "stack.h" // size of the stack in words #define STACK_LEN (MAX_STACK_HEIGHT / BYTES_PER_WORD) // maximum index of a byte in the stack. // Note: this equals BYTES_PER_WORD * (STACK_LEN - 1) #define MAX_BYTE_INDEX (MAX_STACK_HEIGHT - BYTES_PER_WORD) // the stack's storage static word_type memory[STACK_LEN]; // first index of current AR, in bytes static int fp; // index of top element in current AR, in bytes static int sp; // the stack's invariant void stack_okay() { } // Initialize the stack data structure void stack_initialize() { } // Return the stack's total size, in bytes int stack_size() { } // Return the current AR's num. of bytes int stack_AR_size() { return fp - sp; } // Return the address of the base // of the current AR (fp value) address_type stack_AR_base() { } // Return the address of the top word // element in the current AR (sp value), // as a byte address address_type stack_top_AR_address() { } // Return the address of the top word // element in the current AR (sp value), // as a word address address_type stack_top_AR_word_addr() { } // Is the stack empty? bool stack_empty() { } // Is the stack full? bool stack_full() { } // Requires: BYTES_PER_WORD > j >= 0; // get the jth byte of the word v // (numbered from the right) static byte_type fetchByteFromWord( word_type v, int j) { return (v >> (j*8)) & 0x000000FF; } // Requires: !stack_full() // push a word on the stack // with sp becoming old(sp) - BYTES_PER_WORD void stack_push_word(word_type val) { } // Requires: n is evenly divisible // by BYTES_PER_WORD // Requires: (stack_size() + n) // < MAX_STACK_HEIGHT // Increase the size of the stack by n void stack_allocate_bytes(unsigned int n) { } // Requires: !stack_empty() // pop the stack and return the top word. // The size of the stack is // reduced by BYTES_PER_WORD. word_type stack_pop_word() { } // Requires: n is evenly divisible // by BYTES_PER_WORD // Requires: (stack_size() - n) >= 0 // Decrease the size of the stack by n bytes void stack_deallocate_bytes(unsigned int n) { } // Requires: !stack_empty() // pop the stack and return the top word. // The size of the stack is // reduced by BYTES_PER_WORD. word_type stack_pop_word() { } // Requires: !stack_empty() // return the top word without popping word_type stack_top_word() { } // translate a byte address to // a word address (put in word_addr) // and a byte offset (put in byte_offset) void stack_byte2word_address( int addr, int *word_addr, int *byte_offset) { } // Requires: BYTES_PER_WORD > j >= 0; // set the jth byte of the word v (numbered from the right) // to the given value bv static word_type setByteInWord(word_type x, int j, byte_type bv) { } // Requires: stack_top_AR_word_addr() // <= word_addr // Requires: word_addr < STACK_LEN // Requires: 0 <= byte_offset // Requires: byte_offset < BYTES_PER_WORD // set the byte_offset'th byte of // the stack's storage at word_addr to bv void stack_set_byte_at_word_offset( int word_addr, int byte_offset, byte_type bv) { } // Requires: stack_top_AR_address() // - BYTES_PER_WORD <= addr // Requires: addr < STACK_MAX_HEIGHT; // Set the byte at addr to bv void stack_set_byte(address_type addr, byte_type bv) { } // Requires: 0 <= word_addr // Requires: word_addr < STACK_LEN // Requires: 0 <= byte_offset // Requires: byte_offset < BYTES_PER_WORD // Return the byte_offset'th byte of // the stack's storage at word_addr byte_type stack_get_byte_at_word_offset( int word_addr, int byte_offset) { } // Requires: sp - BYTES_PER_WORD <= addr // Requires: addr < STACK_MAX_HEIGHT; // Return the byte at addr byte_type stack_get_byte(address_type addr) { } HOW TO ADDRESS LOCAL VARIABLES? proc p0() { var int x; fun p1() { var int y; var int z; proc p2() { var int a; return a+x*y+z; } /* ... */ p2(); /* ... */ p0(); /* ... */ } /* ... */ p1(); /*... */ p0(); /* ... */ } THE PROBLEM Programming language features - subroutines - nesting of subroutines (Pascal, JS) - static scoping ==> absolute address of variables is hard to predict COMPILER-BASED SOLUTION What can compiler know statically about local variable locations? - offset from base of AR (known statically) - base of that AR is (the fp of the AR) What would we need to find exact location of a local variable? offset and how to find the AR's base When can an AR be created that needs to know the base of the surrounding AR? when a function calls a nested function (i.e., one declared inside it) like fun p1() { var int y; var int z; fun p2() { /* ... */ y+z /* ... */ } p2(); // call of p2 } [ AR for p0 ] ... [ AR for p1 <- fp for p1's AR y z ] [ AR for p2 ] Could we pass the base of the AR for the surrounding scope in a call? yes! Called the "static link" If each AR stores a (static) link to the AR of the surrounding scope, how can we address two layers out? 3? follow 2 static links - the one in the current AR and then the one found in the surrounding scope's AR follow 3 static links What information is needed to address a local variable in a surrounding scope? the number of levels of static links to follow and the offset SUMMARY Compilers use two-part addresses, called *lexical addresses* that consist of: 1. number of levels of surrounding scopes where the variable was declared 2. The offset from that scope's base HOW TO ADDRESS LOCAL VARIABLES? procedure p0; var x; procedure f1; var y; var z; procedure p2; var a; procedure f3; begin # here --v lexical address of x is (3,0) # lexical address of z is (2,1) # lexical address of a is (1,0) call f1; call f3; x := a+(x*y)+z end; call f3; call p2; call f1; call p0. INFORMATION STORED IN ARs What information is needed in an AR? - any registers that the routine needs to save - old PC value (return address) - old sp value - old fp value (dynamic link) - static link (base of surround scope's AR) WORKING WITH ARs // Requires: the stack has enough room // to allocate a new AR void stack_call_prep( address_type static_link) { // assume parameters are in registers and on stack int old_fp = fp; int old_sp = sp; int a0=4,a1=5,a2=6,a3=7; int s0=16,s1=17,s2=18,s3=16,s4=17; int s5=18,s6=19,s7=20,ra=31; stack_push(GPR[a0]); // save arguments stack_push(GPR[a1]); // save arguments stack_push(GPR[a2]); // save arguments stack_push(GPR[a3]); // save arguments stack_push(GPR[s0]); // save s regs stack_push(GPR[s1]); // save s regs stack_push(GPR[s2]); // save s regs stack_push(GPR[s3]); // save s regs stack_push(GPR[s4]); // save s regs stack_push(GPR[s5]); // save s regs stack_push(GPR[s6]); // save s regs stack_push(GPR[s7]); // save s regs stack_push(static_link); stack_push(old_fp); // dynamic link stack_push(ra); // return address // base of this new AR fp = old_sp + BYTES_PER_WORD; // sp already adjusted by the pushes // == old_sp + 15 * BYTES_PER_WORD } // Requires: restored fp and sp values // satisfy stack's invariant // return given value from a subroutine extern void stack_restore_for_return() { // assume result is in register v0 int s0=16,s1=17,s2=18,s3=16,s4=17; int s5=18,s6=19,s7=20,ra=31; // restore return address GPR[ra] = stack_pop(); // restore the fp fp = stack_pop(); stack_pop(); // toss static link // restore the s registers GPR[s7] = stack_pop(); GPR[s6] = stack_pop(); GPR[s5] = stack_pop(); GPR[s4] = stack_pop(); GPR[s3] = stack_pop(); GPR[s2] = stack_pop(); GPR[s1] = stack_pop(); GPR[s0] = stack_pop(); do_jr(ra); // sets PC to contents of ra register } PICTURE OF CALL OPERATION Execution of stack_call(SL): PC: 16 fP: 464 sp: 448 468 | | 464 | | <- fp 460 | | 456 | | 452 | | 448 | arg5 | <- sp 444 | | 440 | | 436 | | 432 | | 428 | | 424 | | 420 | | 416 | | 412 | | 408 | | 404 | | 400 | | 396 | | 392 | | 388 | | 384 | | <- sp PICTURE OF RETURN OPERATION Execution of stack_return_value(&pc, v): PC: 200 fp: 444 sp: 384 468 | | 464 | | 460 | | 456 | | 452 | | 448 | arg5 | 444 | saved a0 | <- fp 440 | saved a1 | 436 | saved a2 | 432 | saved a3 | 428 | saved s0 | 424 | saved s1 | 420 | saved s2 | 416 | saved s3 | 412 | saved s4 | 408 | saved s5 | 404 | saved s6 | 400 | saved s7 | 396 | saved s7 | 392 | old static link | 388 | old fp | 382 | old ra | <- sp 382 | | | | WHAT STATIC LINK TO PASS? If routine R calls E, what static link is passed? If E is L layers outwards from where R is declared if L == 0 // in the same scope as R then pass teh current AR's base address if L > 0 // in a surrounding scope the base address of the AR L levels out (i.e., follow L static links) example: procedure p0(); var int y; procedure p1(); var int x; procedure p2(); procedure p3(); begin x := x+3 end; begin # body of p2 call p2(); # L is 1, pass p2's static link # i.e., base of p1's AR x := x+1; call p3(); # L is 0, pass the p2's AR base # i.e., the fp call p1() # L is 2, pass p0's AR base end; begin # body of p1 y := y - 1; call p2() end; begin call p1() end VM DESIGNS PREVIOUSLY SEEN 1-register machine == Tiny VM SRM (3 registers) == VM of Homework 1 SIMPLIFIED RISC MACHINE (SRM) 32 registers Num. Usage Name(s) ===================================== 0 always 0 $0 1 assembler temp. $at 2,3 function results $v0, $v1 4-7 function args $a0 - $a3 8-15 temporaries $t0 - $t7 16-23 saved temporaries $s0 - $s7 24,25 temporaries $t8, $t9 26,27 reserved for OS 28 globals pointer $gp 29 stack pointer $sp 30 frame pointer $fp 31 return address $ra INSTRUCTIONS All fit in one word (32 bits) Register format: op:6 rs:5 rt:5, rd:5, shift:5 func:6 Immediate format: op:6 rs:5 rt:5 immed:16 Jump type: op:6 addr:26 Register Format Arithmetic and Logic (3 registers): 3 registers: ADD, SUB AND, BOR, NOR, XOR SUB s, t, d GPR[d] <- GPR[s] - GPR[t] 2 registers: MUL, DIV 1 register: MFHI, MFLO Shifts (2 registers + shift): SLL, SRL SLL s, t, amt GPR[t] <- GPR[s] << amt Jump to register (1 register): JR JR s PC <- GPR[s] Immediate Format Arithmetic (2 registers & immediate): ADDI Logical (2 registers & immediate): ANDI, BOI, XORI Comparison & jump (2 registers & offset): BEQ, BNE Comparisons to 0 (1 register & offset): BGEZ, BGTZ, BLEZ, BLTZ Load, Store name 1 register: Call, Return use static links P-MACHINE INSTRUCTIONS Word length == 16 bits Memory is word addressed Instruction Format: OP code: 4 bits Registers: 2 bits Memory location: 8 bits Code location: 12 bits Examples: 1 register, 1 address/value: LIT Rd, 0, M [ Op | Rd | - | M ] <-4--> 2 2 <--8 bits-----> LOD Rd, 0, M 3 registers: SUB Rd, Ry, Rz [ Op | Rd | Ry | Rz | func ] <-4--> 2 2 2 <-6 bits-> System operations, I/O: SYS 0, 0, 0 [ Op | - | - | - | Dv ] <-4--> 2 2 <-6 bits-> 2 Jumps: JMP 0,0,M [ Op | M ] <-4--> <-- 12 bits -----------> ADAPTING TO CHANGES Larger word size of 32 bits? Byte addressing? with 16 bit words: with 32 bit words: