TESTING FOR HOMEWORK 4

Does the generated code need to work in some particular way?

   No, it's only the output of the compiled code,
   when run (on the VM), that has to match
   the expected outputs.

RUNNING INDIVIDUAL TESTS

$ cat simple-print.spl
% $Id: simple-print.spl,v 1.1 ...
begin
  print 3402
end.
$ make simple-print.myo
rm -f simple-print.bof; umask 022; \
    ./compiler simple-print.spl
rm -f simple-print.myo; umask 022; \
    cat char-inputs.txt | \
    ./vm/vm simple-print.bof \
    > simple-print.myo 2>&1
$ cat simple-print.myo
3402$

THE TASK

Write modules:
   gen_code
   literal_table

So that with the provided files
the compiler transforms a .spl source file into a .bof file
that, when run in the provided VM, produces the expected output.

   SPL source file     test.spl
         |
         | compiler
         |
         v
   BOF file            test.bof
         |
         | VM (provided)
         |
         v
   program output      test.myo

OVERALL STRATEGY FOR WRITING GEN_CODE

Use the shape of the unparser.c code,
   because that also walks over the ASTs

Decide on overall tactics
   (e.g., expressions put result on stack)
   and follow those in all cases

Build up code from simple test cases
   (base cases of recursion)
   so you can test along the way

Check your work by checking for the expected:
   - generated code files (make .asm files)
   - traces from the VM (make .myt files)

LITERALS STORED ABOVE THE GP FROM BOF

Literal values stored at positive offsets from the GP
   - loaded from the BOF file's data section
   - offsets known to the compiler from the literal table

Memory of the VM running a program:

 address
   32768  [                        ]
          [                        ]
          [                        ]
          [                        ]
 initial  [                        ]
 SP,FP -->[ runtime stack          ]
          [ program's AR           ]
          [ AR for block 1         ]
          [ ...                    ]
    FP -->[ AR for block N         ]
    SP -->[ ...                    ]
          [      |                 ]
          [      v                 ]
          [                        ]
          [                        ]
          [                        ]
          [                        ]
          [                        ]
          [ literals               ]
          [ ...                    ]
          [ (data from BOF)        ]
    GP -->[ offset 0 data word     ]
          [                        ]
          [                        ]
          [                        ]
          [                        ]
          [                        ]
          [ VM                     ]
          [ instructions           ]
          [ ...                    ]
          [ (text of BOF)          ]
          [                        ]
    PC -->[                        ]
       0  [________________________]

OTHER BASIC DECISIONS

Expressions: push value on "top" of the runtime stack
   (at offset 0 from SP)

Conditions: push the truth value
   (0 for false and 1 for true)
   on top of the runtime stack

Statements: leave the SP unchanged when done
   (ultimately they don't allocate or deallocate storage,
    at least not permanently)

Declarations: allocate and initialize variables and constants
   on the runtime stack in the AR for their block
   (use positive offsets from the FP)

STACK LAYOUT FOR EACH AR

Layout 2:
                                 offset
          [ ...                 ]
          [ local variables     ]
          [ ...                 ]
    FP -->[ local constants     ]  0
          [ saved      SP       ] -1
          [ registers  FP       ] -2
          [ static link         ] -3
          [ RA                  ] -4
          [ temporary storage   ]
    SP -->[ ...                 ]

TACTIC: MAKE EVERYTHING LOOK THE SAME

Even the program's AR looks like other ARs

$ cat empty.spl
% $Id: empty.spl,v 1.1 ...
begin
end.
$
$ make empty.asm
rm -f empty.asm; umask 022; \
    vm/disasm empty.bof > empty.asm 2>&1
$ cat empty.asm
.text 0
a0:     CPR $r3, $fp
a1:     SWR $sp, -1, $sp
a2:     SWR $sp, -2, $fp
a3:     SWR $sp, -3, $r3
a4:     SWR $sp, -4, $ra
a5:     CPR $fp, $sp
a6:     SRI $sp, 4
a7:     SWR $sp, -1, $sp
a8:     SWR $sp, -2, $fp
a9:     SWR $sp, -3, $r3
a10:    SWR $sp, -4, $ra
a11:    CPR $fp, $sp
a12:    SRI $sp, 4
a13:    NOP
a14:    LWR $ra, $fp, -4
a15:    LWR $r3, $fp, -1
a16:    LWR $fp, $fp, -2
a17:    CPR $sp, $r3
a18:    LWR $ra, $fp, -4
a19:    LWR $r3, $fp, -1
a20:    LWR $fp, $fp, -2
a21:    CPR $sp, $r3
a22:    EXIT 0
.data 4096
.stack 20480
.end
$

SUPPORTING PROCEDURES AND CALLS

Main issues:

   - storing their code
     Why? because procedures execute only when they are called

   - knowing exactly where each starts
     Why? the CALL instruction needs an address

Another issue:

   - sending the right static link
     so that the procedure's body can find
     constants and variables using static scoping

WHERE TO PUT PROCEDURE CODE?

Possible layouts in VM's code array:

1. store the main program first, then proc code

      [code to set up program's AR]
      [code for the main program's block]
      [(optional: code to tear down the block)]
      EXIT 0
      [code for each procedure...]

2. store the proc code first, then main program

      JMPA size(proc code)+1
      [code for each procedure ...]
      [code to set up program's AR]
      [code for the main program's block]
      [(optional: code to tear down the block)]
      EXIT 0

3. store proc code under programmer control
   (not in SPL)

NESTED PROCEDURES ARE A PROBLEM

begin
  proc A
  begin
    proc B
    begin
      # B's body code...
      call A
      # ...
      # ...
    end;
    # A's body code
    call B
    # ...
    # ...
  end;
  call A
end.

If we lay out the code as

      [ code for A ]
      [ code for B ]

How do we know the address of B
to compile the call to B in A?

   We don't

What about the other direction?

   We don't

RECURSIVE PROCEDURES, SIMILAR PROBLEM

begin
  proc R
  begin
    # R's body code ...
    call R
    # ...
  end;
  # ...
  call R
  # ...
end.

Before storing code for R,
how do we know where it starts?

MUTUAL RECURSION (NOT IN OUR LANGUAGE)

begin
  proc O
  begin
    # O's body code...
    call E
    # ...
  end;
  proc E
  begin
    # E's body code ...
    call O
    # ...
  end;
  # ...
  call O;
  call E
  # ...
end.

One of these must come before the other
in the code area of the VM...

SOLUTION STRATEGIES FOR CALLS

[Multiple passes]:

   1. Generate code for each procedure
      (+ store offsets in symbol table,
       + lay out procedure code in memory
         with placeholders for calls)

   2. Gather table of addresses
      (map from names to addresses,
       using offsets and beginning address)

   3. Patch up code addresses for calls
      (+ output code)

[Lazy evaluation, labels]:

   1. Generate code for each procedure
      with calls to "labels"
      (+ store or update labels in symbol table)
      (+ output code)

GENERAL SOLUTION: MULTIPLE PASSES

Problem: where does each procedure start?

Passes over the IR:

   1. Compile all procedure code
      (now know how big each procedure is)

   2. Lay out procedure code in memory
      (now know where each starts)

   3. Change each call instruction

GENERAL SOLUTION: LABELS

Use "labels" to allow

Term "label" is from assembly language

        ; ...
        jmp L
        ; ...
   L:   ; ...

APPROACHES TO FIXING LABELS

Problem: convert labels to addresses

(1) Use multiple passes
    a. Generate code with labels
    b. Lay out memory for procedures
       (determine starting addresses)
    c. Change labels to addresses

    advantages:

    disadvantages:

(2) Use shared mutable data (lazy eval.)
    a. labels are unique placeholders,
       shared by all uses (calls)
    b. when address is determined,
       update the placeholder
       (and all uses are updated)

    advantages:

    disadvantages:

LABEL DATA STRUCTURE

// file label.h
// ...
#include "machine_types.h"

typedef struct {
    bool is_set;
    unsigned int word_offset;
} label;

// Return a fresh label that is not set
extern label *label_create();

// Requires: lab != NULL
// Set the address in the label
extern void label_set(label *lab,
                      unsigned int word_offset);

// Is the given label set?
extern bool label_is_set(label *lab);

// Requires: label_is_set(lab)
// Return the word offset in lab
extern unsigned int label_read(label *lab);

CREATING LABELS

Labels created for all procedures
when creating proc_decl_t ASTs.

// file ast.h
// ...
#include "label.h"
// ...
typedef struct proc_decl_s {
    file_location *file_loc;
    AST_type type_tag;
    struct proc_decl_s *next; // for lists
    const char *name;
    struct block_s *block;
    label *lab; // for code generation
} proc_decl_t;

// file ast.c
// Return an AST for a proc_decl
proc_decl_t ast_proc_decl(ident_t ident,
                          block_t block)
{
    proc_decl_t ret;
    ret.file_loc = // ...
    ret.type_tag = proc_decl_ast;
    ret.next = NULL;
    ret.name = ident.name;
    block_t *p = // ...
    ret.block = p;
    // this is the source of the labels
    ret.lab = label_create();
    assert(ret.lab != NULL);
    return ret;
}

PROPAGATING POINTERS TO LABELS (1)

Labels added to attributes of procedure names

// file id_attrs.h
//...
#include "label.h"

typedef struct {
    file_location file_loc;
    id_kind kind;  // kind of identifier
    unsigned int offset_count;
    // for a procedure, its label
    label *lab;
} id_attrs;

PROPAGATING POINTERS TO LABELS (2)

Make call statement ASTs point to the label
of the procedure being called

// file scope_check.c
// check the statement to make sure that
// the procedure has been declared
// (if not, then produce an error).
// Modifies the given AST
// to have appropriate id_use pointers.
void scope_check_callStmt(
                call_stmt_t *stmt)
{

}

PROPAGATING POINTERS TO LABELS (3)

Associate labels with each call instruction
in code structures

// file code.h
// ...
#include "label.h"

typedef struct code_s {
    struct code_s *next;
    bin_instr_t instr;
    // labels for call instructions
    label *lab;
} code;
// ...

// Requires: lab != NULL
// Create and return a fresh instruction
// with the named mnemonic and parameters
extern code *code_call(address_type a,
                       label *lab);
// ...

SETTING LABELS IN PROCEDURES (1)

// file label.h
typedef struct {
    bool is_set;
    unsigned int word_offset;
} label;

SETTING LABELS IN PROCEDURES (2)

// file gen_code.c
// ...
void gen_code_proc_decl(proc_decl_t pd)
{

}

PUTTING ADDRESSES IN CALL INSTRUCTIONS

Write a function to fix all call instructions
in a code_seq

extern void code_seq_fix_labels(code_seq cs);
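
A SKETCH OF FIXING LABELS (ILLUSTRATION ONLY)

One possible shape for the fix-up pass, as a minimal sketch and not the course's actual implementation. Assumptions: the code struct is a simplified stand-in (the real one holds a bin_instr_t, here replaced by a hypothetical plain field named target), a code_seq is assumed to be a linked list of code nodes, and the label type and label_set are redefined locally so the example compiles on its own.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Stand-in for the label type shown in label.h. */
typedef struct {
    bool is_set;
    unsigned int word_offset;
} label;

/* Hypothetical, simplified instruction node. The real struct holds a
   bin_instr_t; here the address field of a call is a plain integer. */
typedef struct code_s {
    struct code_s *next;
    unsigned int target;   /* hypothetical: address operand of a call */
    label *lab;            /* label of the callee, NULL if not a call */
} code;

/* Assumption: a code sequence is a linked list of code nodes. */
typedef code *code_seq;

/* Requires: lab != NULL
   Set the address in the label; every call sharing it sees the update. */
void label_set(label *lab, unsigned int word_offset)
{
    assert(lab != NULL);
    lab->word_offset = word_offset;
    lab->is_set = true;
}

/* Walk the sequence and copy each call's label address into the call.
   Requires: every label reachable from cs has already been set. */
void code_seq_fix_labels(code_seq cs)
{
    for (code *c = cs; c != NULL; c = c->next) {
        if (c->lab != NULL) {
            assert(c->lab->is_set);
            c->target = c->lab->word_offset;
        }
    }
}
```

Because every call to a procedure shares one label, a call generated before the procedure's address was known (as in the recursive case) is patched just like one generated afterwards; the pass only has to run after all labels are set.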