I. HW4, Basics and Demos A. Testing and The Task ------------------------------------------ TESTING FOR HOMEWORK 4 Does the generated code need to work in some particular way? ------------------------------------------ ------------------------------------------ RUNNING INDIVIDUAL TESTS $ cat simple-print.spl % $Id: simple-print.spl,v 1.1 ... begin print 3402 end. $ make simple-print.myo rm -f simple-print.bof; umask 022; \ ./compiler simple-print.spl rm -f simple-print.myo; umask 022; \ cat char-inputs.txt | \ ./vm/vm simple-print.bof \ > simple-print.myo 2>&1 $ cat simple-print.myo 3402$ ------------------------------------------ ------------------------------------------ THE TASK Write modules: gen_code literal_table So that with the provided files SPL source file test.spl | | compiler | v BOF file test.bof | | VM (provided) | v program output test.myo ------------------------------------------ B. Strategy ------------------------------------------ OVERALL STRATEGY FOR WRITING GEN_CODE Use the shape of the unparser.c code Decide on overall tactics (e.g., expressions put result on stack) and follow those in all cases Build up code from Check your work by: ------------------------------------------ 1. Some Basic Decisions a. Literal above the GP ------------------------------------------ LITERALS STORED ABOVE THE GP FROM BOF Literal values stored at positive offsets from the GP - loaded from the BOF file's data section - offsets known to the compiler from Memory of the VM running a program: address 32768 [ ] [ ] [ ] [ ] initial [ ] SP,FP -->[ runtime stack ] [ program's AR ] [ AR for block 1 ] [ ... ] FP --> [ AR for block N ] SP --> [ ... ] [ | ] [ v ] [ ] [ ] [ ] [ ] [ ] [ literals ] [ ... ] [ (data from BOF) ] GP --> [ offset 0 data word ] [ ] [ ] [ ] [ ] [ ] [ VM ] [ instructions ] [ ... ] [ (text of BOF) ] [ ] PC --> [ ] 0 [________________________] ------------------------------------------ ------------------------------------------ OTHER BASIC DECISIONS Epressions: Conditions: Statements: Declarations: ------------------------------------------ ------------------------------------------ STACK LAYOUT FOR EACH AR Layout 2: offset [ ... ] [ local variables ] [ ... ] FP -->[ local constants ] 0 [ saved SP ]-1 [ registers FP ]-2 [ static link ]-3 [ RA ]-4 [ temporary storage ] SP -->[ ... ] ------------------------------------------ Is there something in the provided code that makes this layout? C. Tactics ------------------------------------------ TACTIC: MAKE EVERYTHING LOOK THE SAME Even the program's AR looks like other ARs $ cat empty.spl % $Id: empty.spl,v 1.1 ... begin end. $ $ make empty.asm rm -f empty.asm; umask 022; \ vm/disasm empty.bof > empty.asm 2>&1 $ cat empty.asm .text 0 a0: CPR $r3, $fp a1: SWR $sp, -1, $sp a2: SWR $sp, -2, $fp a3: SWR $sp, -3, $r3 a4: SWR $sp, -4, $ra a5: CPR $fp, $sp a6: SRI $sp, 4 a7: SWR $sp, -1, $sp a8: SWR $sp, -2, $fp a9: SWR $sp, -3, $r3 a10: SWR $sp, -4, $ra a11: CPR $fp, $sp a12: SRI $sp, 4 a13: NOP a14: LWR $ra, $fp, -4 a15: LWR $r3, $fp, -1 a16: LWR $fp, $fp, -2 a17: CPR $sp, $r3 a18: LWR $ra, $fp, -4 a19: LWR $r3, $fp, -1 a20: LWR $fp, $fp, -2 a21: CPR $sp, $r3 a22: EXIT 0 .data 4096 .stack 20480 .end $ ------------------------------------------ a. Other simple tests II. Code Generation for Procedures ------------------------------------------ SUPPORTING PROCEDURES AND CALLS Main issues: - storing their code Why? - knowing exactly where each starts Why? Another issue: - sending the right static link ------------------------------------------ What static link does a called procedure need? A. Where to store code for procedures? Where do we put the code sequences for each procedure? ------------------------------------------ WHERE TO PUT PROCEDURE CODE? Possible layouts in VM's code array: ------------------------------------------ How would you implement each? Which layout makes the most sense? B. how to find each procedure's starting address? ------------------------------------------ NESTED PROCEDURES ARE A PROBLEM begin proc A begin proc B begin # B's body code... call A # ... # ... end; # A's body code call B # ... # ... end; call A end. If lay out the code as [ code for A ] [ code for B ] How do we know the address of B to compile the call to B in A? What about the other direction? ------------------------------------------ ------------------------------------------ RECURSIVE PROCEDURES, SIMILAR PROBLEM begin proc R begin # R's body code ... call R # ... end; # ... call R # ... end. Before storing code for R, how do we know where it starts? ------------------------------------------ ------------------------------------------ MUTUAL RECURSION (NOT IN OUR LANGUAGE) begin proc O begin # O's body code... call E # ... end; proc E; begin # E's body code ... call O # ... end; # ... call O; call E # ... end. One of these must before the other in the code area of the VM... ------------------------------------------ No matter which of O or E is put first, how is the call to the second one to know where the second one starts? 1. solutions ------------------------------------------ SOLUTION STRATEGIES FOR CALLS [Multiple passes]: 1. Generate code for each procedure (+ store offsets in symbol table, + layout procedure code in memory with placholders for calls) 2. Gather table of addresses (map from names to addresses, using offsets and beginning address) 3. Patch up code addresses for calls (+ output code) [Lazy evaluation, labels]: 1. Generate code for each procedure with calls to "labels" (+ store or update labels in symbol table) (+ output code) ------------------------------------------ C. Multiple Passes Solution ------------------------------------------ GENERAL SOLUTION: MULTIPLE PASSES Problem: where does each procedure start? Passes over the IR: 1. Compile all procedure code (now know how big each procedure is) 2. Lay out procedure code in memory (now know where each starts) 3. Change each call instruction ------------------------------------------ What would a progrm need to do to change all the call instructions? D. Labels Solution ------------------------------------------ GENERAL SOLUTION: LABELS Use "labels" to allow Term "label" is from assembly language ; ... jmp L ; ... L: ; ... ------------------------------------------ ------------------------------------------ APPROACHES TO FIXING LABELS Problem: convert labels to addresses (1) Use multiple passes a. Generate code with labels b. Lay out memory for procedures (determine starting addresses) c. Change labels to addresses advantages: disadvantages: (2) Use shared mutable data (lazy eval.) a. labels are unique placeholders, shared by all uses (calls) b. when address is determined, update the placeholder (and all uses are updated) advantages: disadvantages: ------------------------------------------ 1. label data structure a. Creating and propagating labels ------------------------------------------ LABEL DATA STRUCTURE // file label.h // ... #include "machine_types.h" typedef struct { bool is_set; unsigned int word_offset; } label; // Return a fresh label that is not set extern label *label_create(); // Requires: lab != NULL // Set the address in the label extern void label_set(label *lab, unsigned int word_offset); // Is the given label set? extern bool label_is_set(label *lab); // Requires: label_is_set(lab) // Return the word offset in lab extern unsigned int label_read(label *lab); ------------------------------------------ ------------------------------------------ CREATING LABELS Labels created for all procedures when creating proc_decl_t ASTs. // file ast.h // ... #include "label.h" // ... typedef struct proc_decl_s { file_location *file_loc; AST_type type_tag; struct proc_decl_s *next; // for lists const char *name; struct block_s *block; label *lab; // for code generation } proc_decl_t; // file ast.c // Return an AST for a proc_decl proc_decl_t ast_proc_decl(ident_t ident, block_t block) { proc_decl_t ret; ret.file_loc = // ... ret.type_tag = proc_decl_ast; ret.next = NULL; ret.name = ident.name; block_t *p = // ... ret.block = p; // this is the source of the labels ret.lab = label_create(); assert(ret.lab != NULL); return ret; } ------------------------------------------ ------------------------------------------ PROPAGATING POINTERS TO LABELS (1) Labels added to attributes of procedure names // file id_attrs.h //... #include "label.h" typedef struct { file_location file_loc; id_kind kind; // kind of identifier unsigned int offset_count; // for a procedure, its label label *lab; } id_attrs; ------------------------------------------ ------------------------------------------ PROPAGATING POINTERS TO LABELS (2) Make call statement ASTs point to the label of the procedure being called // file scope_check.c // check the statement to make sure that // the procedure has been declared // (if not, then produce an error). // Modifies the given AST // to have appropriate id_use pointers. void scope_check_callStmt( call_stmt_t *stmt) { } ------------------------------------------ ------------------------------------------ PROPAGATING POINTERS TO LABELS (3) Associate labels with each call instruction in code structures // file code.h // ... #include "label.h" typedef struct code_s { struct code_s *next; bin_instr_t instr; // labels for call instructions label *lab; } code; // ... // Requires: lab != NULL // Create and return a fresh instruction // with the named mnemonic and parameters extern code *code_call(address_type a, label *lab); // ... ------------------------------------------ Where should the label passed to the code_call function come from? So what has been achieved? b. Using labels to fill in addresses of procedures i. Putting addresses of Procedures in Labels ------------------------------------------ SETTING LABELS IN PROCEDURES (1) // file label.h typedef struct { bool is_set; unsigned int word_offset; } label; ------------------------------------------ Where is the address of a procedure known? ------------------------------------------ SETTING LABELS IN PROCEDURES (2) // file gen_code.c // ... void gen_code_proc_decl(proc_decl_t pd) { } ------------------------------------------ What makes code for a procedure? At the end of code generation, what has this achieved? ii. Fixing up the Call Instructions Can some call instructions not have their labels set at the end of code generation? ------------------------------------------ PUTTING ADDRESSES IN CALL INSTRUCTIONS Write a function to fix all call instructions in a code_seq extern void code_seq_fix_labels(code_seq cs); ------------------------------------------ How would you write such a function? E. testing the solution How would you test a solution? F. exercise