COP 3402 meeting -*- Outline -*-

* Code Generation for Procedures

------------------------------------------
     SUPPORTING PROCEDURES AND CALLS

Main issues:

 - storing their code
     Why?

 - knowing exactly where each starts
     Why?

Another issue:

 - sending the right static link
------------------------------------------
        ... we only want to execute their code when called,
            and they can be called from anywhere their name is visible
            (in some languages they can also be reached
             through data structures)
        ... because machines don't support symbolic names,
            the call instruction needs an (absolute) address
            e.g., in the SRM, the JAL instruction needs an address

        Q: What static link does a called procedure need?

           The one for the scope in which it is defined;
           this is the AR reached by following static links
           for the number of levels outward (from the call site)
           at which the procedure's name was declared.

           For example, suppose we are compiling code for procedure P,
           and the block in P has a statement that calls procedure Q.
           There are several cases:

             - Q was declared in P's block,
               so the call and the declaration of Q are in the same scope,
               so levelsOutward == 0 for Q (from the call);
               thus we want to pass FP as the static link.

             - Q was declared in the same block as P,
               so levelsOutward == 1 for Q;
               thus we want to pass P's static link as the static link.

             - Q was declared in a scope surrounding where P was declared,
               so levelsOutward == N > 1 for Q (from the call site);
               thus we want to pass the static link for N levels outward.

           Thus it's always the levels outward of the name Q.

** Where to store code for procedures?

   We can't just put the code for procedures
   in the main program's code sequence

     Why?
        We don't want them executed when the program starts running,
        only when called!

   So they have to be stored somewhere...

   A fundamental issue is this:

   Q: Where do we put the code sequences for each procedure?

      We have to know where they are in the memory of the VM,
      so that we can call them by address
      (as the VM doesn't support names for procedures directly).
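   Putting the two issues so far together: for each call, the compiler
   has to turn the procedure's name into a code address and into a
   number of levels outward for the static link. The following is only
   a minimal sketch of that idea. The names proc_info, call_info,
   resolve_call, frame, and static_link_for_callee are hypothetical
   (they are not part of the SRM or the course code), the flat table
   ignores shadowing, and the frame struct only simulates ARs so the
   static-link walk can be shown.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical per-procedure record kept by the compiler. */
typedef struct {
    const char *name;
    unsigned int start_addr; /* code address of the first instruction */
    unsigned int decl_level; /* nesting depth of the scope declaring the name */
} proc_info;

/* What a call site needs to know about its callee. */
typedef struct {
    unsigned int target_addr;    /* address to put in the call instruction */
    unsigned int levels_outward; /* static links to follow from FP */
} call_info;

/* Look up name in a flat table (assumption: no shadowing, and the name
   is visible at the call, so decl_level <= call_level).
   Returns 1 and fills *out on success, 0 if the name is unknown. */
int resolve_call(const proc_info *table, size_t n, const char *name,
                 unsigned int call_level, call_info *out)
{
    for (size_t i = 0; i < n; i++) {
        if (strcmp(table[i].name, name) == 0) {
            out->target_addr = table[i].start_addr;
            out->levels_outward = call_level - table[i].decl_level;
            return 1;
        }
    }
    return 0;
}

/* Simulated activation record: only the static link is modeled. */
typedef struct frame {
    struct frame *static_link;
} frame;

/* The static link to pass to the callee: start at the caller's FP
   and follow static links levels_outward times. */
frame *static_link_for_callee(frame *fp, unsigned int levels_outward)
{
    frame *link = fp;
    while (levels_outward > 0) {
        assert(link != NULL);
        link = link->static_link;
        levels_outward--;
    }
    return link;
}
```

   With the main block at level 0 and P's block at level 1, a call in
   P's block to a Q declared in P's block gives levels_outward == 0
   (pass FP), and a call to a procedure declared alongside P gives
   levels_outward == 1 (pass P's static link), matching the cases above.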
   Note that in the SRM, the BOF loading process
   only allows code/instructions at the beginning of memory,
   but does allow the BOF to specify where execution should start

------------------------------------------
      WHERE TO PUT PROCEDURE CODE?

Possible layouts in VM's code array:

------------------------------------------
        ... (1) store main program first, then procedures:

                [code to set up the program's AR]
                [code for program block]
                [(optional: code to tear down the program's AR)]
                EXIT
                [code for each procedure...]

        ... (2) store procedures first, then main program:

                BEQ $0,$0, size(all-proc-code) # skip past the procedures
                [code for each procedure...]
                [code to set up the program's AR]
                [code for program block]
                [(optional: code to tear down the program's AR)]
                EXIT

            Note that the BOF file can set the PC to a start address
            other than 0, without a jump at the beginning,
            so the first jump isn't really needed

        ... (3) store procedures under programmer control:

            Many languages (like PL/0) don't allow procedure expressions,
            so procedures can't be values and can't be stored in data

            But some languages (like Scheme and other functional languages)
            have expressions that denote function values (closures).
            In such a language, the compiler can just:
              - generate the code for the function as an expression value
              - let the program do what it wants with the value

   Q: How would you implement each?

      Main ideas:
        A. track the start address of each procedure declaration
           (e.g., as an attribute)
        B. procedure code is written out (to the BOF) after code generation,
           so starting addresses and call instructions
           can be adjusted at that time

      Scheme (1):
        - need to store the code for each procedure somewhere,
        - track the offset of each procedure,
        - when generating code to call a procedure p,
          can find p's offset in the symbol table
        - when done with the main program's block, know its length,
          then add that length to each offset
          when writing call instrs into the BOF

      Scheme (2):
        - store the main program's code sequence somewhere,
        - track the start address of each procedure,
        - when generating code to call a procedure p,
          can find p's start address in the symbol table
          (no changes to call instructions needed when writing to BOF)

      Scheme (1) has the advantage that it works for code without procedures,
         making initial testing easier
      Scheme (2) has the advantage that it doesn't need to fix
         call instructions when writing to the BOF
      Scheme (3) is usually required by the programming language
         (when there are procedures/functions as expressions)

   Q: Which layout makes the most sense?

      Since scheme (2) boils down to scheme (1) if there are no procedures,
      and since scheme (2) uses offsets in a less complex way than scheme (1),
      we recommend using scheme (2) and setting the start address of
      the main program's code in the BOF.

** how to find each procedure's starting address?

   Call instructions take concrete addresses, not names
   (absolute addresses on the SRM),
     so the compiler needs to know exactly where each procedure starts
     (its code address) to put in the call (JAL for the SRM) instruction...

   However, nesting requires patching up call instructions in some way

   Consider:

------------------------------------------
    NESTED PROCEDURES ARE A PROBLEM

procedure A;
  procedure B;
  begin
    # B's body code...
    call A
    # ...
    # ...
  end
begin
  # A's body code
  call B
  # ...
  # ...
end

If we lay out the code as

    [ code for A ]
    [ code for B ]

How do we know the address of B
to compile the call to B?

What about the other direction?
------------------------------------------
        Note that these calls are all to previously declared procedures

        ... in the layout shown, we don't know B's start address
            until we know how big A's body is
        ... in the other order (B's code before A's),
            we don't know A's starting address
            until we know how big B's body is

        So we'll need some mechanism for filling in these addresses...

------------------------------------------
   RECURSIVE PROCEDURES, SIMILAR PROBLEM

procedure R;
begin
  # R's body code ...
  call R
  # ...
end

Before storing code for R,
how do we know where it starts?
------------------------------------------
        It's like the nested procedure case,
        but here it's the procedure itself whose size we would need to know.
        However:
        ... we can know R's offset when we start walking R's AST,
            so we can put in a call to be adjusted later

------------------------------------------
          MUTUAL RECURSION

procedure O;
begin
  # O's body code...
  call E
  # ...
end

procedure E;
begin
  # E's body code ...
  call O
  # ...
end

One of these must come before the other
in the code area of the VM...
------------------------------------------
        If the language, like PL/0, requires procedures to be declared
        before use in calls, then examples with mutual recursion,
        like the O and E example, are illegal

        However, if the language allows such mutually recursive calls,
        perhaps with forward declarations, as in C,
        then this is a problem.

        Q: No matter which of O or E is put first,
           how is the call to the second one to know
           where the second one starts?
           (The problem is we need to know the exact address.)
           (And the other order has a similar problem.)

*** solutions

------------------------------------------
      SOLUTION STRATEGIES FOR CALLS

[Multiple passes]:
  1. Generate code for each procedure
     (+ store offsets in symbol table,
      + lay out procedure code in memory)
  2. Gather table of addresses
     (map from names to addresses,
      using offsets and beginning address)
  3. Patch up code addresses for calls
     (+ output code)

[Lazy evaluation, labels]:
  1. Generate code for each procedure
     with calls to labels
     (+ store or update labels in symbol table)
     (+ output code)
------------------------------------------
        These solutions assume that where a procedure is in memory
        does not affect the size of the code/instructions
        (That is true on the SRM.)

** Multiple Passes as a Solution

------------------------------------------
    GENERAL SOLUTION: MULTIPLE PASSES

Problem: where does each procedure start?

Solution idea:
  1. Compile all procedure code
     (now know how big each procedure is)
  2. Lay out procedure code in memory
     (now know where each starts)
  3. Change each call instruction
------------------------------------------
        Step 3 could be done by a "linker"
        (when the compiler outputs the information from steps 1 and 2)

        Q: What would a program need to do
           to change all the call instructions?

           iterate over the sequence of instructions;
           if an instruction is a call, then adjust its address

** Labels as a Solution

------------------------------------------
       GENERAL SOLUTION: LABELS

Use "labels" to allow


Term "label" is from assembly language

        ; ...
        jmp L
        ; ...
    L:  ; ...
------------------------------------------
        ... the IR to specify a call target (address)
            that will be determined later

------------------------------------------
      APPROACHES TO FIXING LABELS

Problem: convert labels to addresses

(1) Use multiple passes
    a. Generate code with labels
    b. Lay out memory for procedures
       (determine starting addresses)
    c. Change labels to addresses

    advantages:

    disadvantages:

(2) Use shared mutable data (lazy eval.)
    a. labels are unique placeholders,
       shared by all uses (calls)
    b. when address is determined,
       update the placeholder
       (and all uses are updated)

    advantages:

    disadvantages:
------------------------------------------
        ... (advantages of multiple passes)
            + easy to understand/program
            + need a second pass (to adjust addresses) anyway
        ... (disadvantages of multiple passes)
            - time needed is linear in size of compiled code
        ... (advantages of lazy eval)
            + can debug some code early (before full implementation)
        ... (disadvantages of lazy eval)
            - harder to understand, timing is everything
            - label data structure must be truly unique
              (copies destroy the whole idea,
               so need pointers or references)
            - still requires multiple passes for mutual recursion
              (to force all resolutions)

*** label data structure for lazy evaluation

------------------------------------------
   LABEL DATA STRUCTURE FOR LAZY EVAL

// file label.h
// ...
#include "machine_types.h"

typedef struct {
    bool is_set;
    unsigned int word_offset;
} label;

// Return a fresh label that is not set
extern label *label_create();

// Requires: lab != NULL
// Set the address in the label
extern void label_set(label *lab, unsigned int word_offset);

// Is the given label set?
extern bool label_is_set(label *lab);

// Requires: label_is_set(lab)
// Return the word offset in lab
extern unsigned int label_read(label *lab);
------------------------------------------
        So the compiler can create a label (on the heap),
        and all data pointing to it see the update (once it's set)

** exercise

   Write code for the float calculator with let statements

   ::= let { } in
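
** sketch: one possible label implementation

   Returning to the label interface from label.h: the following is
   only a sketch of how it might be implemented and used, not the
   course's actual code. It repeats the struct so that it stands alone
   (the real header gets its types via machine_types.h), and the names
   call_instr and resolve_calls are hypothetical stand-ins for whatever
   IR the compiler uses for calls.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Same fields as in label.h, repeated so this sketch is self-contained. */
typedef struct {
    bool is_set;
    unsigned int word_offset;
} label;

/* Return a fresh (heap-allocated) label that is not set. */
label *label_create(void)
{
    label *lab = malloc(sizeof(label));
    if (lab == NULL) {
        abort(); /* out of memory: nothing sensible to do in a sketch */
    }
    lab->is_set = false;
    lab->word_offset = 0;
    return lab;
}

/* Requires: lab != NULL.  Set the address in the label. */
void label_set(label *lab, unsigned int word_offset)
{
    assert(lab != NULL);
    lab->is_set = true;
    lab->word_offset = word_offset;
}

/* Is the given label set? */
bool label_is_set(label *lab)
{
    return lab->is_set;
}

/* Requires: label_is_set(lab).  Return the word offset in lab. */
unsigned int label_read(label *lab)
{
    assert(label_is_set(lab));
    return lab->word_offset;
}

/* Hypothetical IR for a call: it holds a pointer to the callee's
   label, so every call to the same procedure shares one label. */
typedef struct {
    label *target;        /* shared placeholder, never copied */
    unsigned int address; /* filled in once the label is set */
} call_instr;

/* Copy addresses from labels that are set into the calls.
   Returns the number of calls whose label is still unset. */
unsigned int resolve_calls(call_instr *calls, size_t n)
{
    unsigned int unresolved = 0;
    for (size_t i = 0; i < n; i++) {
        if (label_is_set(calls[i].target)) {
            calls[i].address = label_read(calls[i].target);
        } else {
            unresolved++;
        }
    }
    return unresolved;
}
```

   For the O and E example, the compiler would create one label for
   each procedure, store the pointers in the symbol table, and put the
   same pointer in every call. Calling label_set when a procedure's
   start is laid out makes the address visible to all of its calls,
   and a final resolve_calls pass before output forces any remaining
   resolutions (the extra pass noted among the disadvantages above).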