OVERVIEW OF CODE GENERATION

   .. ASTs...-> [ Static Analysis ]
                        |
                        | IR
                        v
                [ Code Generation ]
                        |
                        | Machine Code
                        | (.bof file)
                        v
            Virtual Machine Execution

The IR (= Intermediate Representation)
   records

GENERAL STRATEGY FOR CODE GENERATION

Don't try to optimize!
   (optimize in a separate pass)

Follow the grammar of the ASTs

FOLLOWING THE GRAMMAR

Code resembles the grammar
   that is used for the ASTs

When the grammar is recursive,
   the code generator is recursive

When the grammar has alternatives,
   the code generator has a switch/if statement

TARGET: CODE SEQUENCES

Need lists of machine code

Why?
   1. statements in PL/0 often get translated
      into multiple instructions in the VM
   2. need to indicate sequential execution

REPRESENTING CODE SEQUENCES IN C

#include "instruction.h"

// code that can be in a sequence
typedef struct code_s code;
// code sequences
typedef code *code_seq;
// machine code instructions
typedef struct code_s {
    code_seq next;
    bin_instr_t instr;
} code;

STRATEGIES FOR DESIGNING CODE SEQUENCES

Work backwards

EXPRESSION EVALUATION

Example: (E1 + E2) - (E3 / E4)

Constraints:
   - Expressions have a result value
   - Binary operations (+, -, *, /) in the SRM
     need 2 registers

Where should the result be stored?

Can it be a register?
   i.e., can we reserve a register
   for an expression's value?

   No! Not enough registers

Advice: use the runtime stack
   rule: the result of every expression
         is pushed onto the stack

so to evaluate E2 op E3
use a code sequence like:

   [code to evaluate E2, putting its result on the stack]
   [code to evaluate E3]
   [pop E3's value into $t3]
   [pop E2's value into $t2]
   [evaluate $t2 op $t3 into $t2, e.g., SUB $t2, $t3, $t2]
   [push $t2 onto the stack]

Addressing variables and constants

   want to use LW (and SW for assignments)
   so need the AR's base address and an offset

   for the main program's block,
      the AR's base address will be $fp
      the offset is in the symbol table

USE OF REGISTERS

What if the register is already in use?
   e.g., $v0 for expression's value
   consider x := y + z

Strategies:
   - use a different register
        not enough
   - save and restore
        does work, like putting all expression values on the stack

GENERAL STRATEGY FOR EXPRESSIONS

Each expression's value goes on the runtime stack
   at "the top"

To operate on an expression's value in a register r:

   [compute expression's value onto top of stack]
   [pop the stack into r]

   code_push_reg_on_stack(reg_num_type)
   code_pop_stack_into_reg(reg_num_type)

BACKGROUND: SRM INSTRUCTIONS

ADD s,t,d    "GPR[d] = GPR[s]+GPR[t]"
SUB s,t,d    "GPR[d] = GPR[s]-GPR[t]"
MUL s,t      "HI,LO = GPR[s]*GPR[t]"
DIV s,t      "HI = GPR[s] % GPR[t]" and "LO = GPR[s] / GPR[t]"
LW b,t,o     "GPR[t] = memory[GPR[b]+4*o]"
SW b,t,o     "memory[GPR[b]+4*o] = GPR[t]"
ADDI s,t,i   "GPR[t] = GPR[s]+sgnExt(i)"

How to move value from r1 to r2?
   ADDI r1, r2, 0
   ADD $0, r1, r2

What limitations on immediate operands?
   must fit in a short int (16 bits)

What if the literal doesn't fit? e.g., 1999999999
   - use global data (words in the data section),
     use LW to load it when needed
     use a "literal table" to track offsets for these
   - could compute the value

LITERAL TABLE IDEA

   - Store literal values in a table
   - Keep mapping from text/value of the literal
     to the offset in the data section (from $gp)
   - Initialize from the BOF's data section

LITERAL TABLE IN EXPRESSION EVALUATION

Idea for code for numeric expression, N:

   1. Look up N in global table
   2. Receive N's offset (from $gp)
   3. Generate a load instruction that loads that value
      into some register (say, $v0)
         LW $gp, $v0, offset
   4. Push $v0 onto the stack

LITERAL TABLE AND BOF DATA SECTION

How to get the literals into memory
with the assumed offsets?

   put them into the BOF's data section
   in order (starting with offset 0)

LAYOUT OF AN ACTIVATION RECORD

Must save SP, FP, static link, RA
   and registers $s0-$s7

Can't have the static link
   at a varying offset from FP

Layout 1:

   FP --> [ saved SP          ]
          [ registers FP      ]
          [ static link       ]
          [ RA                ]
          [ $s0               ]
          [ ...               ]
          [ $s7               ]
          [ local constants   ]
          [ ...               ]
          [ local variables   ]
          [ ...               ]
          [ temporary storage ]
   SP --> [ ...               ]

Layout 2:

          [ ...               ]
          [ local variables   ]
          [ ...               ]
   FP --> [ local constants   ]
          [ saved SP          ]
          [ registers FP      ]
          [ static link       ]
          [ RA                ]
          [ $s0               ]
          [ ...               ]
          [ $s7               ]
          [ temporary storage ]
   SP --> [ ...               ]

Advantages of layout 1:
   - simple, like a stack machine
   - tracing in the VM is easy

Advantages of layout 2:
   - offsets for constants and variables
     are what was recorded in the symbol table
   - tracing of the VM can be done
   - corresponds to conventions on MIPS

recommend layout 2

TRANSLATING EXPRESSIONS

Abstract syntax of expressions in PL/0

   E ::= E1 o E2 | x | n
   o ::= + | - | * | /

Simplest cases are:
   numeric literals
   identifiers
   binary operator expressions

TRANSLATION SCHEME FOR NUMERIC LITERALS

   - always use the literal table
     call literal_table_lookup to get offset, ofst
   - want to put that value on top of the stack

   [load value into (say) $at using ofst]
       i.e., LW $gp, $at, ofst
   [push $at onto the stack]

TRANSLATE THE WRITE STATEMENT

e.g., write 3402

   [evaluate the expression (onto the stack)]
   [pop the stack into $a0]
   PINT

TRANSLATION SCHEME FOR VARIABLE NAMES (AND CONSTANTS)

want to use LW instruction
   to bring value into a register, say reg

(if no procedures)
   FP is the frame pointer for the AR

(with procedures)
   suppose lexical address is (lo,ofst)

   # compute the base of name's AR into $t9
   [move $fp into $t9]
   for each of the lo levels outward, emit:
       LW $t9, $t9, -3   # fetch static link from AR
   # now $t9 is the frame pointer for the AR
   # where the name was declared

use id_use_attrs(id->idu) to get attributes
   from the AST id for the identifier expression

   unsigned short ofst
       = id_use_attrs(id->idu)->offset_count

(so, if no procedures)
   LW $fp, reg, ofst
   [push reg onto the stack]

(so, if have procedures)
   LW $t9, reg, ofst
   [push reg onto the stack]

TRANSLATION SCHEME FOR BINARY OPER EXPRS

Goal is to get to an instruction like
   ADD, SUB, MUL, DIV
   (then push result onto stack)

Example: E1 - E2

   [code to evaluate E1]
   [code to evaluate E2]
   [code to pop E2's value into $t2]
   [code to pop E1's value into $t1]
   SUB $t1, $t2, $t1
   [code to push $t1 onto the stack]

TRANSLATION SCHEME FOR PL/0 DECLARATIONS

   const c = n;
   var x;

When do blocks start executing?
   when the procedure or the main program starts executing

What should be done then?
   [allocate and initialize the variables and constants]
   [save any registers necessary and set up the AR]

How do we know how much space to allocate?
   allocate space for each constant and variable
   (each is one word)

What order?
   to get the stack AR layout 2 as before,
   do the variables, then the constants in reverse order

How to initialize constants?
   use literal table's offsets and LW

      LW $gp, $at, offset
          (where offset is computed from the literal table)
      SW $sp, $at, 0

How to initialize variables?

      SW $sp, $0, 0

   e.g.,
      ADDI $sp, $sp, -4   # allocate a word
      SW $sp, $0, 0       # initialize the variable to 0

TRANSLATION SCHEME FOR BASIC STATEMENTS

skip

   SRL $at, $at, 0

x := E

   suppose offset for x is ofst (from id_use)

   [evaluate E (onto the stack)]
   [get the frame pointer for x's location into a register, $t9]
   [pop the stack into $at]
   SW $t9, $at, ofst

read x

   suppose offset for x is ofst (from id_use)

   RCH   # puts char read into $v0
   [get the frame pointer for x's location into a register, $t9]
   SW $t9, $v0, ofst

write E

   [evaluate E]
   [pop the stack into $a0]
   PINT

GRAMMAR FOR CONDITIONS

   <condition> ::= odd <expr>
                 | <expr> <rel-op> <expr>
   <rel-op> ::= = | <> | < | <= | > | >=

So the code recursion structure is?
Code looks like:

RELATIONAL OPERATOR CONDITIONS

   <condition> ::= <expr> <rel-op> <expr>

A design for conditions:

Goal: put true or false on top of stack
      for the value of the condition

Consider E1 <> E2

   [Evaluate E1 to top of stack]
   [Evaluate E2 to top of stack]
   [pop top of stack (E2's value) into $at]
   [pop top of stack (E1's value) into $v0]
   # jump past 2 instrs,
   # if GPR[$v0] != GPR[$at]
   BNE $v0, $at, 2
   # put 0 (false) in $v0
   ADD $0, $0, $v0
   # jump over next instr
   BEQ $0, $0, 1
   # put 1 (true) in $v0
   ADDI $0, $v0, 1
   # now $v0 has the truth value
   [code to push $v0 on top of stack]

Consider E1 >= E2

   [Evaluate E1 to top of stack]
   [Evaluate E2 to top of stack]
   [pop top of stack (E2's value) into $at]
   [pop top of stack (E1's value) into $v0]
   SUB $v0, $at, $v0   # $v0 = E1 - E2
   # jump past 2 instrs,
   # if E1 - E2 >= 0
   BGEZ $v0, 2
   # put 0 (false) in $v0
   ADD $0, $0, $v0
   # jump over next instr
   BEQ $0, $0, 1
   # put 1 (true) in $v0
   ADDI $0, $v0, 1
   # now $v0 has the truth value
   [code to push $v0 on top of stack]

CODE FOR BINARY RELOP CONDITIONS

// file ast.h
typedef struct {
    file_location *file_loc;
    AST_type type_tag;
    expr_t expr1;
    token_t rel_op;
    expr_t expr2;
} rel_op_condition_t;

// file gen_code.c
// Requires: reg != $at
// Generate code for evaluating condAST into reg
// Modifies when executed: reg, $at
code_seq gen_code_relop_cond(
             rel_op_condition_t condAST,
             reg_num_type reg)
{



}

ABSTRACT SYNTAX FOR COMPOUND STATEMENTS

   S ::= begin { S }
       | if C S1 S2
       | while C S

So what is the code structure?

Code looks like:

   begin S1 S2 ... end

   if C S1 S2

   while C S

SUPPORTING PROCEDURES AND CALLS

Main issues:
   - storing their code
        Why?
   - knowing exactly where each starts
        Why?

Another issue:
   - sending the right static link

WHERE TO PUT PROCEDURE CODE?

Possible layouts in VM's code array:

NESTED PROCEDURES ARE A PROBLEM

procedure A;
   procedure B;
   begin
      # B's body code...
      call A
      # ...
      # ...
   end
begin
   # A's body code
   call B
   # ...
   # ...
end

If we lay out the code as

   [ code for A ]
   [ code for B ]

How do we know the address of B to compile the call to B?
What about the other direction?

RECURSIVE PROCEDURES, SIMILAR PROBLEM

procedure R;
begin
   # R's body code ...
   call R
   # ...
end

Before storing code for R,
how do we know where it starts?

MUTUAL RECURSION

procedure O;
begin
   # O's body code...
   call E
   # ...
end

procedure E;
begin
   # E's body code ...
   call O
   # ...
end

One of these must come before the other
   in the code area of the VM...

SOLUTION STRATEGIES FOR CALLS

[Multiple passes]:
   1. Generate code for each procedure
      (+ store offsets in symbol table,
       + lay out procedure code in memory)
   2. Gather table of addresses
      (map from names to addresses,
       using offsets and beginning address)
   3. Patch up code addresses for calls
      (+ output code)

[Lazy evaluation, labels]:
   1. Generate code for each procedure
      with calls to labels
      (+ store or update labels in symbol table)
      (+ output code)

GENERAL SOLUTION: MULTIPLE PASSES

Problem: where does each procedure start?

Solution idea:
   1. Compile all procedure code
      (now know how big each procedure is)
   2. Lay out procedure code in memory
      (now know where each starts)
   3. Change each call instruction

GENERAL SOLUTION: LABELS

Use "labels" to allow

Term "label" is from assembly language

       ; ...
       jmp L
       ; ...
   L:  ; ...

APPROACHES TO FIXING LABELS

Problem: convert labels to addresses

(1) Use multiple passes
    a. Generate code with labels
    b. Lay out memory for procedures
       (determine starting addresses)
    c. Change labels to addresses

    advantages:

    disadvantages:

(2) Use shared mutable data (lazy eval.)
    a. labels are unique placeholders,
       shared by all uses (calls)
    b. when address is determined,
       update the placeholder
       (and all uses are updated)

    advantages:

    disadvantages:

LABEL DATA STRUCTURE FOR LAZY EVAL

// file label.h
#include <stdbool.h>
#include "machine_types.h"

typedef struct {
    bool is_set;
    address_type byte_addr;
} label;

// Return a fresh label that is not set
extern label *label_create();

// Set the address in the label
extern void label_set(label *lab, address_type addr);

// Is lab set?
extern bool label_is_set(label *lab);

// Requires: label_is_set(lab)
// Return the address in lab.
extern address_type label_read(label *lab);

CONTEXT

VM changes (from HW4's VM):

// words are ints or floats
typedef enum {int_type, float_type} word_type;
// words for this machine
typedef struct word_s {
    word_type type_tag;
    union word_u {
        int i;
        float f;
    } data;
} word;

No MOD instruction
   Changed to RND instruction (rounding)

FLOAT Language changes

      ::= '{' { } '}'
        | ...
   where
      ::= { }
      ::= float | bool

ENHANCED ASTS AS AN IR

See ast.h

Changes to ASTs:

// S ::= begin { VD } { S }
typedef struct {
    AST_list vds;
    AST_list stmts;
} begin_t;

IDENTIFIER USES IN ASTs

// E ::= x
typedef struct {
    // name of a constant or variable
    const char *name;
    // set during static analysis,
    // includes info for lexical addr
    id_use *idu;
} ident_t;

// S ::= read x
typedef struct {
    AST *ident;
} read_t;

// S ::= assign x E
typedef struct {
    AST *ident;
    AST *exp;
} assign_t;

ID_USE STRUCTURES

typedef struct {
    id_attrs *attrs;
    unsigned int levelsOutward;
} id_use;

ID ATTRIBUTES

// attributes of idents
typedef struct {
    file_location file_loc;
    var_type vt;  // type
    // offset from beginning of scope
    unsigned int loc_offset;
} id_attrs;

// where:
typedef enum {float_t, bool_t} var_type;

void scope_check_beginStmt(AST *stmt)
{
    symtab_enter_scope();   // <*******
    scope_check_varDecls(
        stmt->data.begin_stmt.vds);
    AST_list stmts
        = stmt->data.begin_stmt.stmts;
    while (!ast_list_is_empty(stmts)) {
        scope_check_stmt(
            ast_list_first(stmts));
        stmts = ast_list_rest(stmts);
    }
    symtab_leave_scope();   // <********
}

GENERATING CODE

Done in the file gen_code.c
   - Functions arranged to walk the ASTs
   - All return a code_seq

Useful files:
   ast.h
   id_attrs.h
   id_use.h
   code.h

STEPS FOR CODE GENERATION

   1. Start with the base cases
   2. Write simplest tests possible
   3. Design code sequences for the nonterminals involved
   4. Write code for each node of the AST
   5. Test it:
      a. check the output machine code
      b. check the VM's execution

EXPRESSIONS

What are the base cases?

A very simple test:

What code sequence do we want?

GENERATING CODE

Where does execution start?

What is the AST for our program?

Where do we generate those code sequences?

Let's write it!

NESTED SCOPES

Example in FLOAT:

# $Id$
float x;
{
  float y;
  {
    float z;
    z = 0; y = 1; x = 2;
  }
}

What kind of code sequence for this?
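
SKETCH: CODE SEQUENCES FOR EXPRESSIONS

The stack rule from the expression slides (every expression pushes its result; a binary operator pops both operands, computes, and pushes again) can be tried out with a small stand-alone sketch. This is only an illustration under stated assumptions, not the course's gen_code.c: instructions are stored as text rather than bin_instr_t, the toy expr type stands in for the real AST, the helpers code_seq_singleton, code_seq_concat and code_seq_size are hypothetical names, and PUSH/POP are pseudo-instructions standing in for what code_push_reg_on_stack and code_pop_stack_into_reg would generate.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Simplified stand-in for the notes' code/code_seq types: the
   instruction is kept as text instead of a bin_instr_t (assumption). */
typedef struct code_s code;
typedef code *code_seq;
struct code_s {
    code_seq next;
    char text[32];
};

/* Return a one-instruction sequence holding a copy of text (hypothetical helper). */
static code_seq code_seq_singleton(const char *text) {
    code_seq c = malloc(sizeof(code));
    assert(c != NULL);
    c->next = NULL;
    snprintf(c->text, sizeof c->text, "%s", text);
    return c;
}

/* Append s2 to the end of s1 and return the combined sequence. */
static code_seq code_seq_concat(code_seq s1, code_seq s2) {
    if (s1 == NULL) return s2;
    code_seq last = s1;
    while (last->next != NULL) last = last->next;
    last->next = s2;
    return s1;
}

/* Number of instructions in s. */
int code_seq_size(code_seq s) {
    int n = 0;
    for (; s != NULL; s = s->next) n++;
    return n;
}

/* Toy AST for E ::= E1 o E2 | n (variables omitted in this sketch). */
typedef struct expr_s {
    char op;                 /* '+' or '-'; 0 marks a numeric literal */
    int literal_offset;      /* literal's offset from $gp, if op == 0 */
    const struct expr_s *left, *right;
} expr;

/* Recursive generator: follows the grammar, result always ends up on the stack. */
code_seq gen_code_expr(const expr *e) {
    if (e->op == 0) {
        /* numeric literal: load from the data section, then push */
        char buf[32];
        snprintf(buf, sizeof buf, "LW $gp, $at, %d", e->literal_offset);
        return code_seq_concat(code_seq_singleton(buf),
                               code_seq_singleton("PUSH $at"));
    }
    /* binary operator: evaluate E1, evaluate E2, pop in reverse order */
    code_seq ret = gen_code_expr(e->left);
    ret = code_seq_concat(ret, gen_code_expr(e->right));
    ret = code_seq_concat(ret, code_seq_singleton("POP $t2"));
    ret = code_seq_concat(ret, code_seq_singleton("POP $t1"));
    ret = code_seq_concat(ret, code_seq_singleton(
              e->op == '+' ? "ADD $t1, $t2, $t1" : "SUB $t1, $t2, $t1"));
    return code_seq_concat(ret, code_seq_singleton("PUSH $t1"));
}
```

For a tree representing n1 - n2, with the two literals at offsets 0 and 1, gen_code_expr returns eight instructions: two per literal, then the two pops, the SUB, and the final push. The generator recurses where the grammar does and switches on the alternative, as the FOLLOWING THE GRAMMAR slide suggests.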