COP 3402 meeting    -*- Outline -*-

* Code Generation

We'll concentrate on generating code for the SRM,
as that is what we are using in class this semester

** overview

These notes are based on Appel's book
"Modern Compiler Implementation in Java",
chapter 7 (Cambridge, 2002).

------------------------------------------
OVERVIEW OF CODE GENERATION

  ASTs --> [ Static Analysis ]
                  |
                  | IR
                  v
         [ Code Generation ]
                  |
                  | Machine Code
                  v
        Virtual Machine Execution

The IR (= Intermediate Representation)
records
------------------------------------------

... information from static analysis,
    including attributes of names used

*** IR (Intermediate Representation)

We're going to focus on the translation
from IR to Machine Code (circle that)
and the differences between ASTs and IR

Q: What kind of information is needed from a name's use
   in order to generate code?

   Its lexical address

Q: Should the parser create the lexical address
   of a name's use during parsing?

   No, that needs information that is more readily available
   during static analysis (from the symbol table).

Q: Is the symbol table unchanging (immutable)?

   No, it is updated as scopes are entered and left...

   So is it convenient to recreate it during each pass?
   No; we want to store the information
   for each name in the IR

------------------------------------------
IR TREES

An IR is a tree structure,

Helps in modularizing compilers
and code generation

  WITHOUT IR                WITH IR

  Java ------> x86          Java --\        /--> x86
                                    \      /
  C    ------> MIPS         C    ------> IR ---> MIPS
                                    /      \
  C++  ------> Sparc        C++ ---/        \--> Sparc
                                  /          \
  C#   ------> A1           C# --/            \> A1
------------------------------------------

... somewhat like an AST, but a kind of
    abstract machine code, with information
    needed for code generation
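The "AST plus attribute placeholders" idea can be sketched as a C struct: the parser records the name, and static analysis later fills in the lexical-address attribute. This is only an illustrative sketch; the type and field names below are my guesses, not the course's actual ast.h declarations.

```c
/* Sketch of an IR node for a name's use: an AST node plus an
 * attribute slot that static analysis fills in later.
 * Type and field names here are illustrative, not the real ast.h. */

typedef struct {
    unsigned int levelsOutward;  /* static nesting levels out */
    unsigned int offsetInAR;     /* word offset within the AR */
} lexical_address;

typedef struct {
    const char *name;            /* recorded by the parser */
    lexical_address *attrs;      /* NULL until static analysis runs */
} ident_ir;
```

The point of the placeholder pointer is that the same tree can be built by the parser and then decorated in place, so the symbol table does not need to be rebuilt by later passes.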
(draw lines on the left from each language to each machine,
 and on the right from each language to the IR
 and from the IR to each machine)

The advantage is that with the IR,
each language compiles to the same IR,
and only one code generator has to be built
for each machine architecture

------------------------------------------
OUR CHOICES FOR AN IR

To keep things simple,
we will use a modified AST type as an IR

Parser:
 - records
 - provides

Static analysis:
 - records
------------------------------------------

Making the IR like the ASTs
puts more work on the code generator;
an IR that is more like Machine Code
would be translated partway
by the static analysis phase
(or that idea could be used internally
 in code generation)

... structure of programs
... placeholders for attributes
    needed for code generation
... attributes of names used (after finding them)

*** General strategy

------------------------------------------
GENERAL STRATEGY FOR CODE GENERATION

Don't try to optimize!

Follow the grammar of
------------------------------------------

... instead of optimizing, look for the
    simplest translation that could work

... the ASTs (abstract syntax),
    doing a tree walk, generating IRs

    since our IR is basically like ASTs
    with some symbol table information,
    we'll do this in scope checking.

Trust the recursion!
  but keep the recursion simple

------------------------------------------
FOLLOWING THE GRAMMAR

Code resembles the grammar that

When
------------------------------------------

... describes the input data (in this case, ASTs)

... the grammar is recursive, the code is recursive
... the grammar has alternatives,
    the code has conditionals (or a switch)

Q: How does this relate to the parser?

   Our code for parsers followed the grammar in this way.

Q: Why is this useful?
   - can see that all possible inputs are covered (all cases)
   - coding responsibilities are clear
     (e.g., which functions eat input)

** Translation target: code sequences

------------------------------------------
TARGET: CODE SEQUENCES

Need lists of machine code

Why?
------------------------------------------

... instructions

Q: Why are code sequences needed?

   To be run by the VM,
   executing one instruction after another
   (and this is sequential execution)

------------------------------------------
REPRESENTING CODE SEQUENCES IN C

#include "instruction.h"

// code that can be in a sequence
typedef struct code_s code;
// code sequences
typedef code *code_seq;

// machine code instructions
struct code_s {
    code_seq next;
    bin_instr_t instr;
};
------------------------------------------

I'm using a linked list to represent code sequences

The bin_instr_t type is from the VM implementation

** Designing Code Sequences

*** Overall strategies

------------------------------------------
STRATEGIES FOR DESIGNING CODE SEQUENCES

Work backwards
------------------------------------------

... Work backwards, starting with the ultimate instruction
    you want to use, and then figure out
    how to get ready for that...

------------------------------------------
EXPRESSION EVALUATION

Example: (E1 + E2) - (E3 / E4).

Constraints:

 - Expressions have a result value

 - Binary operations (+, -, *, /)
   in the SRM need 2 registers

Where should the result be stored?

Can it be a register?
------------------------------------------

Suppose we try to always keep the result of an expression, E1,
in a register; that means we reserve one register, say r1,
for the result of E1, and r1 can't be used
for other expression evaluation,
since that would destroy E1's value (in r1).

... no, there are only a finite number of registers
    (25 in the SRM), and an expression (like E2 op E3)
    can have arbitrarily many subexpressions.
So we can't reserve 1 register per subexpression

A solution is to NOT reserve registers for expression values,
but to use the runtime stack,
and to use 2 registers (say r1 and r2) for binary operations.
(So no register is ever reserved for an expression's value.)

Every expression's result goes on the stack,
e.g., to evaluate E2 op E3:

   [evaluate E2, pushing its value on the stack]
   [evaluate E3, pushing its value on the stack]
   [pop E3's value from stack into r2]
   [pop E2's value from stack into r1]
   [instruction to compute r1 op r2, putting its value in r1]
   [push r1 on the stack]
   # now r1 and r2 are free to use more
   # but note that old values of r1 and r2 are destroyed

Since this stack-based evaluation reserves no registers
for expression values,
it can evaluate arbitrarily complex expressions.

Another alternative, used in production compilers (and LLVM):
imagine there are an infinite number of registers, r1, r2, r3, ...
and evaluate each expression,
storing its result in a (reserved) register.
Suppose this evaluation uses M registers
(one for each subexpression)
and the machine has N registers available.
Then:
 - reserve 2 registers for binary operation evaluation
   (say r1 and r2),
 - assign the remaining N-2 registers
   to the first N-2 subexpressions,
   and for the rest use the stack (as above).
(There can be various ways to do this,
 e.g., if there are constants, keep those
 where they are in storage
 until needed for binary operations.)

Which register?

   To evaluate binary expressions, we need at least 2 registers,
   so the register target should be a parameter (changeable).

   Let's say we always use $v0 to hold the first result for now,
   and $at for the second one, just before operating on them;
   this is just a convention, however.

Addressing variables:

   ... want to use LW and SW instructions,
       so we need to use the AR's base address + an offset

How do we get the offset?
   From the identifier's offset in its AR
   (available in the id_attrs; we laid out the AR
    so that FP points to offset 0)

Why do we want the AR's base address?

   that is where offsets are computed from

Where does the AR's base address need to be stored?

   in a register, as that's where LW and SW need it

Which register?

   We should pick one, but it can't be FP;
   let's use $t9 to start with.

How do we compute the AR's base address?
(in general; in PL/0 this is needed for nested procedures)

   Use the number of levels out from x's id_use:
   start with the FP, that's the base for 0 levels out,
   then fetch the static link from each AR
   for the number of levels out

   but we also need to decide what base register to use;
   let's say we'll use $t9 for now...

*** use of registers

------------------------------------------
USE OF REGISTERS

What if the register is already in use?

 e.g., $v0 for expression's value

   consider x := y + z

Strategies:

 - use a different register

 - save and restore
------------------------------------------

... but will eventually run out of registers,
    so using a different register only works a bit

... save the register's value for when it will be needed later;
    after the other use is done, restore it, and continue
    (This works in general,
     as no code ever reserves a register)

------------------------------------------
GENERAL STRATEGY FOR EXPRESSIONS

Each expression's value goes

To operate on an expression's value
in a register r:
------------------------------------------

...
    on top of the runtime stack

    use the code module's function
    code_push_reg_on_stack

... pop it off the top of the stack into a register
    using the code module's function
    code_pop_stack_into_reg

*** Background on SRM instructions

------------------------------------------
BACKGROUND: SRM INSTRUCTIONS

ADD  s,t,d  "GPR[d] = GPR[s]+GPR[t]"
SUB  s,t,d  "GPR[d] = GPR[s]-GPR[t]"
MUL  s,t    "HI,LO = GPR[s]*GPR[t]"
DIV  s,t    "HI = GPR[s] % GPR[t]" and
            "LO = GPR[s] / GPR[t]"
LW   b,t,o  "GPR[t] = memory[GPR[b]+4*o]"
SW   b,t,o  "memory[GPR[b]+4*o] = GPR[t]"
ADDI s,t,i  "GPR[t] = GPR[s]+sgnExt(i)"

How to move value from r1 to r2?

What limitations on immediate operands?

What if the literal doesn't fit?
------------------------------------------

Q: If the numbers are small enough, where is the result
   of a multiplication as a 32 bit integer located: HI or LO?

   (in LO)

Q: How would you move the value in register r1 to register r2?

   use ADD $0, r1, r2

Q: Are there limitations on the immediate operands for ADDI?

   Yes, immediate operands must fit into 16 bits
   (a short int in C); since it's in 2's complement format
   for the SRM, it must be between -32768 and 32767 (inclusive).

Q: What can you do if you want a constant value that doesn't fit?

   you can save it as global data and load it,
   so that is what we do in general, with the literal table.

** Literal Table

------------------------------------------
LITERAL TABLE IDEA

- Store literal values in

- Keep mapping from

- Initialize
------------------------------------------

... the data section (above $gp)

... the literal's text or value
    to its (word) offset in the data section

... the memory above $gp
    from the BOF's data section
    (before running code)

The idea is useful for data that doesn't fit
into an immediate operand in the instructions

------------------------------------------
LITERAL TABLE IN EXPRESSION EVALUATION

Idea for code for numeric expression, N:

1. Look up N in global table,

2. Receive N's

3. generate a load instruction into

4.
------------------------------------------

Q: What's our goal for expression code?

   Get the value onto the top of the runtime stack

... word offset (from $gp) in the global data section;
    call this offset

    Note that LW multiplies the offset by 4,
    so it's a word offset
    that gets converted to a byte address

... a register, say $at:  LW $gp, $at, offset

... Then what do we always do with expressions?
    (store $at on top of the stack)

------------------------------------------
LITERAL TABLE AND BOF DATA SECTION

How to get the literals into memory
with the assumed offsets?
------------------------------------------

... put them in the BOF file's data section
    in order of offset

** Activation Record (AR) Layout

need to do this so we know how to address
constants and variables in an AR

Q: Where should constants and variables for a block be stored?

   on the runtime stack, so we can handle recursion.

------------------------------------------
LAYOUT OF AN ACTIVATION RECORD

Must save SP, FP, static link, RA,
and registers $s0-$s7

Can't have the static link
at a varying offset from FP

Layout 1:

  FP -->[ saved SP          ]
        [ saved FP          ]
        [ static link       ]
        [ RA                ]
        [ $s0               ]
        [ ...               ]
        [ $s7               ]
        [ local constants   ]
        [ ...               ]
        [ local variables   ]
        [ ...               ]
        [ temporary storage ]
  SP -->[ ...               ]

Layout 2:

        [ ...               ]
        [ local variables   ]
        [ ...               ]
  FP -->[ local constants   ]
        [ saved SP          ]
        [ saved FP          ]
        [ static link       ]
        [ RA                ]
        [ $s0               ]
        [ ...               ]
        [ $s7               ]
        [ temporary storage ]
  SP -->[ ...               ]

Advantages of layout 1:

Advantages of layout 2:
------------------------------------------

Remember that the stack grows down
towards lower addresses!

For simplicity, assume that offsets
are determined by declaration order.

Note that offsets are in numbers of words,
since formOffset in the SRM multiplies by 4

Q: What are the advantages of layout 1?

   - straightforward, fixed size subtracted
     from offset in symbol table
   - tracing easy for the VM, as it can show
     memory between FP and SP

Q: What are the advantages of layout 2?
   - simplified offset calculations (all positive!)
   - variable addresses grow upwards
     (this would be better for arrays,
      although there are none in PL/0),
     and that corresponds to a programmer's notions
     about layout (e.g., C overflows work as expected)
   - offsets for most things are smaller
     than with layout 1 (in absolute value)

How should the VM do tracing?

   show everything between the original FP base
   (from the BOF file) and SP (the whole stack),
   which is good for nested scopes

Q: Any disadvantages?

   The tracing in layout 2 can be handled by the VM...

Q: Which layout should we use?

   We'll use layout 2,
   as that matches the decision for MIPS

** Declarations

Q: Where are constants and variables stored?

   On the runtime stack, in the local frame
   (Should we want to do something different for global ones,
    like use the data section? -- no, simplest not to.)

------------------------------------------
TRANSLATION SCHEME FOR PL/0 DECLARATIONS

  const c = n;

  var x;

When do blocks start executing?

What should be done then?

How do we know how much space to allocate?

How to initialize constants?

How to initialize variables?
------------------------------------------

Q: When are blocks executed in PL/0?

   when a procedure (or the main program) starts executing

Q: When starting to execute a block, what should be done?

   [allocate and initialize all declared variables and constants]
   [save any registers necessary (and set up the FP, etc.)]

Q: Which should be allocated first: constants or variables?

   Want them in reverse order of declaration, so offsets work;
   so the variables come first, then the constants

Q: How do we know how much space to allocate?

   Just process each declaration in a code sequence,
   so we don't really need to know that.

Q: How to initialize constants?

   use the literal table's offset for the literal
   from the $gp register,
   and store that into the stack.
   I.e., for a const_def of the form x = L,
   where L is at offset ofst from $gp:

   [allocate one word on the stack (ADDI $sp, $sp, -4)]
   [load L's value from $gp + 4*ofst into $at (LW $gp, $at, ofst)]
   [store $at into the stack top (SW $sp, $at, 0)]

Q: How to compute the value of the constant?

   we'll always use the literal table,
   as that will always work

Q: How to initialize variables?

   variables are initialized to 0,
   so we can use SW $sp, $0, 0
   to write 0 into the top of the stack

** Compiling Expressions

*** deciding where to start

The best way to start is to handle the simplest cases first

Q: What are the simplest cases for expressions?

   literals, and then variables;
   binary operator expressions
   have subexpressions (recursive)

------------------------------------------
TRANSLATING EXPRESSIONS

Abstract syntax of expressions in PL/0:

  E ::= E1 o E2 | x | n

  o ::= + | - | * | /

Simplest cases are:
------------------------------------------

... numeric literals (n)
    and variable and constant names (x, c)

*** example translations

**** numeric literals

------------------------------------------
TRANSLATION SCHEME FOR NUMERIC LITERALS
------------------------------------------

... - we will always use the literal table;
      suppose we do literal_table_lookup
      and it returns the offset ofst

    - want to put it on top of the stack, so:

      [load value from the data section into $at
       (LW $gp, $at, ofst)]
      [push $at onto the stack]

    we could optimize this if the value fits
    into an immediate operand:
    [allocate a stack location, write the value into it]
    How much would that save? (1 instruction, so 1 machine cycle)

Does this mean we need to track offsets in the global data?
Yes, that is the job of the literal table

------------------------------------------
TRANSLATION SCHEME FOR VARIABLE NAMES
(AND CONSTANTS)
------------------------------------------

want to use LW to bring the value into a register (say $v0),
so we can then push it onto the stack

- we will use $t9 as a frame register,
  so as not to disturb $fp

   # suppose lexical address of x is (levelsOut, ofst)
   # note that ofst is a positive word offset
   # get base of x's stack frame into $t9
   load FP into $t9
   load next static link into $t9   }
   ...                              }  "levelsOut" times,
   load next static link into $t9   }  i.e., LW $t9, $t9, -3
   # load x's value using its frame's base address
   LW $t9, $v0, ofst
   [code to push $v0 on top of the stack]

**** binary operations

already discussed above; the idea:
   evaluate the subexpressions onto the stack,
   pop them back into registers,
   operate on them with the appropriate instruction,
   leaving the result in a register,
   and push that register onto the stack

Q: So, for E1 - E2, what needs to be done?

   [code to evaluate E1 onto the top of the stack]  # recursive!
   [code to evaluate E2 onto the top of the stack]  # recursive!
   [code to pop the top of the stack, i.e., E2's value, into $at]
   [code to pop the top of the stack, i.e., E1's value, into $v0]
   [SUB $v0, $at, $v0]
   [code to push $v0 (i.e., E1 - E2) onto the top of the stack]

** Statements

*** Basic Statements

Q: What are the base cases in the grammar for statements?
   I.e., what statements don't contain other statements?

   - skip
   - assignments
   - read
   - write
   - call

------------------------------------------
TRANSLATION SCHEME FOR BASIC STATEMENTS

skip

x := E

read x

write E
------------------------------------------

... skip

    [Do nothing. Need some instruction that does nothing,
     like adding 0 to a register
     or shifting a register 0 bits.
     Let's use [SLL $at, $at, 0].]

... x := E

    (Suppose the lexical address of x is (levelsOut, ofst);
     that information should be in the AST,
     so we need the scope_check module
     to put an id_use in x's AST.)
    (Let's use $v0 to hold E's value
     and use $t9 for the frame pointer in x's AR,
     so it's:)

    [code to eval E onto the top of the stack]
    [code to load base of x's scope's frame into $t9]
    [code to pop top of stack into $v0]
    [SW $t9, $v0, ofst]  # stores $v0's value (i.e., E's value)
                         # into memory[$t9+(4*ofst)]

Q: For testing, we want to know: what are the simplest cases?

   x is declared in the current scope (0 levels out)
   (this is the only case we're worrying about in fall 2023)

   E is a literal (so implement that case for expressions,
   using the literal table)

Q: In general, can the "levels outwards" part
   of the lexical address be determined
   when the variable is declared?

   No, it depends on the nesting of the use,
   but we can get it from the variable's id_use

Q: Does the same thing work for constants?

   Yes, they are just initialized variables that don't change

... read x

    (Suppose x is a variable
     at lexical address (lvls, ofst))

    [RCH]  # read a character and put it in $v0
    [code to push $v0 on top of the stack]
    [code to load x's scope's frame pointer into $t9]
    [code to pop top of stack into $v0]
    [SW $t9, $v0, ofst]

... write E

Q: Should we write a character with code E or the digits of E?
   Probably want both; we added a PINT instruction to the VM
   to print an integer value
   (and PFLT to print a float value in the FSRM).

   We should also add a new kind of statement, say

      print E

   to PL/0 (and FLOAT) to print a single character.

   For now we will leave PINT as writing an integer value.

   We want the result of E to be in $a0,
   since $a0 is where args go for system calls like PINT,
   so we would use a design like:

   [code to eval E onto top of the stack]
   [code to pop stack into $a0]
   [PINT]

Another basic statement is

   call p

(but for fall 2023 we aren't handling that,
 and in any case we would worry about procedures later)

** Conditions

*** Overall conditions

Conditions are somewhat like expressions,
and can contain expressions,
so we can't reserve registers for their values.
Instead, like expressions, they should always store
their (truth) value on top of the runtime stack

------------------------------------------
GRAMMAR FOR CONDITIONS

C ::= odd E | E1 relop E2

relop ::= = | <> | < | <= | > | >=

So the code recursion structure is?

Code looks like:
------------------------------------------

// return a code sequence to put the truth value
// (1 for true, 0 for false) of the condition on top of stack
gen_code_condition(condition_t cond)
   does a switch on cond.cond_kind
   and calls either
      gen_code_odd_cond(cond.data.odd_cond)
   or
      gen_code_relop_cond(cond.data.rel_op_cond)

Q: What should these functions return?

   code sequences (code_seq)

Write the code for gen_code_condition...
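A self-contained sketch of that switch, following the grammar (one case per alternative). The enum, the struct shape, and the helper bodies below are simplified stand-ins for the course's actual condition_t and gen_code.c functions, which I don't have; here the helpers just return markers so the dispatch can be checked.

```c
/* Simplified stand-ins for the course's AST types (assumed shapes). */
typedef enum { ck_odd, ck_rel } cond_kind_e;

typedef struct {
    cond_kind_e cond_kind;
    /* the real condition_t also has a union with the subtree data */
} condition_t;

/* Stand-ins for the real generators: tag their answer instead of
 * building a code_seq. */
static const char *gen_code_odd_cond(condition_t c)   { (void)c; return "odd";   }
static const char *gen_code_relop_cond(condition_t c) { (void)c; return "relop"; }

/* Follow the grammar: switch on the condition's kind tag. */
static const char *gen_code_condition(condition_t cond)
{
    switch (cond.cond_kind) {
    case ck_odd:
        return gen_code_odd_cond(cond);
    case ck_rel:
        return gen_code_relop_cond(cond);
    }
    return 0; /* unreachable when all kinds are covered */
}
```

The real version returns a code_seq rather than a string, but the recursion/dispatch structure is the same.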
**** Relational operator conditions

------------------------------------------
RELATIONAL OPERATOR CONDITIONS

C ::= E1 relop E2

A design for conditions:

Goal: put true or false on top of the stack
      for the value of the condition

Consider E1 <> E2:

   [Evaluate E1 to top of stack]
   [Evaluate E2 to top of stack]
   [pop top of stack (E2's value) into $at]
   [pop top of stack (E1's value) into $v0]
   # jump past 2 instrs,
   # if GPR[$v0] != GPR[$at]
   BNE $v0, $at, 2
   # put 0 (false) in $v0
   ADD $0, $0, $v0
   # jump over next instr
   BEQ $0, $0, 1
   # put 1 (true) in $v0
   ADDI $0, $v0, 1
   # now $v0 has the truth value
   [code to push $v0 on top of stack]

Consider E1 >= E2:

   [Evaluate E1 to top of stack]
   [Evaluate E2 to top of stack]
   [pop top of stack (E2's value) into $at]
   [pop top of stack (E1's value) into $v0]
   SUB $v0, $at, $v0   # $v0 = E1 - E2
   # jump past 2 instrs,
   # if GPR[$v0] >= 0,
   # i.e., if E1-E2 >= 0
   BGEZ $v0, 2         # skip 2 instrs
   # put 0 (false) in $v0
   ADD $0, $0, $v0
   # jump over next instr
   BEQ $0, $0, 1
   # put 1 (true) in $v0
   ADDI $0, $v0, 1
   [code to push $v0 on top of stack]
------------------------------------------

explain all of this.

Note that E1 >= E2 is true just when E1 - E2 >= 0
(subtract E2 from both sides)

Note that BEQ $0, $0, 1 skips the next instruction
(since GPR[0] = GPR[0], for ints)

Q: What would work for = ?

   Use BEQ instead of BNE

Q: What would you do for < ?

   use BLTZ instead of BGEZ;
   similarly for <= and >.

------------------------------------------
CODE FOR BINARY RELOP CONDITIONS

// file ast.h
typedef struct {
    file_location *file_loc;
    AST_type type_tag;
    expr_t expr1;
    token_t rel_op;
    expr_t expr2;
} rel_op_condition_t;

// file gen_code.c
// Requires: reg != $at
// Generate code for evaluating condAST into reg
// Modifies when executed: reg, $at
code_seq gen_code_relop_cond(
             rel_op_condition_t condAST,
             reg_num_type reg)
{

}
------------------------------------------

...
[code to push E1's value on the stack]
[code to push E2's value on the stack]
[code to push the truth value of E1 rel_op E2 on top of stack]

** Control Flow Statements (Compound Statements)

These are the compound statements

Q: Why is it useful to write the base cases first?

   Testing is easier

------------------------------------------
ABSTRACT SYNTAX FOR COMPOUND STATEMENTS

S ::= begin { S } | if C S1 S2 | while C S

So what is the code structure?

Code looks like:

   begin S1 S2 ... end

   if C S1 S2

   while C S
------------------------------------------

... # begin S1 S2 ... end
    [code for S1]
    [code for S2]
    ...
    # concat them all!

... # if C then S1 else S2
    [code to push C's truth value on top of stack]
    [code to pop top of stack into $v0]
    BEQ $0, $v0, [length(S1)+1]   # skip S1 if false
    [code for S1]
    BEQ $0, $0, [length(S2)]      # skip else part (finish)
    [code for S2]

    Why add 1 to the length of S1?

       to account for the instruction
       that skips over the else part

    Note this requires the computation of the code sequences
    for S1 and S2 first, so we know how long they are

... # while C do S
    cond:
    [code to push C's truth value on top of stack]
    [code to pop top of stack into $v0]
    BEQ $0, $v0, [length(S)+1]   # skip S if false (goto exitLoop)
    [code for S]
    BEQ $0, $0, -(length(C)+length(S)+3)   # jump back (goto cond)
    exitLoop:

    Why add 3? The backward jump has to re-cross not just
    C's code and S's code, but also the pop, the forward BEQ,
    and this BEQ itself (branch offsets count from the
    following instruction, as in the forward branches above)
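The branch arithmetic above can be checked with a little index bookkeeping. The helper names below are mine (not the course's gen_code.c); they compute the displacements for the if and while schemes, assuming offsets are counted in instructions relative to the instruction after the branch.

```c
/* Branch displacements for the if/while translation schemes,
 * assuming offsets are in instructions, relative to the
 * instruction following the branch. Helper names are illustrative. */

/* if C then S1 else S2:
 * the BEQ skipping S1 must also hop over the branch
 * that skips the else part */
static int if_skip_then(int len_S1) { return len_S1 + 1; }
static int if_skip_else(int len_S2) { return len_S2; }

/* while C do S:
 * forward: skip S and the backward branch;
 * backward: re-cross S, both branches, the pop, and C's code */
static int while_exit(int len_S)            { return len_S + 1; }
static int while_back(int len_C, int len_S) { return -(len_C + len_S + 3); }
```

Laying the while loop out with the condition code at index 0 (pop at len_C, forward BEQ at len_C+1, S starting at len_C+2, backward BEQ right after S) lets one verify that the forward branch lands on exitLoop and the backward branch lands back on index 0.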