COP 3402 meeting    -*- Outline -*-

* Code Generation

We'll concentrate on generating code for the SRM,
as that is what we are using in class this semester

** overview

These notes are based on Appel's book
"Modern Compiler Implementation in Java",
chapter 7 (Cambridge, 2002).

------------------------------------------
OVERVIEW OF CODE GENERATION

  ASTs --> [ Static Analysis ]
                  |
                  | IR
                  v
         [ Code Generation ]
                  |
                  | Machine Code
                  v
        Virtual Machine Execution

The IR (= Intermediate Representation)
records
------------------------------------------

... information from static analysis,
    including attributes of names used

*** IR (Intermediate Representation)

We're going to focus on the translation
from IR to Machine Code (circle that)
and the differences between ASTs and IR

Q: What kind of information is needed from a name's use
   in order to generate code?

   Its lexical address

Q: Should the parser create the lexical address
   of a name's use during parsing?

   No, that needs information that is more readily available
   during static analysis (from the symbol table).

Q: Is the symbol table unchanging (immutable)?

   No, it is updated as scopes are entered and left...

   So is it convenient to recreate it during each pass?
   No; we want to store the information
   for each name in the IR

------------------------------------------
IR TREES

An IR is a tree structure,

Helps in modularizing compilers
and code generation

  WITHOUT IR                WITH IR

  Java ------> x86          Java --\        /--> x86
                                    \      /
  C    ------> MIPS         C    ------> IR ---> MIPS
                                    /      \
  C++  ------> Sparc        C++ ---/        \--> Sparc
                                  /          \
  C#   ------> A1           C# --/            \> A1
------------------------------------------

... somewhat like an AST, but a kind of
    abstract machine code, with information
    needed for code generation
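The "AST plus attribute placeholders" idea can be sketched as a C struct: the parser records the name, and static analysis later fills in the lexical-address attribute. This is only an illustrative sketch; the type and field names below are my guesses, not the course's actual ast.h declarations.

```c
/* Sketch of an IR node for a name's use: an AST node plus an
 * attribute slot that static analysis fills in later.
 * Type and field names here are illustrative, not the real ast.h. */

typedef struct {
    unsigned int levelsOutward;  /* static nesting levels out */
    unsigned int offsetInAR;     /* word offset within the AR */
} lexical_address;

typedef struct {
    const char *name;            /* recorded by the parser */
    lexical_address *attrs;      /* NULL until static analysis runs */
} ident_ir;
```

The point of the placeholder pointer is that the same tree can be built by the parser and then decorated in place, so the symbol table does not need to be rebuilt by later passes.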
(draw lines on the left from each language to each machine,
 and on the right from each language to the IR
 and from the IR to each machine)

The advantage is that with the IR,
each language compiles to the same IR,
and only one code generator has to be built
for each machine architecture

------------------------------------------
OUR CHOICES FOR AN IR

To keep things simple,
we will use a modified AST type as an IR

Parser:
 - records
 - provides

Static analysis:
 - records
------------------------------------------

Making the IR like the ASTs
puts more work on the code generator;
an IR that is more like Machine Code
would be translated partway
by the static analysis phase
(or that idea could be used internally
 in code generation)

... structure of programs
... placeholders for attributes
    needed for code generation
... attributes of names used (after finding them)

*** General strategy

------------------------------------------
GENERAL STRATEGY FOR CODE GENERATION

Don't try to optimize!

Follow the grammar of
------------------------------------------

... instead of optimizing, look for the
    simplest translation that could work

... the ASTs (abstract syntax),
    doing a tree walk, generating IRs

    since our IR is basically like ASTs
    with some symbol table information,
    we'll do this in scope checking.

Trust the recursion!
  but keep the recursion simple

------------------------------------------
FOLLOWING THE GRAMMAR

Code resembles the grammar that

When
------------------------------------------

... describes the input data (in this case, ASTs)

... the grammar is recursive, the code is recursive
... the grammar has alternatives,
    the code has conditionals (or a switch)

Q: How does this relate to the parser?

   Our code for parsers followed the grammar in this way.

Q: Why is this useful?
   - can see that all possible inputs are covered (all cases)
   - coding responsibilities are clear
     (e.g., which functions eat input)

** Translation target: code sequences

------------------------------------------
TARGET: CODE SEQUENCES

Need lists of machine code

Why?
------------------------------------------

... instructions

Q: Why are code sequences needed?

   To be run by the VM,
   executing one instruction after another
   (and this is sequential execution)

------------------------------------------
REPRESENTING CODE SEQUENCES IN C

#include "instruction.h"

// code that can be in a sequence
typedef struct code_s code;
// code sequences
typedef code *code_seq;

// machine code instructions
struct code_s {
    code_seq next;
    bin_instr_t instr;
};
------------------------------------------

I'm using a linked list to represent code sequences

The bin_instr_t type is from the VM implementation

** Designing Code Sequences

*** Overall strategies

------------------------------------------
STRATEGIES FOR DESIGNING CODE SEQUENCES

Work backwards
------------------------------------------

... Work backwards, starting with the ultimate instruction
    you want to use, and then figure out
    how to get ready for that...

------------------------------------------
EXPRESSION EVALUATION

Example: (E1 + E2) - (E3 / E4).

Constraints:

 - Expressions have a result value

 - Binary operations (+, -, *, /)
   in the SRM need 2 registers

Where should the result be stored?

Can it be a register?
------------------------------------------

Suppose we try to always keep the result of an expression, E1,
in a register; that means we reserve one register, say r1,
for the result of E1, and r1 can't be used
for other expression evaluation,
since that would destroy E1's value (in r1).

... no, there are only a finite number of registers
    (25 in the SRM), and an expression (like E2 op E3)
    can have arbitrarily many subexpressions.
So we can't reserve 1 register per subexpression

A solution is to NOT reserve registers for expression values,
but to use the runtime stack,
and to use 2 registers (say r1 and r2) for binary operations.
(So no register is ever reserved for an expression's value.)

Every expression's result goes on the stack,
e.g., to evaluate E2 op E3:

   [evaluate E2, pushing its value on the stack]
   [evaluate E3, pushing its value on the stack]
   [pop E3's value from stack into r2]
   [pop E2's value from stack into r1]
   [instruction to compute r1 op r2, putting its value in r1]
   [push r1 on the stack]
   # now r1 and r2 are free to use more
   # but note that old values of r1 and r2 are destroyed

Since this stack-based evaluation reserves no registers
for expression values,
it can evaluate arbitrarily complex expressions.

Another alternative, used in production compilers (and LLVM):
imagine there are an infinite number of registers, r1, r2, r3, ...
and evaluate each expression,
storing its result in a (reserved) register.
Suppose this evaluation uses M registers
(one for each subexpression)
and the machine has N registers available.
Then:
 - reserve 2 registers for binary operation evaluation
   (say r1 and r2),
 - assign the remaining N-2 registers
   to the first N-2 subexpressions,
   and for the rest use the stack (as above).
(There can be various ways to do this,
 e.g., if there are constants, keep those
 where they are in storage
 until needed for binary operations.)

Which register?

   To evaluate binary expressions, we need at least 2 registers,
   so the register target should be a parameter (changeable).

   Let's say we always use $v0 to hold the first result for now,
   and $at for the second one, just before operating on them;
   this is just a convention, however.

Addressing variables:

   ... want to use LW and SW instructions,
       so we need to use the AR's base address + an offset

How do we get the offset?
   From the identifier's offset in its AR
   (available in the id_attrs; we laid out the AR
    so that FP points to offset 0)

Why do we want the AR's base address?

   that is where offsets are computed from

Where does the AR's base address need to be stored?

   in a register, as that's where LW and SW need it

Which register?

   We should pick one, but it can't be FP;
   let's use $t9 to start with.

How do we compute the AR's base address?
(in general; in PL/0 this is needed for nested procedures)

   Use the number of levels out from x's id_use:
   start with the FP, that's the base for 0 levels out,
   then fetch the static link from each AR
   for the number of levels out

   but we also need to decide what base register to use;
   let's say we'll use $t9 for now...

*** use of registers

------------------------------------------
USE OF REGISTERS

What if the register is already in use?

 e.g., $v0 for expression's value

   consider x := y + z

Strategies:

 - use a different register

 - save and restore
------------------------------------------

... but will eventually run out of registers,
    so using a different register only works a bit

... save the register's value for when it will be needed later;
    after the other use is done, restore it, and continue
    (This works in general,
     as no code ever reserves a register)

------------------------------------------
GENERAL STRATEGY FOR EXPRESSIONS

Each expression's value goes

To operate on an expression's value
in a register r:
------------------------------------------

...
    on top of the runtime stack

    use the code module's function
    code_push_reg_on_stack

... pop it off the top of the stack into a register
    using the code module's function
    code_pop_stack_into_reg

*** Background on SRM instructions

------------------------------------------
BACKGROUND: SRM INSTRUCTIONS

ADD  s,t,d  "GPR[d] = GPR[s]+GPR[t]"
SUB  s,t,d  "GPR[d] = GPR[s]-GPR[t]"
MUL  s,t    "HI,LO = GPR[s]*GPR[t]"
DIV  s,t    "HI = GPR[s] % GPR[t]" and
            "LO = GPR[s] / GPR[t]"
LW   b,t,o  "GPR[t] = memory[GPR[b]+4*o]"
SW   b,t,o  "memory[GPR[b]+4*o] = GPR[t]"
ADDI s,t,i  "GPR[t] = GPR[s]+sgnExt(i)"

How to move value from r1 to r2?

What limitations on immediate operands?

What if the literal doesn't fit?
------------------------------------------

Q: If the numbers are small enough, where is the result
   of a multiplication as a 32 bit integer located: HI or LO?

   (in LO)

Q: How would you move the value in register r1 to register r2?

   use ADD $0, r1, r2

Q: Are there limitations on the immediate operands for ADDI?

   Yes, immediate operands must fit into 16 bits
   (a short int in C); since it's in 2's complement format
   for the SRM, it must be between -32768 and 32767 (inclusive).

Q: What can you do if you want a constant value that doesn't fit?

   you can save it as global data and load it,
   so that is what we do in general, with the literal table.

** Literal Table

------------------------------------------
LITERAL TABLE IDEA

- Store literal values in

- Keep mapping from

- Initialize
------------------------------------------

... the data section (above $gp)

... the literal's text or value
    to its (word) offset in the data section

... the memory above $gp
    from the BOF's data section
    (before running code)

The idea is useful for data that doesn't fit
into an immediate operand in the instructions

------------------------------------------
LITERAL TABLE IN EXPRESSION EVALUATION

Idea for code for numeric expression, N:

1. Look up N in global table,

2. Receive N's

3. generate a load instruction into

4.
------------------------------------------

Q: What's our goal for expression code?

   Get the value onto the top of the runtime stack

... word offset (from $gp) in the global data section;
    call this offset

    Note that LW multiplies the offset by 4,
    so it's a word offset
    that gets converted to a byte address

... a register, say $at:  LW $gp, $at, offset

... Then what do we always do with expressions?
    (store $at on top of the stack)

------------------------------------------
LITERAL TABLE AND BOF DATA SECTION

How to get the literals into memory
with the assumed offsets?
------------------------------------------

... put them in the BOF file's data section
    in order of offset

** Activation Record (AR) Layout

need to do this so we know how to address
constants and variables in an AR

Q: Where should constants and variables for a block be stored?

   on the runtime stack, so we can handle recursion.

------------------------------------------
LAYOUT OF AN ACTIVATION RECORD

Must save SP, FP, static link, RA,
and registers $s0-$s7

Can't have the static link
at a varying offset from FP

Layout 1:

  FP -->[ saved SP          ]
        [ saved FP          ]
        [ static link       ]
        [ RA                ]
        [ $s0               ]
        [ ...               ]
        [ $s7               ]
        [ local constants   ]
        [ ...               ]
        [ local variables   ]
        [ ...               ]
        [ temporary storage ]
  SP -->[ ...               ]

Layout 2:

        [ ...               ]
        [ local variables   ]
        [ ...               ]
  FP -->[ local constants   ]
        [ saved SP          ]
        [ saved FP          ]
        [ static link       ]
        [ RA                ]
        [ $s0               ]
        [ ...               ]
        [ $s7               ]
        [ temporary storage ]
  SP -->[ ...               ]

Advantages of layout 1:

Advantages of layout 2:
------------------------------------------

Remember that the stack grows down
towards lower addresses!

For simplicity, assume that offsets
are determined by declaration order.

Note that offsets are in numbers of words,
since formOffset in the SRM multiplies by 4

Q: What are the advantages of layout 1?

   - straightforward, fixed size subtracted
     from offset in symbol table
   - tracing easy for the VM, as it can show
     memory between FP and SP

Q: What are the advantages of layout 2?
   - simplified offset calculations (all positive!)
   - variable addresses grow upwards
     (this would be better for arrays,
      although there are none in PL/0),
     and that corresponds to a programmer's notions
     about layout (e.g., C overflows work as expected)
   - offsets for most things are smaller
     than with layout 1 (in absolute value)

How should the VM do tracing?

   show everything between the original FP base
   (from the BOF file) and SP (the whole stack),
   which is good for nested scopes

Q: Any disadvantages?

   The tracing in layout 2 can be handled by the VM...

Q: Which layout should we use?

   We'll use layout 2,
   as that matches the decision for MIPS

** Declarations

Q: Where are constants and variables stored?

   On the runtime stack, in the local frame
   (Should we want to do something different for global ones,
    like use the data section? -- no, simplest not to.)

------------------------------------------
TRANSLATION SCHEME FOR PL/0 DECLARATIONS

  const c = n;

  var x;

When do blocks start executing?

What should be done then?

How do we know how much space to allocate?

How to initialize constants?

How to initialize variables?
------------------------------------------

Q: When are blocks executed in PL/0?

   when a procedure (or the main program) starts executing

Q: When starting to execute a block, what should be done?

   [allocate and initialize all declared variables and constants]
   [save any registers necessary (and set up the FP, etc.)]

Q: Which should be allocated first: constants or variables?

   Want them in reverse order of declaration, so offsets work;
   so the variables come first, then the constants

Q: How do we know how much space to allocate?

   Just process each declaration in a code sequence,
   so we don't really need to know that.

Q: How to initialize constants?

   use the literal table's offset for the literal
   from the $gp register,
   and store that into the stack.
   I.e., for a const_def of the form x = L,
   where L is at offset ofst from $gp:

   [allocate one word on the stack (ADDI $sp, $sp, -4)]
   [load L's value from $gp + 4*ofst into $at (LW $gp, $at, ofst)]
   [store $at into the stack top (SW $sp, $at, 0)]

Q: How to compute the value of the constant?

   we'll always use the literal table,
   as that will always work

Q: How to initialize variables?

   variables are initialized to 0,
   so we can use SW $sp, $0, 0
   to write 0 into the top of the stack

** Compiling Expressions

*** deciding where to start

The best way to start is to handle the simplest cases first

Q: What are the simplest cases for expressions?

   literals, and then variables;
   binary operator expressions
   have subexpressions (recursive)

------------------------------------------
TRANSLATING EXPRESSIONS

Abstract syntax of expressions in PL/0:

  E ::= E1 o E2 | x | n

  o ::= + | - | * | /

Simplest cases are:
------------------------------------------

... numeric literals (n)
    and variable and constant names (x, c)

*** example translations

**** numeric literals

------------------------------------------
TRANSLATION SCHEME FOR NUMERIC LITERALS
------------------------------------------

... - we will always use the literal table;
      suppose we do literal_table_lookup
      and it returns the offset ofst

    - want to put it on top of the stack, so:

      [load value from the data section into $at
       (LW $gp, $at, ofst)]
      [push $at onto the stack]

    we could optimize this if the value fits
    into an immediate operand:
    [allocate a stack location, write the value into it]
    How much would that save? (1 instruction, so 1 machine cycle)

Does this mean we need to track offsets in the global data?
Yes, that is the job of the literal table

------------------------------------------
TRANSLATION SCHEME FOR VARIABLE NAMES
(AND CONSTANTS)
------------------------------------------

want to use LW to bring the value into a register (say $v0),
so we can then push it onto the stack

- we will use $t9 as a frame register,
  so as not to disturb $fp

   # suppose lexical address of x is (levelsOut, ofst)
   # note that ofst is a positive word offset
   # get base of x's stack frame into $t9
   load FP into $t9
   load next static link into $t9   }
   ...                              }  "levelsOut" times,
   load next static link into $t9   }  i.e., LW $t9, $t9, -3
   # load x's value using its frame's base address
   LW $t9, $v0, ofst
   [code to push $v0 on top of the stack]

**** binary operations

already discussed above; the idea:
   evaluate the subexpressions onto the stack,
   pop them back into registers,
   operate on them with the appropriate instruction,
   leaving the result in a register,
   and push that register onto the stack

Q: So, for E1 - E2, what needs to be done?

   [code to evaluate E1 onto the top of the stack]  # recursive!
   [code to evaluate E2 onto the top of the stack]  # recursive!
   [code to pop the top of the stack, i.e., E2's value, into $at]
   [code to pop the top of the stack, i.e., E1's value, into $v0]
   [SUB $v0, $at, $v0]
   [code to push $v0 (i.e., E1 - E2) onto the top of the stack]

** Statements

*** Basic Statements

Q: What are the base cases in the grammar for statements?
   I.e., what statements don't contain other statements?

   - skip
   - assignments
   - read
   - write
   - call

------------------------------------------
TRANSLATION SCHEME FOR BASIC STATEMENTS

skip

x := E

read x

write E
------------------------------------------

... skip

    [Do nothing. Need some instruction that does nothing,
     like adding 0 to a register
     or shifting a register 0 bits.
     Let's use [SLL $at, $at, 0].]

... x := E

    (Suppose the lexical address of x is (levelsOut, ofst);
     that information should be in the AST,
     so we need the scope_check module
     to put an id_use in x's AST.)
    (Let's use $v0 to hold E's value
     and use $t9 for the frame pointer in x's AR,
     so it's:)

    [code to eval E onto the top of the stack]
    [code to load base of x's scope's frame into $t9]
    [code to pop top of stack into $v0]
    [SW $t9, $v0, ofst]  # stores $v0's value (i.e., E's value)
                         # into memory[$t9+(4*ofst)]

Q: For testing, we want to know: what are the simplest cases?

   x is declared in the current scope (0 levels out)
   (this is the only case we're worrying about in fall 2023)

   E is a literal (so implement that case for expressions,
   using the literal table)

Q: In general, can the "levels outwards" part
   of the lexical address be determined
   when the variable is declared?

   No, it depends on the nesting of the use,
   but we can get it from the variable's id_use

Q: Does the same thing work for constants?

   Yes, they are just initialized variables that don't change

... read x

    (Suppose x is a variable
     at lexical address (lvls, ofst))

    [RCH]  # read a character and put it in $v0
    [code to push $v0 on top of the stack]
    [code to load x's scope's frame pointer into $t9]
    [code to pop top of stack into $v0]
    [SW $t9, $v0, ofst]

... write E

Q: Should we write a character with code E or the digits of E?
   Probably want both; we added a PINT instruction to the VM
   to print an integer value
   (and PFLT to print a float value in the FSRM).

   We should also add a new kind of statement, say

      print E

   to PL/0 (and FLOAT) to print a single character.

   For now we will leave PINT as writing an integer value.

   We want the result of E to be in $a0,
   since $a0 is where args go for system calls like PINT,
   so we would use a design like:

   [code to eval E onto top of the stack]
   [code to pop stack into $a0]
   [PINT]

Another basic statement is

   call p

(but for fall 2023 we aren't handling that,
 and in any case we would worry about procedures later)

** Conditions

*** Overall conditions

Conditions are somewhat like expressions,
and can contain expressions,
so we can't reserve registers for their values.
Instead, like expressions, they should always store
their (truth) value on top of the runtime stack

------------------------------------------
GRAMMAR FOR CONDITIONS

C ::= odd E | E1 relop E2

relop ::= = | <> | < | <= | > | >=

So the code recursion structure is?

Code looks like:
------------------------------------------

// return a code sequence to put the truth value
// (1 for true, 0 for false) of the condition on top of stack
gen_code_condition(condition_t cond)
   does a switch on cond.cond_kind
   and calls either
      gen_code_odd_cond(cond.data.odd_cond)
   or
      gen_code_relop_cond(cond.data.rel_op_cond)

Q: What should these functions return?

   code sequences (code_seq)

Write the code for gen_code_condition...
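A self-contained sketch of that switch, following the grammar (one case per alternative). The enum, the struct shape, and the helper bodies below are simplified stand-ins for the course's actual condition_t and gen_code.c functions, which I don't have; here the helpers just return markers so the dispatch can be checked.

```c
/* Simplified stand-ins for the course's AST types (assumed shapes). */
typedef enum { ck_odd, ck_rel } cond_kind_e;

typedef struct {
    cond_kind_e cond_kind;
    /* the real condition_t also has a union with the subtree data */
} condition_t;

/* Stand-ins for the real generators: tag their answer instead of
 * building a code_seq. */
static const char *gen_code_odd_cond(condition_t c)   { (void)c; return "odd";   }
static const char *gen_code_relop_cond(condition_t c) { (void)c; return "relop"; }

/* Follow the grammar: switch on the condition's kind tag. */
static const char *gen_code_condition(condition_t cond)
{
    switch (cond.cond_kind) {
    case ck_odd:
        return gen_code_odd_cond(cond);
    case ck_rel:
        return gen_code_relop_cond(cond);
    }
    return 0; /* unreachable when all kinds are covered */
}
```

The real version returns a code_seq rather than a string, but the recursion/dispatch structure is the same.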
**** Relational operator conditions

------------------------------------------
RELATIONAL OPERATOR CONDITIONS

C ::= E1 relop E2

A design for conditions:

Goal: put true or false on top of the stack
      for the value of the condition

Consider E1 <> E2:

   [Evaluate E1 to top of stack]
   [Evaluate E2 to top of stack]
   [pop top of stack (E2's value) into $at]
   [pop top of stack (E1's value) into $v0]
   # jump past 2 instrs,
   # if GPR[$v0] != GPR[$at]
   BNE $v0, $at, 2
   # put 0 (false) in $v0
   ADD $0, $0, $v0
   # jump over next instr
   BEQ $0, $0, 1
   # put 1 (true) in $v0
   ADDI $0, $v0, 1
   # now $v0 has the truth value
   [code to push $v0 on top of stack]

Consider E1 >= E2:

   [Evaluate E1 to top of stack]
   [Evaluate E2 to top of stack]
   [pop top of stack (E2's value) into $at]
   [pop top of stack (E1's value) into $v0]
   SUB $v0, $at, $v0   # $v0 = E1 - E2
   # jump past 2 instrs,
   # if GPR[$v0] >= 0,
   # i.e., if E1-E2 >= 0
   BGEZ $v0, 2         # skip 2 instrs
   # put 0 (false) in $v0
   ADD $0, $0, $v0
   # jump over next instr
   BEQ $0, $0, 1
   # put 1 (true) in $v0
   ADDI $0, $v0, 1
   [code to push $v0 on top of stack]
------------------------------------------

explain all of this.

Note that E1 >= E2 is true just when E1 - E2 >= 0
(subtract E2 from both sides)

Note that BEQ $0, $0, 1 skips the next instruction
(since GPR[0] = GPR[0], for ints)

Q: What would work for = ?

   Use BEQ instead of BNE

Q: What would you do for < ?

   use BLTZ instead of BGEZ;
   similarly for <= and >.

------------------------------------------
CODE FOR BINARY RELOP CONDITIONS

// file ast.h
typedef struct {
    file_location *file_loc;
    AST_type type_tag;
    expr_t expr1;
    token_t rel_op;
    expr_t expr2;
} rel_op_condition_t;

// file gen_code.c
// Requires: reg != $at
// Generate code for evaluating condAST into reg
// Modifies when executed: reg, $at
code_seq gen_code_relop_cond(
             rel_op_condition_t condAST,
             reg_num_type reg)
{

}
------------------------------------------

...
[code to push E1's value on the stack]
[code to push E2's value on the stack]
[code to push the truth value of E1 rel_op E2 on top of stack]

** Control Flow Statements (Compound Statements)

These are the compound statements

Q: Why is it useful to write the base cases first?

   Testing is easier

------------------------------------------
ABSTRACT SYNTAX FOR COMPOUND STATEMENTS

S ::= begin { S } | if C S1 S2 | while C S

So what is the code structure?

Code looks like:

   begin S1 S2 ... end

   if C S1 S2

   while C S
------------------------------------------

... # begin S1 S2 ... end
    [code for S1]
    [code for S2]
    ...
    # concat them all!

... # if C then S1 else S2
    [code to push C's truth value on top of stack]
    [code to pop top of stack into $v0]
    BEQ $0, $v0, [length(S1)+1]   # skip S1 if false
    [code for S1]
    BEQ $0, $0, [length(S2)]      # skip else part (finish)
    [code for S2]

    Why add 1 to the length of S1?

       to account for the instruction
       that skips over the else part

    Note this requires the computation of the code sequences
    for S1 and S2 first, so we know how long they are

... # while C do S
    cond:
    [code to push C's truth value on top of stack]
    [code to pop top of stack into $v0]
    BEQ $0, $v0, [length(S)+1]   # skip S if false (goto exitLoop)
    [code for S]
    BEQ $0, $0, -(length(C)+length(S)+3)   # jump back (goto cond)
    exitLoop:

    Why add 3? The backward jump has to re-cross not just
    C's code and S's code, but also the pop, the forward BEQ,
    and this BEQ itself (branch offsets count from the
    following instruction, as in the forward branches above)
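The branch arithmetic above can be checked with a little index bookkeeping. The helper names below are mine (not the course's gen_code.c); they compute the displacements for the if and while schemes, assuming offsets are counted in instructions relative to the instruction after the branch.

```c
/* Branch displacements for the if/while translation schemes,
 * assuming offsets are in instructions, relative to the
 * instruction following the branch. Helper names are illustrative. */

/* if C then S1 else S2:
 * the BEQ skipping S1 must also hop over the branch
 * that skips the else part */
static int if_skip_then(int len_S1) { return len_S1 + 1; }
static int if_skip_else(int len_S2) { return len_S2; }

/* while C do S:
 * forward: skip S and the backward branch;
 * backward: re-cross S, both branches, the pop, and C's code */
static int while_exit(int len_S)            { return len_S + 1; }
static int while_back(int len_C, int len_S) { return -(len_C + len_S + 3); }
```

Laying the while loop out with the condition code at index 0 (pop at len_C, forward BEQ at len_C+1, S starting at len_C+2, backward BEQ right after S) lets one verify that the forward branch lands on exitLoop and the backward branch lands back on index 0.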