OVERVIEW OF CODE GENERATION

 .. ASTs...-> [ Static Analysis ]
                     |
                     | IR
                     v
              [ Code Generation ]
                     |
                     | Machine Code
                     |  (.bof file)
                     v
               Virtual Machine
                   Execution

The IR (= Intermediate Representation)
    records


   GENERAL STRATEGY FOR CODE GENERATION

Don't try to optimize!
   (optimize in a separate pass)

Follow the grammar of
  the ASTs


       FOLLOWING THE GRAMMAR

Code resembles the grammar that
   is used for the ASTs

When the grammar is recursive,
  then the code generator is recursive

When the grammar has alternatives,
  then the code generator has a switch/if statement


         TARGET: CODE SEQUENCES

Need lists of machine code

Why?
  1. statements in PL/0 often get translated into
     multiple instructions in the VM
  2. need to indicate sequential execution


     REPRESENTING CODE SEQUENCES IN C

#include "instruction.h"

// code that can be in a sequence
typedef struct code_s code;
// code sequences
typedef code *code_seq;

// machine code instructions
// (the typedef name code was already declared above,
//  so only the struct itself is defined here)
struct code_s {
    code_seq next;
    bin_instr_t instr;
};
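
A minimal sketch of how such sequences might be built and combined. The
helper names (code_seq_empty, code_seq_singleton, code_seq_concat,
code_seq_size) and the stand-in definition of bin_instr_t are assumptions
made for illustration; the real bin_instr_t comes from instruction.h.

```c
#include <assert.h>
#include <stdlib.h>

// Stand-in for bin_instr_t from instruction.h (hypothetical layout).
typedef struct { int op; int a, b, c; } bin_instr_t;

typedef struct code_s code;
typedef code *code_seq;
struct code_s {
    code_seq next;
    bin_instr_t instr;
};

// The empty sequence is a NULL pointer.
code_seq code_seq_empty(void) { return NULL; }

// A sequence holding just one instruction.
code_seq code_seq_singleton(bin_instr_t instr)
{
    code *c = malloc(sizeof(code));
    assert(c != NULL);
    c->next = NULL;
    c->instr = instr;
    return c;
}

// Append s2 after s1 (destructively); returns the combined sequence.
code_seq code_seq_concat(code_seq s1, code_seq s2)
{
    if (s1 == NULL) return s2;
    code *last = s1;
    while (last->next != NULL) last = last->next;
    last->next = s2;
    return s1;
}

// Number of instructions in s.
unsigned int code_seq_size(code_seq s)
{
    unsigned int n = 0;
    for (; s != NULL; s = s->next) n++;
    return n;
}
```

With helpers like these, the code for a statement is the concatenation of
the sequences for its parts, and code_seq_size gives the lengths needed
later for branch offsets.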


 STRATEGIES FOR DESIGNING CODE SEQUENCES

Work backwards


       EXPRESSION EVALUATION

Example: (E1 + E2) - (E3 / E4).


Constraints:
 - Expressions have a result value
 - Binary operations (+, -, *, /)
   in the SRM
   need 2 registers

Where should the result be stored?

  Can it be a register?
    i.e., can we reserve a register
           for an expression's value?
    No! Not enough registers

    Advice: use the runtime stack
       rule: the result of every expression
              is pushed onto the stack


   so to evaluate E2 op E3 use a code sequence like:

     [code to evaluate E2, putting its result on the stack]
     [code to evaluate E3]
     [pop E3's value into $t3]
     [pop E2's value into $t2]
     [evaluate $t2 op $t3 into $t2,
       e.g., SUB $t2, $t3, $t2]
     [push $t2 onto the stack]

   Addressing variables and constants
      want to use LW (and SW for assignments)
         so need AR's base address and offset

         for the main program's block,
            the AR's base address will be $fp

         the offset is in the symbol table


       USE OF REGISTERS

What if the register is already in use?
   e.g., $v0 for expression's value
     consider   x := y + z

Strategies:
 - use a different register
     not enough

 - save and restore
     does work, like putting all expression values on stack


     GENERAL STRATEGY FOR EXPRESSIONS

Each expression's value goes
   on the runtime stack at "the top"


To operate on an expression's value
   in a register r:
       [compute expression's value onto top of stack]
       [pop the stack into r]

    code_push_reg_on_stack(reg_num_type)

    code_pop_stack_into_reg(reg_num_type)
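
To see what these two helpers have to accomplish, here is a small
simulation of their effect on a machine state. It assumes the stack grows
toward lower addresses and that a push is "ADDI $sp, $sp, -4" followed by
"SW $sp, r, 0" (matching the allocation idiom used for declarations); the
machine struct and the register numbers are hypothetical.

```c
#include <assert.h>

enum { SP = 0, T1 = 1, T2 = 2, NUM_REGS = 3 };   // hypothetical register numbers
enum { MEM_WORDS = 64 };

typedef struct {
    int gpr[NUM_REGS];        // registers; gpr[SP] holds a byte address
    int memory[MEM_WORDS];    // memory, indexed by word
} machine;

// Effect of:  ADDI $sp, $sp, -4 ;  SW $sp, r, 0
void push_reg_on_stack(machine *m, int r)
{
    m->gpr[SP] -= 4;
    assert(m->gpr[SP] >= 0);                  // no stack overflow
    m->memory[m->gpr[SP] / 4] = m->gpr[r];
}

// Effect of:  LW $sp, r, 0 ;  ADDI $sp, $sp, 4
void pop_stack_into_reg(machine *m, int r)
{
    assert(m->gpr[SP] < 4 * MEM_WORDS);       // stack is not empty
    m->gpr[r] = m->memory[m->gpr[SP] / 4];
    m->gpr[SP] += 4;
}
```

Pushing $t1 and then popping into $t2 copies the value and leaves $sp
where it started, which is the invariant every expression's code relies on.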


      BACKGROUND: SRM INSTRUCTIONS

 ADD s,t,d   "GPR[d] = GPR[s]+GPR[t]"
 SUB s,t,d   "GPR[d] = GPR[s]-GPR[t]"
 MUL s,t     "HI,LO = GPR[s]*GPR[t]"
 DIV s,t     "HI = GPR[s] % GPR[t]"
              and "LO = GPR[s] / GPR[t]"

 LW b,t,o    "GPR[t] = memory[GPR[b]+4*o]"
 SW b,t,o    "memory[GPR[b]+4*o] = GPR[t]"
 ADDI s,t,i  "GPR[t] = GPR[s]+sgnExt(i)"


How to move value from r1 to r2?
    ADDI r1, r2, 0
    ADD $0, r1, r2

What limitations on immediate operands?
    must fit in a short int (16 bits)
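
A code generator could check this limit before choosing ADDI. The sketch
below is one way to write that check; the function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

// Can value be used directly as a sign-extended 16-bit immediate?
bool fits_in_immediate(long value)
{
    return INT16_MIN <= value && value <= INT16_MAX;
}

// Encode value as a 16-bit immediate field; requires that it fits.
int16_t to_immediate(long value)
{
    assert(fits_in_immediate(value));
    return (int16_t) value;
}
```

For example, fits_in_immediate(32767) holds, but
fits_in_immediate(1999999999) does not, so that literal needs another route.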


What if the literal doesn't fit?
     e.g., 1999999999
   - use global data (words in the data section),
        use LW to load it when needed
      use a "literal table" to track offsets for these
   - could compute the value


            LITERAL TABLE IDEA

- Store literal values in
       a table

- Keep mapping from
       text/value of the literal
    to the offset in the data section (from $gp)

- Initialize
    from the BOF's data section


  LITERAL TABLE IN EXPRESSION EVALUATION

Idea for code for numeric expression, N:

    1. Look up N in global table,
    2. Receive N's offset (from $gp)


    3. generate a load instruction that puts
         that value into some register (say, $v0)
             LW $gp, $v0, offset


    4. push $v0 onto the stack


    LITERAL TABLE AND BOF DATA SECTION

How to get the literals into memory
   with the assumed offsets?

   put them into the BOF's data section
     in order (starting with offset 0)
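
A minimal sketch of such a table, assuming integer literals, word offsets
(LW multiplies the offset by 4), and a fixed capacity. The name
literal_table_lookup appears in these notes, but its signature here and
the other two functions are assumptions.

```c
#include <assert.h>

enum { MAX_LITERALS = 256 };    // arbitrary capacity for this sketch

static int literal_values[MAX_LITERALS];   // values, in data-section order
static unsigned int literal_count = 0;

// Return the word offset (from $gp) of value, adding it if it is new.
unsigned int literal_table_lookup(int value)
{
    for (unsigned int i = 0; i < literal_count; i++) {
        if (literal_values[i] == value) return i;
    }
    assert(literal_count < MAX_LITERALS);
    literal_values[literal_count] = value;
    return literal_count++;
}

// Number of words the BOF data section needs for the literals.
unsigned int literal_table_size(void) { return literal_count; }

// The literal stored at a given offset, for writing the data section.
int literal_table_value_at(unsigned int offset)
{
    assert(offset < literal_count);
    return literal_values[offset];
}
```

Because offsets are handed out in order starting at 0, writing
literal_table_value_at(0), literal_table_value_at(1), ... into the data
section puts each literal exactly where the generated LW expects it.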


   LAYOUT OF AN ACTIVATION RECORD

Must save SP, FP, static link, RA
   and registers $s0-$s7

Can't have the static link
    at a varying offset from FP

Layout 1:

  FP --> [  saved       SP        ]
         [    registers FP        ]
         [            static link ]
         [              RA        ]
         [              $s0       ]
         [              ...       ]
         [              $s7       ]
         [ local constants        ]
         [      ...               ]
         [ local variables        ]
         [      ...               ]
         [ temporary storage      ]
   SP -->[       ...              ]


Layout 2:

         [      ...               ]
         [ local variables        ]
         [      ...               ]
   FP -->[ local constants        ]
         [  saved       SP        ]
         [    registers FP        ]
         [            static link ]
         [              RA        ]
         [              $s0       ]
         [              ...       ]
         [              $s7       ]
         [ temporary storage      ]
   SP -->[       ...              ]


Advantages of layout 1:
   - simple, like a stack machine
   - tracing in the VM is easy


Advantages of layout 2:
   - offsets for constants and variables are
       what was recorded in the symbol table
   - tracing of the VM can be done
   - corresponds to conventions on MIPS

   recommend layout 2


        TRANSLATING EXPRESSIONS

Abstract syntax of expressions in PL/0

  E ::= E1 o E2 | x | n
  o ::= + | - | * | /


Simplest cases are:
   numeric literals
   identifiers
   binary operator expressions


 TRANSLATION SCHEME FOR NUMERIC LITERALS

   - always use the literal table
       call literal_table_lookup
         to get offset, ofst
   - want to put that on top of the stack

     [load value into (say) $at using ofst]
        i.e., LW $gp, $at, ofst
     [push $at onto the stack]


  TRANSLATE THE WRITE STATEMENT

     e.g., write 3402

     [evaluate the expression (onto the stack)]
     [pop the stack into $a0]
     PINT
   

  TRANSLATION SCHEME FOR VARIABLE NAMES
           (AND CONSTANTS)

   want to use the LW instruction to bring the value
     into a register, say reg

   (if no procedures)
      FP is the frame pointer for the AR

   (with procedures, suppose the lexical address is
       (lo,ofst))
      # compute the base of name's AR into $t9
      [move $fp into $t9]
      repeat lo times:
         LW $t9, $t9, -3 # fetch static link from AR
      # now $t9 is the frame pointer for the AR
      #  where the name was declared

   use id_use_attrs(id->idu) to get attributes
       from the AST id for the identifier expression

        unsigned short
        ofst = id_use_attrs(id->idu)->offset_count

   (so, if no procedures)

        LW $fp, reg, ofst
        [push reg onto the stack]

   (so, if have procedures)

        LW $t9, reg, ofst
        [push reg onto the stack]
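
The scheme with procedures can be sketched as a generator that writes
assembly text. This is an illustration only: it emits text rather than a
code_seq, uses $v0 as reg, assumes the push sequence
"ADDI $sp, $sp, -4; SW $sp, $v0, 0", and the function names are hypothetical.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

// Append one line of assembly text to buf (capacity cap).
static void emit(char *buf, size_t cap, const char *line)
{
    assert(strlen(buf) + strlen(line) + 2 <= cap);
    strcat(buf, line);
    strcat(buf, "\n");
}

// Write code that pushes the name at lexical address (lo, ofst)
// onto the stack; returns the number of instructions written.
unsigned int gen_push_var(char *buf, size_t cap,
                          unsigned int lo, unsigned int ofst)
{
    char line[64];
    unsigned int count = 0;
    buf[0] = '\0';
    emit(buf, cap, "ADD $0, $fp, $t9");          // move $fp into $t9
    count++;
    for (unsigned int i = 0; i < lo; i++) {
        emit(buf, cap, "LW $t9, $t9, -3");       // follow one static link
        count++;
    }
    snprintf(line, sizeof line, "LW $t9, $v0, %u", ofst);
    emit(buf, cap, line);                        // fetch the value
    emit(buf, cap, "ADDI $sp, $sp, -4");         // push $v0 (assumed sequence)
    emit(buf, cap, "SW $sp, $v0, 0");
    return count + 3;
}
```

Note that the loop runs in the compiler: a name declared lo levels out
costs lo extra LW instructions at each use.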


TRANSLATION SCHEME FOR BINARY OPER EXPRS

Goal is to get to an instruction like
   ADD, SUB, MUL, DIV   
   (then push result onto stack)


Example: E1 - E2

   [code to evaluate E1]
   [code to evaluate E2]
   [code to pop E2's value into $t2]
   [code to pop E1's value into $t1]
   SUB $t1, $t2, $t1
   [code to push $t1 onto the stack]
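
The same scheme as a recursive generator, to show how the code follows
the grammar. This is a sketch under assumptions: the expr type is a
hypothetical AST handling only numeric literals, + and -; output is text
rather than a code_seq; and pushes and pops are left as bracketed
placeholders as in the scheme above.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

typedef enum { NUM_EXPR, BINOP_EXPR } expr_kind;

// Hypothetical AST for  E ::= E1 o E2 | n
typedef struct expr_s {
    expr_kind kind;
    int literal_offset;               // NUM_EXPR: offset in literal table
    char op;                          // BINOP_EXPR: '+' or '-'
    const struct expr_s *left, *right;
} expr;

// Append one line to buf (capacity cap).
static void emit(char *buf, size_t cap, const char *line)
{
    assert(strlen(buf) + strlen(line) + 2 <= cap);
    strcat(buf, line);
    strcat(buf, "\n");
}

// Append code for e to buf; when run, that code leaves
// the value of e on top of the stack.
void gen_expr(const expr *e, char *buf, size_t cap)
{
    char line[64];
    switch (e->kind) {                 // one case per grammar alternative
    case NUM_EXPR:
        snprintf(line, sizeof line, "LW $gp, $at, %d", e->literal_offset);
        emit(buf, cap, line);
        emit(buf, cap, "[push $at]");
        break;
    case BINOP_EXPR:
        gen_expr(e->left, buf, cap);   // recursion follows the grammar
        gen_expr(e->right, buf, cap);
        emit(buf, cap, "[pop $t2]");   // right operand is on top
        emit(buf, cap, "[pop $t1]");
        emit(buf, cap, e->op == '+' ? "ADD $t1, $t2, $t1"
                                    : "SUB $t1, $t2, $t1");
        emit(buf, cap, "[push $t1]");
        break;
    }
}
```

The caller passes a buffer that starts out as the empty string. MUL and
DIV would need extra instructions to fetch the result from LO, which this
sketch leaves out.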


 TRANSLATION SCHEME FOR PL/0 DECLARATIONS

   const c = n;
   var x;

When do blocks start executing?
   when the procedure or the main program starts executing

What should be done then?
   [allocate and initialize the variables and constants]
   [save any registers necessary and set up the AR]


How do we know how much space to allocate?
   allocate space for each constant and variable
      (each is one word)

What order?
   to get the stack AR layout 2 as before,
   do the variables then the constants
      in reverse order


How to initialize constants?
   use literal table's offsets and LW
      LW $gp, $at, offset
          (where offset is computed from the literal table)
      SW $sp, $at, 0

How to initialize variables?
      SW $sp, $0, 0

 e.g.,

      ADDI $sp, $sp, -4  # allocate a word
      SW $sp, $0, 0      # initialize the variable to 0


 TRANSLATION SCHEME FOR BASIC STATEMENTS


    skip
             SRL $at, $at, 0
    x := E
            suppose offset for x is ofst (from id_use)
             [evaluate E (onto the stack)]
             [get the frame pointer for x's location
                 into a register, $t9]
             [pop the stack into $at]
             SW $t9, $at, ofst


    read x
           suppose offset for x is ofst (from id_use)
	   
           RCH  # puts char read into $v0
           [get the frame pointer for x's location
                into a register, $t9]
           SW $t9, $v0, ofst


    write E
          [evaluate E]
          [pop the stack into $a0]
          PINT


        GRAMMAR FOR CONDITIONS

<condition> ::= odd <expr>
              | <expr> <rel-op> <expr>
<rel-op> ::= = | <> | < | <= | > | >=

So the code recursion structure is?


Code looks like:


     RELATIONAL OPERATOR CONDITIONS

<condition> ::= <expr> <rel-op> <expr>

A design  for conditions:

 Goal: put true or false on top of stack
       for the value of the condition

 Consider E1 <> E2

  [Evaluate E1 to top of stack]
  [Evaluate E2 to top of stack]
  [pop top of stack (E2's value) into $at]
  [pop top of stack (E1's value) into $v0]
  # jump past 2 instrs,
  # if GPR[$v0]!=GPR[$at]
  BNE $v0, $at, 2
  # put 0 (false) in $v0
  ADD $0, $0, $v0
  # jump over next instr
  BEQ $0, $0, 1
  # put 1 (true) in $v0
  ADDI $0, $v0, 1
  # now $v0 has the truth value
  [code to push $v0 on top of stack]

 Consider E1 >= E2
  [Evaluate E1 to top of stack]
  [Evaluate E2 to top of stack]
  [pop top of stack (E2's value) into $at]
  [pop top of stack (E1's value) into $v0]
  SUB $v0, $at, $v0      # $v0 = E1 - E2
  # jump past 2 instrs,
  # if GPR[$v0] >= 0     # i.e., if E1-E2 >= 0
  BGEZ $v0, 2            # skip 2 instrs
  # put 0 (false) in $v0
  ADD $0, $0, $v0
  # jump over next instr
  BEQ $0, $0, 1
  # put 1 (true) in $v0
  ADDI $0, $v0, 1
  # now $v0 has the truth value
  [code to push $v0 on top of stack]
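
To check that the branch offsets in the E1 <> E2 sequence are right, the
four instructions can be run on a tiny simulator. This is only a sketch:
it models just ADD, ADDI, BEQ, and BNE, assumes branch offsets count
instructions relative to the instruction after the branch, and uses
hypothetical register numbers.

```c
#include <assert.h>

typedef enum { OP_ADD, OP_ADDI, OP_BEQ, OP_BNE } opcode;

// One instruction; fields a, b, c follow the operand order in the notes.
typedef struct { opcode op; int a, b, c; } instr;

enum { ZERO = 0, AT = 1, V0 = 2, NUM_REGS = 3 };  // hypothetical numbering

// Run the n instructions of prog, starting at index 0.
void run(const instr *prog, int n, int gpr[NUM_REGS])
{
    assert(n > 0);
    int pc = 0;
    while (pc < n) {
        instr i = prog[pc++];          // pc now names the next instruction
        switch (i.op) {
        case OP_ADD:  gpr[i.c] = gpr[i.a] + gpr[i.b]; break;    // ADD s,t,d
        case OP_ADDI: gpr[i.b] = gpr[i.a] + i.c;      break;    // ADDI s,t,i
        case OP_BEQ:  if (gpr[i.a] == gpr[i.b]) pc += i.c; break;
        case OP_BNE:  if (gpr[i.a] != gpr[i.b]) pc += i.c; break;
        }
    }
}

// Truth value of e1 <> e2 as computed by the 4-instruction sequence.
int eval_not_equal(int e1, int e2)
{
    static const instr prog[] = {
        { OP_BNE,  V0,   AT,   2  },   // skip 2 instrs if $v0 != $at
        { OP_ADD,  ZERO, ZERO, V0 },   // $v0 = 0 (false)
        { OP_BEQ,  ZERO, ZERO, 1  },   // skip the next instr
        { OP_ADDI, ZERO, V0,   1  },   // $v0 = 1 (true)
    };
    int gpr[NUM_REGS] = { 0, e2, e1 }; // $at = E2's value, $v0 = E1's value
    run(prog, 4, gpr);
    return gpr[V0];
}
```

Tracing eval_not_equal(3, 4) takes the BNE and lands on the ADDI, while
eval_not_equal(5, 5) falls through to the ADD and then jumps past the ADDI.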


     CODE FOR BINARY RELOP CONDITIONS

// file ast.h
typedef struct {
    file_location *file_loc;
    AST_type type_tag;
    expr_t expr1;
    token_t rel_op;
    expr_t expr2;
} rel_op_condition_t;


// file gen_code.c

// Requires: reg != $at
// Generate code for evaluating condAST into reg
// Modifies when executed: reg, $at
code_seq gen_code_relop_cond(
              rel_op_condition_t condAST,
              reg_num_type reg)
{


}
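
One piece the body needs is a choice of branch instruction for each
relational operator. The sketch below is an assumption-laden illustration:
it takes the operator as a string instead of a token_t, and it assumes the
SRM provides BLTZ, BLEZ, and BGTZ alongside the BEQ, BNE, and BGEZ used in
these notes.

```c
#include <assert.h>
#include <string.h>

// Pick the branch used to test  E1 op E2.
// For = and <> the branch compares $v0 with $at directly;
// for the others it tests the sign of $v0 = E1 - E2.
// Returns NULL if op is not a relational operator.
const char *branch_for_rel_op(const char *op)
{
    assert(op != NULL);
    if (strcmp(op, "=") == 0)  return "BEQ";
    if (strcmp(op, "<>") == 0) return "BNE";
    if (strcmp(op, "<") == 0)  return "BLTZ";
    if (strcmp(op, "<=") == 0) return "BLEZ";
    if (strcmp(op, ">") == 0)  return "BGTZ";
    if (strcmp(op, ">=") == 0) return "BGEZ";
    return NULL;
}
```

With this choice, every relational condition can share the same shape:
branch past two instructions when true, otherwise put 0 in the register
and jump over the instruction that puts 1 there.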

 ABSTRACT SYNTAX FOR COMPOUND STATEMENTS

S ::= begin { S }
    | if C S1 S2
    | while C S

So what is the code structure?


Code looks like:


  begin S1 S2 ... end


  if C S1 S2


  while C S
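
One possible design (an assumption, not the only choice) lays out the code as

    if:     [C, then pop truth value into $v0]
            BEQ $0, $v0, (skip S1 and the jump after it)
            [S1]
            BEQ $0, $0, (skip S2)
            [S2]

    while:  [C, then pop truth value into $v0]
            BEQ $0, $v0, (skip S and the jump back)
            [S]
            BEQ $0, $0, (back to the start of C)

The branch offsets then come from the sizes of the code sequences. The
sketch below computes them, assuming offsets count instructions relative
to the instruction after the branch; the function names are hypothetical.

```c
#include <assert.h>

// if: offset for the branch after C that skips S1 when C is false;
// it must pass S1 plus the unconditional jump that follows S1.
int if_skip_then_offset(int len_s1)
{
    assert(len_s1 >= 0);
    return len_s1 + 1;
}

// if: offset for the jump after S1 that skips S2.
int if_skip_else_offset(int len_s2)
{
    assert(len_s2 >= 0);
    return len_s2;
}

// while: offset for the branch after C that exits the loop;
// it must pass S plus the jump back to the start.
int while_exit_offset(int len_s)
{
    assert(len_s >= 0);
    return len_s + 1;
}

// while: offset for the final jump back to the start of C.
// len_c counts every instruction before the exit branch.
int while_back_offset(int len_c, int len_s)
{
    assert(len_c >= 0 && len_s >= 0);
    return -(len_c + len_s + 2);   // C, exit branch, S, and this jump
}
```

For begin S1 S2 ... end, no branches are needed: the code is just the
concatenation of the code for each statement.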


      SUPPORTING PROCEDURES AND CALLS

Main issues:
   - storing their code
     Why?


   - knowing exactly where each starts
     Why?


Another issue:
   - sending the right static link


       WHERE TO PUT PROCEDURE CODE?

Possible layouts in VM's code array:


      NESTED PROCEDURES ARE A PROBLEM

  procedure A;
    procedure B;
      begin # B's body code...
            call A # ...
            # ...
      end
  begin
     # A's body code
     call B # ...
     # ...
  end

If we lay out the code as

   [ code for A ]
   [ code for B ]

How do we know the address of B
    to compile the call to B?


What about the other direction?


   RECURSIVE PROCEDURES, SIMILAR PROBLEM

  procedure R;
    begin
      # R's body code ...
      call R
      # ...
    end

Before storing code for R,
  how do we know where it starts?


        MUTUAL RECURSION
        
  procedure O;
    begin # O's body code...
      call E
      # ...
    end

  procedure E;
    begin
      # E's body code ...
      call O
      # ...
    end

One of these must come before the other in
  the code area of the VM...


       SOLUTION STRATEGIES FOR CALLS

[Multiple passes]:
  1. Generate code for each procedure
     (+ store offsets in symbol table,
      + layout procedure code in memory)
  2. Gather table of addresses
     (map from names to addresses,
      using offsets and beginning address)
  3. Patch up code addresses for calls
     (+ output code)

[Lazy evaluation, labels]:
  1. Generate code for each procedure
     with calls to labels
     (+ store or update
        labels in symbol table)
  (+ output code)

      GENERAL SOLUTION: MULTIPLE PASSES

Problem: where does each procedure start?

Solution idea:
  1. Compile all procedure code
     (now know how big each procedure is)
  2. Lay out procedure code in memory
     (now know where each starts)
  3. Change each call instruction


         GENERAL SOLUTION: LABELS

Use "labels" to allow


Term "label" is from assembly language

    ;  ...
    jmp L
    ; ...
    L: ; ...


        APPROACHES TO FIXING LABELS

Problem: convert labels to addresses

 (1) Use multiple passes
       a. Generate code with labels
       b. Lay out memory for procedures
          (determine starting addresses)
       c. Change labels to addresses

     advantages:


     disadvantages:


 (2) Use shared mutable data (lazy eval.)
       a. labels are unique placeholders,
          shared by all uses (calls)
       b. when address is determined,
          update the placeholder
          (and all uses are updated)

     advantages:


     disadvantages:


    LABEL DATA STRUCTURE FOR LAZY EVAL

// file label.h

#include <stdbool.h>
#include "machine_types.h"

typedef struct {
    bool is_set;
    address_type byte_addr;
} label;

// Return a fresh label that is not set
extern label *label_create();

// Set the address in the label
extern void label_set(label *lab,
                      address_type addr);

// Is lab set?
extern bool label_is_set(label *lab);

// Requires: label_is_set(lab)
// Return the address in lab.
extern
address_type label_read(label *lab);
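
A minimal sketch of an implementation of this interface. The definition
of address_type is an assumption standing in for machine_types.h, and
error handling is reduced to assertions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

typedef unsigned int address_type;   // assumed; really from machine_types.h

typedef struct {
    bool is_set;
    address_type byte_addr;
} label;

// Return a fresh label that is not set
label *label_create(void)
{
    label *lab = malloc(sizeof(label));
    assert(lab != NULL);
    lab->is_set = false;
    lab->byte_addr = 0;
    return lab;
}

// Set the address in the label
void label_set(label *lab, address_type addr)
{
    lab->is_set = true;
    lab->byte_addr = addr;
}

// Is lab set?
bool label_is_set(label *lab) { return lab->is_set; }

// Requires: label_is_set(lab)
// Return the address in lab.
address_type label_read(label *lab)
{
    assert(label_is_set(lab));
    return lab->byte_addr;
}
```

Every call to a procedure holds a pointer to the same label, so one
label_set, done when the procedure's address is finally known, is seen by
all of them when the code is output.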


             CONTEXT

VM changes (from HW4's VM):


// words are ints or floats
typedef enum {int_type, float_type} word_type;

// words for this machine
typedef struct word_s {
    word_type type_tag;
    union word_u {
        int i;
        float f;
    } data;
} word;

  No MOD instruction
  Changed to RND instruction (rounding)

FLOAT Language changes

  <stmt> ::= '{' <var-decls>
                 <stmt> { <stmt> } '}'
           | ...

where

  <var-decls> ::= { <var-decl> }
  <var-decl> ::= float <ident>
               | bool <ident>
              

    ENHANCED ASTS AS AN IR

See ast.h

Changes to ASTs:

// S ::= begin { VD } { S }
typedef struct {
    AST_list vds;
    AST_list stmts;
} begin_t;


       IDENTIFIER USES IN ASTs

// E ::= x
typedef struct {
    // name of a constant or variable
    const char *name;
    // set during static analysis,
    // includes info for lexical addr
    id_use *idu;
} ident_t;

// S ::= read x
typedef struct {
    AST *ident;
} read_t;

// S ::= assign x E
typedef struct {
    AST *ident;
    AST *exp;
} assign_t;


     ID_USE STRUCTURES

typedef struct {
    id_attrs *attrs;
    unsigned int levelsOutward;    
} id_use;


           ID ATTRIBUTES

// attributes of idents
typedef struct {
    file_location file_loc;
    var_type vt;  // type
    // offset from beginning of scope
    unsigned int loc_offset;
} id_attrs;

// where:

typedef enum {float_t, bool_t} var_type;


void scope_check_beginStmt(AST *stmt)
{
    symtab_enter_scope();  // <*******
    scope_check_varDecls(
        stmt->data.begin_stmt.vds);
    AST_list stmts
         = stmt->data.begin_stmt.stmts;
    while (!ast_list_is_empty(stmts)) {
        scope_check_stmt(
            ast_list_first(stmts));
        stmts = ast_list_rest(stmts);
    }
    symtab_leave_scope(); // <********
}


          GENERATING CODE

Done in the file gen_code.c

  - Functions arranged to walk the ASTs
  - All return a code_seq

 Useful files: ast.h
                 id_attrs.h
                   id_use.h
               code.h

        STEPS FOR CODE GENERATION

 1. start with the base cases


 2. Write simplest tests possible


 3. Design code sequences
     for the nonterminals involved


 4. Write code for each node of the AST


 5. Test it:
     a. check the output machine code

     b. check the VM's execution


              EXPRESSIONS

What are the base cases?


A very simple test:


What code sequence do we want?


         GENERATING CODE

Where does execution start?


What is the AST for our program?


Where do we generate those code sequences?


Let's write it!


            NESTED SCOPES

Example in FLOAT:

float x;
{ float y;
  { float z;
    z = 0;
    y = 1;
    x = 2;
  }
}

What kind of code sequence for this?
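
Part of the answer is that each assignment needs the lexical address of
its variable: how many scopes outward it was declared, and its offset
there. The sketch below shows how levelsOutward could be computed with a
stack of scopes. It is a simplified stand-in for the real symbol table;
all names in it are hypothetical.

```c
#include <assert.h>
#include <string.h>

enum { MAX_SCOPES = 8, MAX_NAMES = 8 };

typedef struct {
    const char *names[MAX_NAMES];
    unsigned int count;
} scope;

static scope scopes[MAX_SCOPES];
static unsigned int depth = 0;         // number of open scopes

void enter_scope(void)
{
    assert(depth < MAX_SCOPES);
    scopes[depth++].count = 0;
}

void leave_scope(void)
{
    assert(depth > 0);
    depth--;
}

// Declare name in the innermost scope; returns its offset there.
unsigned int declare(const char *name)
{
    assert(depth > 0);
    scope *s = &scopes[depth - 1];
    assert(s->count < MAX_NAMES);
    s->names[s->count] = name;
    return s->count++;
}

// Number of scopes outward from the current one to name's declaration,
// or -1 if name is not declared.
int levels_outward(const char *name)
{
    for (unsigned int lvl = 0; lvl < depth; lvl++) {
        const scope *s = &scopes[depth - 1 - lvl];
        for (unsigned int i = 0; i < s->count; i++) {
            if (strcmp(s->names[i], name) == 0) return (int) lvl;
        }
    }
    return -1;
}
```

In the innermost block of the example, z is 0 levels outward, y is 1, and
x is 2, so the code for x = 2 would follow two static links before its
store, while the code for z = 0 would follow none.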