I. Code Generation
 A. overview
------------------------------------------
         OVERVIEW OF CODE GENERATION

 .. ASTs...-> [ Static Analysis ]
                     |
                     | IR
                     v
              [ Code Generation ]
                     |
                     | Machine Code
                     |
                     v
               Virtual Machine 
                   Execution

The IR (= Intermediate Representation)
    records


------------------------------------------
  1. IR (Intermediate Representation)
         What kind of information is needed from a name's use
            in order to generate code?
         Should the parser create the lexical address of a name's use
            during paring?
         Is the symbol table unchanging (immutable)?
------------------------------------------
            IR TREES

An IR is a tree structure,


Helps in modularizing compilers
and code generation

   WITHOUT IR               WITH IR

Java        x86       Java          x86  
                                       
   C        MIPS         C          MIPS 
                              IR         
 C++        Sparc      C++          Sparc
                                       
  C#        A1          C#          A1   
  
------------------------------------------
------------------------------------------
          OUR CHOICES FOR AN IR

To keep things simple, we will use
   a modified AST type as an IR

Parser:
   - records
   - provides 


Static analysis:
   - records 

   
------------------------------------------
  2. General strategy
------------------------------------------
   GENERAL STRATEGY FOR CODE GENERATION

Follow the grammar of


------------------------------------------
------------------------------------------
       FOLLOWING THE GRAMMAR

Code resembles the grammar that


When

------------------------------------------
     How does this relate to the parser?
     Why is this useful?
 B. Translation target: code sequences
------------------------------------------
         TARGET: CODE SEQUENCES

Need sequences of machine code


Why?


------------------------------------------
        Why are code sequences needed?
------------------------------------------
     REPRESENTING CODE SEQUENCES IN C

// code that can be in a sequence
typedef struct code_s code;
// code sequences
typedef code *code_seq;

// machine code instructions
typedef struct code_s {
    code_seq next;
    instruction instr;
} code;

------------------------------------------
 C. Expressions
------------------------------------------
        TRANSLATING EXPRESSIONS

Abstract syntax of expressions:

  E ::= E1 o E2 | x | n
  o ::= + | - | * | /

VM instructions:

  NEG      "negate val on top of stack"
  ADD      "add top 2 elems on stack"
  SUB      "subtract top from 2nd top"
  MUL      "multiply top 2 elems"
  DIV      "divide 2nd from top by top"
  MOD      "2nd from top elem mod top"
  
  PBP      "push BP on the stack"
  LOD o    "load offset from addr in top"
  LIT n    "literal"

------------------------------------------
------------------------------------------
   TRANSLATION SCHEME FOR EXPRESSIONS

Expression   MACHINE CODE


   x         
             
             
   n         


 E1 o E2     
             
              
------------------------------------------
        The grammar (abstract syntax) is recursive,
            so what are the base cases?
        Why use PBP (push base pointer) followed by levl PSI instructions?
        What are we assuming about the evaluation of expressions?
 D. Declarations
   Where are constants and variables stored?
------------------------------------------
  TRANSLATION SCHEME FOR DECLARATIONS

   const c = n;     

   var x;           
   
------------------------------------------
       What does LIT n do?
       What does INC do?
       How do we track the association between names and these locations?
       Where do global constants and variables go?
 E. Straight-line Code (simple statements without control flow)
   What are the base cases in the grammar for statements?
------------------------------------------
 TRANSLATION SCHEME FOR BASIC STATEMENTS


    skip

    x := E


    write E


    read x


------------------------------------------
            Can the "levels outwards" part of the lexical address
                be determined when the variable is declared?
            Can the "offset from start of AR" part of the lexical
               address be determined when the variable is declared?
            Does the same hold for constants?
 F. Conditions
------------------------------------------
        GRAMMAR FOR CONDITIONS

<condition> ::= odd <expr>
              | <expr> <rel-op> <expr>
<rel-op> ::= = | <> | < | <= | > | >=

So structure is?


Code looks like:


------------------------------------------
         What should these functions return?
 G. Control Flow Statements (Compound Statements)
     Why is it useful to write the base cases first?
------------------------------------------
 ABSTRACT SYNTAX FOR COMPOUND STATEMENTS

S ::= begin { S }
    | if C S1 S2
    | while C S

So what is the code structure?


Code looks like:


  begin S1 S2 ...

  if C S1 S2


  while C S


------------------------------------------
------------------------------------------
TRANSLATION PROBLEM FOR CONDITIONAL STMTS

If jumps use an absolute address,
   how to compute that?

Problem:
   For code in procedures:
   

Approaches:


------------------------------------------
 H. procedures
------------------------------------------
      SUPPORTING PROCEDURES AND CALLS

Main issues:
   - storing their code
     Why?


   - knowing exactly where each starts
     Why?


Another issue:
   - trimming the stack for return

------------------------------------------
  1. Where to store code for procedures?
   Where do we put the code sequences for each procedure?
------------------------------------------
       WHERE TO PUT PROCEDURE CODE?

Possible layouts in VM's code array:


------------------------------------------
       How would you implement each?
       Which layout makes the most sense?
  2. how to find each procedure's starting address?
------------------------------------------
      NESTED PROCEDURES ARE A PROBLEM

  procedure A;
    procedure B;
      begin # B's body code...
            call A # ...
            # ...
      end
   begin
      # A's body code
      call B # ...
      # ...
   end

If store the code as
   [ code for A ]
   [ code for B ]
  how do we know the address of B
    to compile the call to B?
What about the other direction?

------------------------------------------
------------------------------------------
   RECURSIVE PROCEDURES, ANOTHER PROBLEM

  procedure R;
    begin
      # R's body code ...
      call R
      # ...
    end

Before storing code for R,
  how do we know where it starts?

  procedure O;
    begin # O's body code...
      call E
      # ...
    end

  procedure E;
    begin
      # E's body code ...
      call O
      # ...

One of these must before the other in
  the code area of the VM...


------------------------------------------
        No matter which of O or E is put first,
           how is the call to the second one to know
           where the second one starts?
  3. Labels as a solution
------------------------------------------
         GENERAL SOLUTION: LABELS

Use "labels" to allow
   

Term "label" is from assembly language

    ;  ...
    jmp L
    ; ...
    L: ; ...

------------------------------------------
------------------------------------------
        APPROACHES TO FIXING LABELS

Problem: convert labels to addresses

 (1) Use multiple passes
       - lay out memory for procedures
          (determine starting addresses)
       - change labels to addresses

     advantages:


     disadvantages:


 (2) Use shared mutable data (lazy eval.)
       - label is a unique placeholder,
         shared by all uses
         
       - when address is determined,
          update the placeholder
          (and all uses are updated)

     advantages:


     disadvantages:


------------------------------------------
   a. label data structure for lazy evaluation
------------------------------------------
    LABEL DATA STRUCTURE FOR LAZY EVAL  

In file label.h for HW4:

  typedef struct {
    bool is_set;
    address addr;
  } label;

// Return a fresh label that is not set
extern label *label_create();

// Set the address in the label
extern void label_set(label *lab,
                      address addr);

// Is lab set?
extern bool label_is_set(label *lab);

// Requires: label_is_set(lab)
// Return the address in lab.
extern address label_read(label *lab);

------------------------------------------
  4. Trimming the stack for returns
------------------------------------------
PROBLEM OF TRIMMING THE STACK FOR RETURNS

Consider a procedure's block like:

 procedure P;
    var x;
    skip;

code for P would be:

   INC 1
   NOP
   RTN

Why?

Picture of stack before a call of P:

    6 |         |
    5 |         |
    4 |         | <- SP
    3 |         |
    2 |         | <- BP
      |---------|

Picture after the call

    7 |         | <- SP
    6 | ret adr | 
    5 | old BP  | 
    4 | stat lnk| <- BP
    3 |         |
    2 |         | 
      |---------|

After executing INC 1 and NOP

    8 |         | <- SP
    7 |    x    | 
    6 | ret adr | 
    5 | old BP  | 
    4 | stat lnk| <- BP
    3 |         |
    2 |         | 
      |---------|

The return will pop the stack 3 times:

    5 |  old BP | <- BP
    4 |         |
    3 |         |
    2 |         | 
      |---------|

------------------------------------------
        Is the picture at the end the same as the start?
        What got put into the BP register?
        How to fix this?
 I. exercise