I. HW4, Basics and Demos
 A. Testing and The Task
------------------------------------------
         TESTING FOR HOMEWORK 4

Does the generated code need to work
  in some particular way?

------------------------------------------
------------------------------------------
           RUNNING INDIVIDUAL TESTS

$ cat simple-print.spl
% $Id: simple-print.spl,v 1.1 ...
begin
  print 3402
end.

$ make simple-print.myo
rm -f simple-print.bof; umask 022; \
./compiler simple-print.spl
rm -f simple-print.myo; umask 022; \
cat char-inputs.txt | \
   ./vm/vm simple-print.bof \
    > simple-print.myo 2>&1

$ cat simple-print.myo
3402$

------------------------------------------
------------------------------------------
              THE TASK

Write modules:

    gen_code

    literal_table


So that with the provided files


  SPL source file  test.spl
                      |
                      | compiler
                      |
                      v
         BOF file  test.bof
                      |
                      | VM (provided)
                      |
                      v
   program output  test.myo
------------------------------------------
 B. Strategy
------------------------------------------
  OVERALL STRATEGY FOR WRITING GEN_CODE

Use the shape of the unparser.c code


Decide on overall tactics
   (e.g., expressions put result on stack)
   and follow those in all cases


Build up code from


Check your work by:


------------------------------------------
  1. Some Basic Decisions
   a. Literal above the GP
------------------------------------------
  LITERALS STORED ABOVE THE GP FROM BOF

Literal values stored at positive offsets
  from the GP

 - loaded from the BOF file's data section
 - offsets known to the compiler from


Memory of the VM running a program:


 address
   32768 [                        ]
         [                        ]
         [                        ]
         [                        ]
 initial [                        ]
SP,FP -->[   runtime stack        ]
         [     program's AR       ]
         [    AR for block 1      ]
         [       ...              ]
  FP --> [    AR for block N      ]
  SP --> [       ...              ]
         [        |               ]
         [        v               ]
         [                        ]
         [                        ]
         [                        ]
         [                        ]
         [                        ]
         [        literals        ]
         [          ...           ]
         [     (data from BOF)    ]
  GP --> [   offset 0 data word   ]
         [                        ]
         [                        ]
         [                        ]
         [                        ]
         [                        ]
         [          VM            ]
         [      instructions      ]
         [          ...           ]
         [      (text of BOF)     ]
         [                        ]
  PC --> [                        ]
     0   [________________________]

------------------------------------------
------------------------------------------
         OTHER BASIC DECISIONS

Epressions:


Conditions:


Statements:


Declarations:


------------------------------------------
------------------------------------------
        STACK LAYOUT FOR EACH AR

Layout 2:
                                   offset
         [      ...               ]
         [ local variables        ]
         [      ...               ]
   FP -->[ local constants        ] 0
         [  saved     SP          ]-1
         [  registers FP          ]-2
         [            static link ]-3
         [            RA          ]-4
         [ temporary storage      ]
   SP -->[      ...               ]

------------------------------------------
        Is there something in the provided code that makes this layout?
 C. Tactics
------------------------------------------
  TACTIC: MAKE EVERYTHING LOOK THE SAME

Even the program's AR looks like other ARs

$ cat empty.spl
% $Id: empty.spl,v 1.1 ...
begin
end.
$
$ make empty.asm
rm -f empty.asm; umask 022; \
vm/disasm empty.bof > empty.asm 2>&1
$ cat empty.asm
.text	0
a0:	CPR $r3, $fp
a1:	SWR $sp, -1, $sp
a2:	SWR $sp, -2, $fp
a3:	SWR $sp, -3, $r3
a4:	SWR $sp, -4, $ra
a5:	CPR $fp, $sp
a6:	SRI $sp, 4
a7:	SWR $sp, -1, $sp
a8:	SWR $sp, -2, $fp
a9:	SWR $sp, -3, $r3
a10:	SWR $sp, -4, $ra
a11:	CPR $fp, $sp
a12:	SRI $sp, 4
a13:	NOP 
a14:	LWR $ra, $fp, -4
a15:	LWR $r3, $fp, -1
a16:	LWR $fp, $fp, -2
a17:	CPR $sp, $r3
a18:	LWR $ra, $fp, -4
a19:	LWR $r3, $fp, -1
a20:	LWR $fp, $fp, -2
a21:	CPR $sp, $r3
a22:	EXIT 0
.data	4096
.stack	20480
.end
$ 

------------------------------------------
   a. Other simple tests
II. Code Generation for Procedures
------------------------------------------
      SUPPORTING PROCEDURES AND CALLS

Main issues:
   - storing their code
     Why?


   - knowing exactly where each starts
     Why?


Another issue:
   - sending the right static link

------------------------------------------
        What static link does a called procedure need?
 A. Where to store code for procedures?
   Where do we put the code sequences for each procedure?
------------------------------------------
       WHERE TO PUT PROCEDURE CODE?

Possible layouts in VM's code array:


------------------------------------------
       How would you implement each?
       Which layout makes the most sense?
 B. how to find each procedure's starting address?
------------------------------------------
      NESTED PROCEDURES ARE A PROBLEM

begin
  proc A
  begin
    proc B
    begin
      # B's body code...
      call A # ...
      # ...
    end;
    # A's body code
    call B # ...
    # ...
  end;
  call A
end.

If lay out the code as

   [ code for A ]
   [ code for B ]

How do we know the address of B
    to compile the call to B in A?


What about the other direction?


------------------------------------------
------------------------------------------
   RECURSIVE PROCEDURES, SIMILAR PROBLEM

begin
  proc R
  begin
    # R's body code ...
    call R
    # ...
  end;
  # ...
  call R
  # ...
end.

Before storing code for R,
  how do we know where it starts?


------------------------------------------
------------------------------------------
  MUTUAL RECURSION (NOT IN OUR LANGUAGE)
        
begin
  proc O
  begin # O's body code...
    call E
    # ...
  end;

  proc E;
  begin
    # E's body code ...
    call O
    # ...
  end;

  # ...
  call O;
  call E
  # ...
end.

One of these must before the other in
  the code area of the VM...


------------------------------------------
      No matter which of O or E is put first,
         how is the call to the second one to know
         where the second one starts?
  1. solutions
------------------------------------------
       SOLUTION STRATEGIES FOR CALLS

[Multiple passes]:
  1. Generate code for each procedure
     (+ store offsets in symbol table,
      + layout procedure code in memory
        with placholders for calls)
  2. Gather table of addresses
     (map from names to addresses,
      using offsets and beginning address)
  3. Patch up code addresses for calls
     (+ output code)

[Lazy evaluation, labels]:
  1. Generate code for each procedure
     with calls to "labels"
     (+ store or update
        labels in symbol table)
  (+ output code)
------------------------------------------
 C. Multiple Passes Solution
------------------------------------------
    GENERAL SOLUTION: MULTIPLE PASSES

Problem: where does each procedure start?

Passes over the IR:
  1. Compile all procedure code
     (now know how big each procedure is)
  2. Lay out procedure code in memory
     (now know where each starts)
  3. Change each call instruction


------------------------------------------
        What would a progrm need to do to change all the call instructions?
 D. Labels Solution
------------------------------------------
        GENERAL SOLUTION: LABELS

Use "labels" to allow


Term "label" is from assembly language

    ;  ...
    jmp L
    ; ...
    L: ; ...

------------------------------------------
------------------------------------------
        APPROACHES TO FIXING LABELS

Problem: convert labels to addresses

 (1) Use multiple passes
       a. Generate code with labels
       b. Lay out memory for procedures
          (determine starting addresses)
       c. Change labels to addresses

     advantages:


     disadvantages:


 (2) Use shared mutable data (lazy eval.)
       a. labels are unique placeholders,
          shared by all uses (calls)
       b. when address is determined,
          update the placeholder
          (and all uses are updated)

     advantages:


     disadvantages:


------------------------------------------
  1. label data structure
   a. Creating and propagating labels
------------------------------------------
       LABEL DATA STRUCTURE

// file label.h
// ...
#include "machine_types.h"

typedef struct {
    bool is_set;
    unsigned int word_offset;
} label;

// Return a fresh label that is not set
extern label *label_create();

// Requires: lab != NULL
// Set the address in the label
extern void label_set(label *lab,
               unsigned int word_offset);

// Is the given label set?
extern bool label_is_set(label *lab);

// Requires: label_is_set(lab)
// Return the word offset in lab
extern
unsigned int label_read(label *lab);
------------------------------------------
------------------------------------------
            CREATING LABELS

Labels created for all procedures
when creating proc_decl_t ASTs.

// file ast.h
// ...
#include "label.h"

// ...

typedef struct proc_decl_s {
    file_location *file_loc;
    AST_type type_tag;
    struct proc_decl_s *next; // for lists
    const char *name;
    struct block_s *block;
    label *lab; // for code generation
} proc_decl_t;


// file ast.c

// Return an AST for a proc_decl
proc_decl_t ast_proc_decl(ident_t ident,
                          block_t block)
{
    proc_decl_t ret;
    ret.file_loc = // ...
    ret.type_tag = proc_decl_ast;
    ret.next = NULL;
    ret.name = ident.name;
    block_t *p = // ...
    ret.block = p;

    // this is the source of the labels
    ret.lab = label_create();
    assert(ret.lab != NULL);

    return ret;
}
------------------------------------------
------------------------------------------
     PROPAGATING POINTERS TO LABELS (1)

Labels added to attributes
of procedure names

// file id_attrs.h
//...
#include "label.h"

typedef struct {
    file_location file_loc;
    id_kind kind;  // kind of identifier
    unsigned int offset_count;
    // for a procedure, its label
    label *lab;
} id_attrs;

------------------------------------------
------------------------------------------
     PROPAGATING POINTERS TO LABELS (2)

Make call statement ASTs point to the
label of the procedure being called

// file scope_check.c

// check the statement to make sure that
// the procedure has been declared
// (if not, then produce an error).
// Modifies the given AST
// to have appropriate id_use pointers.
void scope_check_callStmt(
                        call_stmt_t *stmt)
{


}

------------------------------------------
------------------------------------------
     PROPAGATING POINTERS TO LABELS (3)

Associate labels with
each call instruction in code structures

// file code.h
// ...
#include "label.h"

typedef struct code_s {
    struct code_s *next;
    bin_instr_t instr;
    // labels for call instructions
    label *lab; 
} code;

// ...

// Requires: lab != NULL
// Create and return a fresh instruction
// with the named mnemonic and parameters
extern code *code_call(address_type a,
                       label *lab);

// ...

------------------------------------------
        Where should the label passed to the code_call function
           come from?
        So what has been achieved?
   b. Using labels to fill in addresses of procedures
    i. Putting addresses of Procedures in Labels
------------------------------------------
    SETTING LABELS IN PROCEDURES (1)

// file label.h

typedef struct {
    bool is_set;
    unsigned int word_offset;
} label;


------------------------------------------
      Where is the address of a procedure known?
------------------------------------------
    SETTING LABELS IN PROCEDURES (2)

// file gen_code.c

// ...

void gen_code_proc_decl(proc_decl_t pd)
{


}

------------------------------------------
    What makes code for a procedure?
    At the end of code generation, what has this achieved?
    ii. Fixing up the Call Instructions
    Can some call instructions not have their labels set
       at the end of code generation?
------------------------------------------
  PUTTING ADDRESSES IN CALL INSTRUCTIONS

Write a function to fix
all call instructions in a code_seq

extern
void code_seq_fix_labels(code_seq cs);


------------------------------------------
   How would you write such a function?
 E. testing the solution
   How would you test a solution?
 F. exercise