TESTING FOR HOMEWORK 4

Does the generated code need to work
  in some particular way?

  No, it's only the output of the compiled
  code, when run (on the VM), that has to match
  the expected outputs


           RUNNING INDIVIDUAL TESTS

$ cat simple-print.spl
% $Id: simple-print.spl,v 1.1 ...
begin
  print 3402
end.

$ make simple-print.myo
rm -f simple-print.bof; umask 022; \
./compiler simple-print.spl
rm -f simple-print.myo; umask 022; \
cat char-inputs.txt | \
   ./vm/vm simple-print.bof \
    > simple-print.myo 2>&1

$ cat simple-print.myo
3402$


              THE TASK

Write modules:

    gen_code

    literal_table


So that with the provided files
   the compiler transforms
         .spl source file
   into
         .bof file
   that when run in the provided VM
   produces the expected output


  SPL source file  test.spl
                      |
                      | compiler
                      |
                      v
         BOF file  test.bof
                      |
                      | VM (provided)
                      |
                      v
   program output  test.myo

  OVERALL STRATEGY FOR WRITING GEN_CODE

Use the shape of the unparser.c code
   because that also walks over the ASTs


Decide on overall tactics
   (e.g., expressions put result on stack)
   and follow those in all cases


Build up code from
   simple test cases (base cases of recursion)
   so you can test along the way


Check your work by:
   checking for expected:
     generated code files (make .asm files)
     traces from the VM (make .myt files)


  LITERALS STORED ABOVE THE GP FROM BOF

Literal values stored at positive offsets
  from the GP

 - loaded from the BOF file's data section
 - offsets known to the compiler from
     the literal table
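
A minimal sketch of what a literal table could look like, assuming
each distinct literal gets one word and offsets are handed out in
order of first use. The names lit_entry, lit_table_lookup, and
lit_table_size are hypothetical, not the required interface.

```c
#include <assert.h>
#include <stdlib.h>

// Hypothetical minimal literal table: a linked list mapping each
// distinct literal value to its word offset from the GP.
typedef struct lit_entry_s {
    struct lit_entry_s *next;
    int value;            // the literal's value
    unsigned int offset;  // word offset from the GP
} lit_entry;

static lit_entry *lit_first = NULL;
static lit_entry *lit_last = NULL;
static unsigned int lit_next_offset = 0;

// Return the offset of value, adding it to the table if needed.
unsigned int lit_table_lookup(int value)
{
    for (lit_entry *e = lit_first; e != NULL; e = e->next) {
        if (e->value == value) {
            return e->offset;  // already present: reuse its offset
        }
    }
    lit_entry *e = malloc(sizeof(lit_entry));
    assert(e != NULL);
    e->next = NULL;
    e->value = value;
    e->offset = lit_next_offset++;
    if (lit_first == NULL) {
        lit_first = e;
    } else {
        lit_last->next = e;
    }
    lit_last = e;
    return e->offset;
}

// Return the number of distinct literals
// (the number of words needed in the data section).
unsigned int lit_table_size(void)
{
    return lit_next_offset;
}
```

Walking the list in order would then give the words to write
into the BOF file's data section.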

Memory of the VM running a program:


 address
   32768 [                        ]
         [                        ]
         [                        ]
         [                        ]
 initial [                        ]
SP,FP -->[   runtime stack        ]
         [     program's AR       ]
         [    AR for block 1      ]
         [       ...              ]
  FP --> [    AR for block N      ]
  SP --> [       ...              ]
         [        |               ]
         [        v               ]
         [                        ]
         [                        ]
         [                        ]
         [                        ]
         [                        ]
         [        literals        ]
         [          ...           ]
         [     (data from BOF)    ]
  GP --> [   offset 0 data word   ]
         [                        ]
         [                        ]
         [                        ]
         [                        ]
         [                        ]
         [          VM            ]
         [      instructions      ]
         [          ...           ]
         [      (text of BOF)     ]
         [                        ]
  PC --> [                        ]
     0   [________________________]


         OTHER BASIC DECISIONS

Expressions:
   push value on "top" of the runtime stack
      (at offset 0 from SP)


Conditions:
  push the truth value
  (0 for false and 1 for true)
  on top of the runtime stack


Statements:
  leave the SP unchanged when done
   (ultimately they don't allocate or
    deallocate storage, not permanently)

Declarations:
   allocate and initialize variables
   and constants on the runtime stack
   in the AR for their block
     (use positive offsets from the FP)
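
The conventions above can be checked on a toy model. This is only
an illustration, assuming an array-based stack with a depth counter;
it is not the VM, and every name in it is hypothetical.

```c
#include <assert.h>

// Toy model of the runtime stack, used only to illustrate the
// conventions; depth counts the words currently pushed.
static int stack[64];
static int depth = 0;

static void push(int v) { assert(depth < 64); stack[depth++] = v; }
static int pop(void) { assert(depth > 0); return stack[--depth]; }

// Expression convention: net effect is exactly one word pushed.
void eval_add(int a, int b)
{
    push(a);              // left operand: +1
    push(b);              // right operand: +1
    int r = pop();
    int l = pop();
    push(l + r);          // result: net +1 overall
}

// Condition convention: push 1 for true, 0 for false.
void eval_less_than(int a, int b)
{
    push(a < b ? 1 : 0);
}

// Statement convention: depth is the same before and after.
int exec_assign_sum(int a, int b)
{
    int before = depth;
    eval_add(a, b);       // expression leaves its value on top
    int var = pop();      // the "store" pops that value
    assert(depth == before);
    return var;
}
```

Asserting the depth before and after each statement, as in
exec_assign_sum, is one way to catch a broken convention early.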


        STACK LAYOUT FOR EACH AR

Layout 2:
                                   offset
         [      ...               ]
         [ local variables        ]
         [      ...               ]
   FP -->[ local constants        ] 0
         [  saved     SP          ]-1
         [  registers FP          ]-2
         [            static link ]-3
         [            RA          ]-4
         [ temporary storage      ]
   SP -->[      ...               ]


  TACTIC: MAKE EVERYTHING LOOK THE SAME

Even the program's AR looks like other ARs

$ cat empty.spl
% $Id: empty.spl,v 1.1 ...
begin
end.
$
$ make empty.asm
rm -f empty.asm; umask 022; \
vm/disasm empty.bof > empty.asm 2>&1
$ cat empty.asm
.text	0
a0:	CPR $r3, $fp
a1:	SWR $sp, -1, $sp
a2:	SWR $sp, -2, $fp
a3:	SWR $sp, -3, $r3
a4:	SWR $sp, -4, $ra
a5:	CPR $fp, $sp
a6:	SRI $sp, 4
a7:	SWR $sp, -1, $sp
a8:	SWR $sp, -2, $fp
a9:	SWR $sp, -3, $r3
a10:	SWR $sp, -4, $ra
a11:	CPR $fp, $sp
a12:	SRI $sp, 4
a13:	NOP 
a14:	LWR $ra, $fp, -4
a15:	LWR $r3, $fp, -1
a16:	LWR $fp, $fp, -2
a17:	CPR $sp, $r3
a18:	LWR $ra, $fp, -4
a19:	LWR $r3, $fp, -1
a20:	LWR $fp, $fp, -2
a21:	CPR $sp, $r3
a22:	EXIT 0
.data	4096
.stack	20480
.end
$ 


      SUPPORTING PROCEDURES AND CALLS

Main issues:
   - storing their code
     Why?
       because procedures are only executed
       when they are called

   - knowing exactly where each starts
     Why?
        the CALL instruction needs an address


Another issue:
   - sending the right static link
        so that the procedure's body
        can find constants and variables
        using static scoping


       WHERE TO PUT PROCEDURE CODE?

Possible layouts in VM's code array:

  1. store the main program first, then proc code
      [code to set up program's AR]
      [code for the main program's block]
      [(optional: code to tear down the block)]
      EXIT 0
      [code for each procedure...]

  2. store the proc code first, then main program
      JMPA size(proc code)+1
      [code for each procedure ...]
      [code to set up program's AR]
      [code for the main program's block]
      [(optional: code to tear down the block)]
      EXIT 0      

  3. store proc code under programmer control
      (not in SPL)


      NESTED PROCEDURES ARE A PROBLEM

begin
  proc A
  begin
    proc B
    begin
      # B's body code...
      call A # ...
      # ...
    end;
    # A's body code
    call B # ...
    # ...
  end;
  call A
end.

If we lay out the code as

   [ code for A ]
   [ code for B ]

How do we know the address of B
    to compile the call to B in A?
   We don't 

What about the other direction?
   We don't     


   RECURSIVE PROCEDURES, SIMILAR PROBLEM

begin
  proc R
  begin
    # R's body code ...
    call R
    # ...
  end;
  # ...
  call R
  # ...
end.

Before storing code for R,
  how do we know where it starts?


  MUTUAL RECURSION (NOT IN OUR LANGUAGE)
        
begin
  proc O
  begin # O's body code...
    call E
    # ...
  end;

  proc E;
  begin
    # E's body code ...
    call O
    # ...
  end;

  # ...
  call O;
  call E
  # ...
end.

One of these must come before the other in
  the code area of the VM...


       SOLUTION STRATEGIES FOR CALLS

[Multiple passes]:
  1. Generate code for each procedure
     (+ store offsets in symbol table,
      + layout procedure code in memory
        with placeholders for calls)
  2. Gather table of addresses
     (map from names to addresses,
      using offsets and beginning address)
  3. Patch up code addresses for calls
     (+ output code)

[Lazy evaluation, labels]:
  1. Generate code for each procedure
     with calls to "labels"
     (+ store or update
        labels in symbol table)
     (+ output code)

    GENERAL SOLUTION: MULTIPLE PASSES

Problem: where does each procedure start?

Passes over the IR:
  1. Compile all procedure code
     (now know how big each procedure is)
  2. Lay out procedure code in memory
     (now know where each starts)
  3. Change each call instruction
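
Pass 2 amounts to a running sum over the procedure sizes.
A sketch, assuming sizes are counted in words and the procedures
are stored back to back starting at some base address
(layout_procs and its parameters are hypothetical names):

```c
#include <assert.h>
#include <stddef.h>

// Given the size in words of each procedure's code (known after
// pass 1), compute where each procedure starts. 'base' is the
// address of the first procedure.
void layout_procs(const unsigned int sizes[], unsigned int starts[],
                  size_t n, unsigned int base)
{
    unsigned int next = base;
    for (size_t i = 0; i < n; i++) {
        starts[i] = next;   // procedure i begins here
        next += sizes[i];   // the next one follows immediately
    }
}
```

With sizes 5, 12, 3 and base 1, the starting addresses
are 1, 6, and 18.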


        GENERAL SOLUTION: LABELS

Use "labels" to allow code to refer to
  an address before that address is known


Term "label" is from assembly language

    ;  ...
    jmp L
    ; ...
    L: ; ...


        APPROACHES TO FIXING LABELS

Problem: convert labels to addresses

 (1) Use multiple passes
       a. Generate code with labels
       b. Lay out memory for procedures
          (determine starting addresses)
       c. Change labels to addresses

     advantages:


     disadvantages:


 (2) Use shared mutable data (lazy eval.)
       a. labels are unique placeholders,
          shared by all uses (calls)
       b. when address is determined,
          update the placeholder
          (and all uses are updated)

     advantages:


     disadvantages:


       LABEL DATA STRUCTURE

// file label.h
// ...
#include "machine_types.h"

typedef struct {
    bool is_set;
    unsigned int word_offset;
} label;

// Return a fresh label that is not set
extern label *label_create();

// Requires: lab != NULL
// Set the address in the label
extern void label_set(label *lab,
               unsigned int word_offset);

// Is the given label set?
extern bool label_is_set(label *lab);

// Requires: label_is_set(lab)
// Return the word offset in lab
extern
unsigned int label_read(label *lab);
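
A minimal sketch of a matching label.c, assuming labels are
allocated on the heap. The struct is repeated so the block
compiles on its own; in the real file it would come from label.h.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

typedef struct {
    bool is_set;
    unsigned int word_offset;
} label;

// Return a fresh label that is not set
label *label_create(void)
{
    label *ret = malloc(sizeof(label));
    assert(ret != NULL);
    ret->is_set = false;
    ret->word_offset = 0;
    return ret;
}

// Requires: lab != NULL
// Set the address in the label
void label_set(label *lab, unsigned int word_offset)
{
    assert(lab != NULL);
    lab->word_offset = word_offset;
    lab->is_set = true;
}

// Is the given label set?
bool label_is_set(label *lab)
{
    assert(lab != NULL);
    return lab->is_set;
}

// Requires: label_is_set(lab)
// Return the word offset in lab
unsigned int label_read(label *lab)
{
    assert(label_is_set(lab));
    return lab->word_offset;
}
```

Because callers hold a pointer to the label, not a copy,
a later label_set is visible through every such pointer.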

            CREATING LABELS

Labels created for all procedures
when creating proc_decl_t ASTs.

// file ast.h
// ...
#include "label.h"

// ...

typedef struct proc_decl_s {
    file_location *file_loc;
    AST_type type_tag;
    struct proc_decl_s *next; // for lists
    const char *name;
    struct block_s *block;
    label *lab; // for code generation
} proc_decl_t;


// file ast.c

// Return an AST for a proc_decl
proc_decl_t ast_proc_decl(ident_t ident,
                          block_t block)
{
    proc_decl_t ret;
    ret.file_loc = // ...
    ret.type_tag = proc_decl_ast;
    ret.next = NULL;
    ret.name = ident.name;
    block_t *p = // ...
    ret.block = p;

    // this is the source of the labels
    ret.lab = label_create();
    assert(ret.lab != NULL);

    return ret;
}

     PROPAGATING POINTERS TO LABELS (1)

Labels added to attributes
of procedure names

// file id_attrs.h
//...
#include "label.h"

typedef struct {
    file_location file_loc;
    id_kind kind;  // kind of identifier
    unsigned int offset_count;
    // for a procedure, its label
    label *lab;
} id_attrs;


     PROPAGATING POINTERS TO LABELS (2)

Make call statement ASTs point to the
label of the procedure being called

// file scope_check.c

// check the statement to make sure that
// the procedure has been declared
// (if not, then produce an error).
// Modifies the given AST
// to have appropriate id_use pointers.
void scope_check_callStmt(
                        call_stmt_t *stmt)
{

  stmt->idu = scope_check_ident_declared(
          *(stmt->file_loc), stmt->name);
  assert(stmt->idu != NULL);
  id_attrs *attrs
            = id_use_get_attrs(stmt->idu);
  // check that it's a procedure, or error
  assert(attrs != NULL);
  assert(attrs->lab != NULL);
}


     PROPAGATING POINTERS TO LABELS (3)

Associate labels with
each call instruction in code structures

// file code.h
// ...
#include "label.h"

typedef struct code_s {
    struct code_s *next;
    bin_instr_t instr;
    // labels for call instructions
    label *lab; 
} code;

// ...

// Requires: lab != NULL
// Create and return a fresh instruction
// with the named mnemonic and parameters
extern code *code_call(address_type a,
                       label *lab);

// ...

        Where should the label passed to the code_call function
        come from?
           From the AST for the proc declaration of the called
           procedure (via the call statement). Has to be the same pointer!
           
        This puts labels in code sequences also

        So what has been achieved?

           Every procedure declared has a label,
           every call to a procedure points to that label
              (as does every call instruction code)

     Now that the labels are where they need to be,
     information about where each procedure starts needs to get to its
     label and from there to the call instruction


    SETTING LABELS IN PROCEDURES (1)

// file label.h

typedef struct {
    bool is_set;
    unsigned int word_offset;
} label;

      Where is the address of a procedure known?
         In code generation when the procedure has been compiled


    SETTING LABELS IN PROCEDURES (2)

// file gen_code.c

// ...

void gen_code_proc_decl(proc_decl_t pd)
{
    code_seq pdc = gen_code_block(*(pd.block));
    // add code to return from the procedure call
    code_seq_add_to_end(&pdc, code_rtn());
    unsigned int proc_offset
         = proc_holder_register(pdc);

    label_set(pd.lab, proc_offset);
}

------------------------------------------

    What makes code for a procedure?
       Its block

    I also added a data structure, called a proc_holder,
       to hold the code_seq for procedures; it has a register function.
       The proc_holder_register function returns the procedure's offset
       (based on where it goes in the text section)
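
    One way such a proc_holder could work is sketched here. This is a
    guess at the idea, not the actual implementation: the code and
    code_seq types are simplified stand-ins, and PROC_CODE_BASE
    (where procedure code starts in the text section) is an assumption.

```c
#include <assert.h>
#include <stddef.h>

// Simplified stand-ins for the course's code and code_seq types
typedef struct code_s {
    struct code_s *next;
} code;

typedef struct {
    code *first;
    code *last;
} code_seq;

// Hypothetical holder: all procedure code, concatenated in
// registration order, plus the number of words held so far.
static code_seq holder = { NULL, NULL };
static unsigned int holder_words = 0;

// Assumed address of the first procedure (1 if a single jump
// instruction at address 0 precedes the procedure code)
#define PROC_CODE_BASE 1

// Append pdc to the holder and return the address where it starts.
unsigned int proc_holder_register(code_seq pdc)
{
    unsigned int start = PROC_CODE_BASE + holder_words;
    for (code *c = pdc.first; c != NULL; c = c->next) {
        holder_words++;   // one word per instruction
    }
    if (pdc.first != NULL) {
        if (holder.first == NULL) {
            holder.first = pdc.first;
        } else {
            holder.last->next = pdc.first;
        }
        holder.last = pdc.last;
    }
    return start;
}
```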

    At the end of code generation, what has this achieved?

       Now we know each procedure's address,
       which is in its label.
       And the label is pointed to by the code
       for each call instruction.

    Can some call instructions not have their labels set
       at the end of code generation?

       No, assuming that scope checking made sure that each call was a
       call to a declared procedure,
       then by the end of code generation
       each procedure declaration has been processed,
       and during gen_code_proc_decl we set each label,
       so each label should be set by then


  PUTTING ADDRESSES IN CALL INSTRUCTIONS

Write a function to fix
all call instructions in a code_seq

extern
void code_seq_fix_labels(code_seq cs);

     For each (code *)c in a code_seq
     that has a label (c->lab != NULL) do
      unsigned int a = label_read(c->lab);
      c->instr.jump.addr = a;

   It's best to use assert(label_is_set(c->lab));
       in such a loop, for debugging.
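
   A sketch of that loop, using simplified stand-in types so it
   compiles on its own. It assumes that lab is NULL for every
   instruction that is not a call; the real bin_instr_t and code_seq
   have more to them than shown here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef struct { bool is_set; unsigned int word_offset; } label;

// Stand-ins for the course's types; only the fields used here
typedef struct { unsigned int addr; } jump_instr_t;
typedef struct { jump_instr_t jump; } bin_instr_t;

typedef struct code_s {
    struct code_s *next;
    bin_instr_t instr;
    label *lab;  // NULL unless this is a call instruction (assumed)
} code;

typedef struct { code *first; code *last; } code_seq;

// Copy the address from each label into its call instruction
void code_seq_fix_labels(code_seq cs)
{
    for (code *c = cs.first; c != NULL; c = c->next) {
        if (c->lab != NULL) {
            assert(c->lab->is_set);  // catches unset labels early
            c->instr.jump.addr = c->lab->word_offset;
        }
    }
}
```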

   How would you test a solution?
   Write SPL code that uses procedures and procedure calls
     start with the simplest examples