I. Overview of Static Analysis A. Goals for Static Analysis for a Compiler ------------------------------------------ GOALS FOR STATIC ANALYSIS - Protect back end from extra cases - Catch problems, to improve programmer productivity - Gather information to improve programs ------------------------------------------ What does static mean? Where would type checking fit in here? 1. Kinds of problems to catch: ------------------------------------------ PROBLEMS TO CATCH ------------------------------------------ How could these problems help the programmer? II. architecture A. Parser, Symbol Table, and Code Generation 1. tasks needed ------------------------------------------ WHAT A COMPILER NEEDS TO DO - static analysis e.g., type checking needs: - each name's type - code generation needs: - each name's location in AR (offset) - each instruction's location for jumps ------------------------------------------ Do we need to type check statically? What information does the VM need to load and store from variables? What information does the VM need for jump instructions? 2. data structures needed a. symbol table ------------------------------------------ SYMBOL TABLE Gives information about each name: def: a *symbol table* maps ------------------------------------------ What kind of data structure would work to store this information? b. storage layout ------------------------------------------ STORAGE LAYOUT Compiler needs to track: memory layout within a scope: - what offset can store the next variable or constant? instruction layout: - what address to jump to? - for if-statements, while-loops ------------------------------------------ What kind of data structure could be used to store this layout? How would an if-then-else be processed in a stack-based VM? When generating code for an if-then-else, how can the compiler know the target addresses to jump to? c. summary ------------------------------------------ SUMMARY OF DATA NEEDS Symbol Table: maps names -> attributes (kind, size, etc.) Code Manager: - data allocated in a frame - instruction counts for pieces of code ------------------------------------------ For the symbol table, what operations are needed? For the code manager, what operations are needed? B. Basic architectural issues for static analysis and code generation ------------------------------------------ HOW SHOULD PARSER COMMUNICATE? Strategies: [Action-based] During parsing: - for nonterminal , parseN() - adds to symbol table - checks for errors - allocates and tracks storage - emits code [Tree-based] During parsing: - for nonterminal , parseN() - creates and returns an AST Separate phases walk tree to: - build symbol table - check for errors - allocate and track storage - (improve the tree's computation e.g., eliminate unneeded computation steps) - emit code ------------------------------------------ If we use a tree-based approach, then when and how does the symbol table get built? ------------------------------------------ ADVANTAGES AND DISADVANTAGES Action-based architecture: Tree-based: ------------------------------------------ III. ASTs A. What is an AST ------------------------------------------ ABSTRACT SYNTAX TREES (ASTs) def: an *abstract syntax tree* or AST is ------------------------------------------ ------------------------------------------ EXAMPLE AST AST for 3*4+2: ------------------------------------------ How does this compare to the parse tree for this we had earlier? B. Designing ASTs ------------------------------------------ DESIGNING ASTS Want to: - Represent all structure in programs - avoid unnecessary levels How to design them? ------------------------------------------ 1. abstract syntax ------------------------------------------ ABSTRACT SYNTAX A language's *abstract syntax* is a grammar that generates the same language and Allows: - ambiguity but think of it as - left recursion ------------------------------------------ 2. example ------------------------------------------ EXAMPLE AST FOR PL/0 P ::= { CD } { VD } S CD ::= const x n VD ::= var x S ::= assign x E | begin { S } | if C S1 S2 | while C S | read x | write E | skip C ::= E1 r E2 r ::= = | <> | < | <= | > | >= E ::= E o E | x | n o ::= + | - | * | / where x in n in { X } means 0 or more X's ------------------------------------------ What happened to negation of numbers as expressions? C. Representing ASTs 1. In an OO language ------------------------------------------ ASTs IN AN OO LANGUAGE (C++/JAVA) For each rule of the form: X ::= kw1 A B | kw2 C D Use: an abstract class, named "X" subclasses of X: - kw1 with fields of types A and B - kw2 with fields of types C and D //Example: public abstract class AST { string filename; int line, column; } public abstract class Stmt extends AST {} public class Assign extends Stmt { Ident x; Expr e; public Assign(Ident x, Expr e) { this.x = x; this.e = e; } } public class If extends Stmt { Cond c; Stmt s1; Stmt s2; public If(Cond c, Stmt s1, Stmt s2) { this.c = c; this.s1 = s1; this.s2 = s2; } } // ... ------------------------------------------ 2. In C a. type tags (AST_type) ------------------------------------------ EXAMPLE IN C // file ast.h #include "file_location.h" // types of ASTs (type tags) typedef enum { program_ast, const_decl_ast, var_decl_ast, assign_ast, begin_ast, if_ast, while_ast, read_ast, write_ast, skip_ast, odd_cond_ast, bin_cond_ast, binary_exp_ast, ident_ast, number_ast } AST_type; typedef struct AST_s AST; // forward decl // ... continued... ------------------------------------------ b. struct types for each kind of nonterminal i. structs for declarations and statements ------------------------------------------ STRUCTS FOR EACH KIND OF NONTERMINAL // P ::= { CD } { VD } S typedef struct { AST_list cds; AST_list vds; AST *stmt; } program_t; // CD ::= const x n typedef struct { const char *name; short int num_val; } const_decl_t; // VD ::= var x typedef struct { const char *name; } var_decl_t; // S ::= assign x E typedef struct { const char *name; AST *exp; } assign_t; // S ::= begin { S } typedef struct { AST_list stmts; } begin_t; // S ::= if C S1 S2 typedef struct { AST *cond; AST *thenstmt; AST *elsestmt; } if_t; // ... more below ... ------------------------------------------ ii. structs for conditions and expressions ------------------------------------------ STRUCTS FOR CONDITIONS AND EXPRESSIONS // r ::= = | <> | < | <= | > | >= typedef enum {eqop, neqop, ltop, leqop, gtop, geqop} rel_op; // C ::= odd E typedef struct { AST *exp; } odd_cond_t; // C ::= E1 r E2 typedef struct { AST *leftexp; rel_op relop; AST *rightexp; } bin_cond_t; // o ::= + | - | * | / typedef enum {addop, subop, multop, divop} bin_arith_op; // E ::= E o E typedef struct { AST *leftexp; bin_arith_op arith_op; AST *rightexp; } bin_expr_t; // E ::= x typedef struct { const char *name; } ident_t; // E ::= n typedef struct { short int value; } number_t; // ... now for the AST itself... ------------------------------------------ c. the AST type as a tagged union ------------------------------------------ THE AST TYPE // file ast.h // The actual AST definition: typedef struct AST_s { file_location file_loc; // something for lists AST_type type_tag; union AST_u { program_t program; const_decl_t const_decl; var_decl_t var_decl; assign_t assign_stmt; begin_t begin_stmt; if_t if_stmt; while_t while_stmt; read_t read_stmt; write_t write_stmt; skip_t skip_stmt; odd_cond_t odd_cond; bin_cond_t bin_cond; op_expr_t op_expr; bin_expr_t bin_expr; ident_t ident; number_t number; } data; } AST; ------------------------------------------ ------------------------------------------ CORRESPONDENCE BETWEEN TAGS AND UNION Whenever the type_tag field is X_ast then the data field's type is X_t and the union's subfield is named X, X_decl, X_stmt, X_cond, or X_expr ------------------------------------------ d. AST list type ------------------------------------------ LINKED LISTS OF ASTS // file ast.h typedef AST *AST_list; Used in: - Program ASTs for lists of - Begin statement ASTs for list of ------------------------------------------ D. Building ASTs in a parser 1. example: following the grammar, like parseIfStmt ------------------------------------------ BUILDING ASTS BY FOLLOWING THE GRAMMAR ::= if then else ------------------------------------------ What does the code for a recursive-descent parser look like for this grammar of if-statements? Where should the file location of the AST come from? 2. example: alternatives, like parseStmt ------------------------------------------ EXAMPLE WITH ALTERNATIVES: PARSESTMT AST *parseStmt() { ------------------------------------------ What kind of error can be reported if see the wrong type of token? 3. example: loops, like parseBeginStmt ------------------------------------------ EXAMPLE WITH A LOOP: PARSEBEGINSTMT // S ::= begin S { ; S } end ------------------------------------------ To start, what is the recognizer like? Looking at ast.h, what does ast_begin_stmt() need as arguments? What should the token be? how should we add each statement's AST to the end of the list?