GOALS FOR STATIC ANALYSIS

- Protect code generator from extra cases
  i.e., cases not defined for the language
- Catch problems, to improve programmer productivity:
    - use of variables before given a value
    - computations that can't be used
- Gather information to improve programs
  used for optimization
  e.g., what value each variable depends on

PROBLEMS TO CATCH

- inconsistencies, in general
  e.g., type errors
        undeclared identifiers
        obviously wrong casts (always fail)
- obviously useless code

WHAT A COMPILER NEEDS TO DO

- static analysis
  e.g., type checking
  needs:
    - each name's type
  e.g., consider: a[i] + f() - 9
        illegal: p1 + p2, if p1 and p2 are procedures
                 c := 9, if c is a constant
- code generation
  needs:
    - each var or const name's lexical addr
    - each procedure's starting address

SYMBOL TABLE

Gives information about each name:
- kind of name (const, var, or procedure)
- type
- offset (for consts and vars)

def: a *symbol table* maps names to attributes
     (attributes are properties: like kind, type, offset, ...)

data structures and algorithms:
- linked list with linear search
- array with linear search
- sorted array with binary search
- hash table with hashing
- binary search tree with a search

STORAGE LAYOUT

Compiler needs to track:
  memory layout within a potential scope:
    - what offset can store the next variable or constant?
  instruction layout:
    - what address to jump to?
      - for if-statements, while-loops

SUMMARY OF DATA NEEDS

Symbol Table: maps names -> attributes (kind, size, etc.)
Code Manager:
  - data allocated in a frame
  - instruction counts for pieces of code

HOW SHOULD PARSER COMMUNICATE?
Strategies:

[Action-based]
During parsing:
  - for each nonterminal, the action:
      - adds to symbol table
      - checks for errors
      - allocates and tracks storage
      - emits code

[Tree-based]
During parsing:
  - for each nonterminal, the action:
      - creates and returns an AST
Walk tree to:
  - build symbol table
  - check for errors
  - allocate and track storage
  - (improve the tree's computation
     e.g., eliminate unneeded computation steps)
  - emit code

ADVANTAGES AND DISADVANTAGES

Action-based architecture:
  + easier to have parsing and static analysis
    influence the lexical analysis
    (useful for C and C++, where the lexer needs to know which names are types)
  + easier to start coding
  - but harder to split up work
    (lots of tasks interwoven with parsing)
  - harder to maintain and debug

Tree-based:
  - harder to have the parser influence lexical analysis
    (so harder to parse some languages)
  + phases of compiler can be easily separated
  + easier to maintain and debug
  + better theory that goes with this option

ABSTRACT SYNTAX TREES (ASTs)

def: an *abstract syntax tree* or AST is a tree that
     represents the essential structure of a parse
     without:
       - punctuation (",", ";", etc.)
       - extra levels of nonterminals

EXAMPLE AST

AST for 3*4+2:

          BOE
        /  |  \
     BOE   +   2
    / | \
   3  *  4

Parse tree for this:

  [figure: parse tree for 3*4+2, with the same leaves (3 * 4 + 2)
   under extra levels of nonterminals]

DESIGNING ASTS

Want to:
  - Represent all structure in programs
  - avoid unnecessary levels

How to design them?
  use a grammar, called "the abstract syntax"

ABSTRACT SYNTAX

A language's *abstract syntax* is a grammar that
  generates the same language and
  has a simpler set of rules

Allows:
  - ambiguity
    but think of it as describing trees, not strings

EXAMPLE ABSTRACT SYNTAX FOR PL/0

B     ::= CDs VDs PDs S
CDs   ::= { CD }
CD    ::= const CDefs
CDefs ::= { CDef }
CDef  ::= x n
VDs   ::= { VD }
VD    ::= var xs
xs    ::= { x }
PDs   ::= { PD }
PD    ::= procedure x B
S     ::= AS | BS | IfS | WhS | RS | WS | SkS
AS    ::= assign x E
BS    ::= begin Ss
Ss    ::= { S }
IfS   ::= if C S1 S2
WhS   ::= while C S
RS    ::= read x
WS    ::= write E
SkS   ::= skip
C     ::= OddC | ROC
OddC  ::= odd E
ROC   ::= E1 r E2
r     ::= = | <> | < | <= | > | >=
E     ::= BOE | x | n
BOE   ::= E o E
o     ::= + | - | * | /

where x is an identifier, n is a number,
      and { X } means 0 or more X's

ASTs IN AN OO LANGUAGE (C++/JAVA)

For each rule of the form:
    X ::= kw1 A B | kw2 C D
Use:
    an abstract class, named "X"
    subclasses of X:
      - kw1 with fields of types A and B
      - kw2 with fields of types C and D

// Example (in Java):
public abstract class AST {
    String filename;
    int line;
}

public abstract class Stmt extends AST {}

public class Assign extends Stmt {
    Ident x;
    Expr e;
    public Assign(Ident x, Expr e) {
        this.x = x;
        this.e = e;
    }
}

public class If extends Stmt {
    Cond c;
    Stmt s1;
    Stmt s2;
    public If(Cond c, Stmt s1, Stmt s2) {
        this.c = c;
        this.s1 = s1;
        this.s2 = s2;
    }
}
// ...

EXAMPLE IN C

// file ast.h
#include "file_location.h"

// types of ASTs (type tags)
typedef enum {
    block_ast, const_decls_ast, var_decls_ast,
    /* ... */
    number_ast, ident_ast, empty_ast
} AST_type;

// common AST structure
typedef struct {
    file_location *file_loc;
    AST_type type_tag;
    void *next;
} generic_t;

// ... continued...
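A minimal sketch of why every node starts with the same fields as generic_t: code that only needs the tag or the location can read them without knowing the node's kind. This is not the course's ast.h. The file_location stand-in, the reduced enum, the small union, and the helpers ast_type_tag and ast_line are all assumptions made for illustration, and the next field is left out.

```c
#include <assert.h>
#include <stddef.h>

/* Assumption: a stand-in for the type that file_location.h declares. */
typedef struct {
    const char *filename;
    unsigned int line;
} file_location;

/* Only a few of the type tags, to keep the sketch short. */
typedef enum { number_ast, ident_ast, empty_ast } AST_type;

/* Common prefix of every node (the next field is omitted here). */
typedef struct {
    file_location *file_loc;
    AST_type type_tag;
} generic_t;

typedef struct {
    file_location *file_loc;
    AST_type type_tag;
    const char *name;
} ident_t;

typedef struct {
    file_location *file_loc;
    AST_type type_tag;
    int value;
} number_t;

/* Putting the structs in one union makes reading the shared prefix
   through the generic member well defined in C. */
typedef union {
    generic_t generic;
    ident_t ident;
    number_t number;
} AST;

/* Hypothetical helper: the tag of any node, read through the generic view. */
AST_type ast_type_tag(const AST *t)
{
    assert(t != NULL);
    return t->generic.type_tag;
}

/* Hypothetical helper: source line of any node, or 0 if it has no location. */
unsigned int ast_line(const AST *t)
{
    assert(t != NULL);
    return t->generic.file_loc != NULL ? t->generic.file_loc->line : 0;
}
```

A tree walker can switch on ast_type_tag(t) first and only then touch the member that matches the tag.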

STRUCTS FOR EACH KIND OF NONTERMINAL

#include "file_location.h"

// B ::= CDs VDs PDs S
typedef struct block_s {
    file_location *file_loc;
    AST_type type_tag;
    const_decls_t const_decls;
    var_decls_t var_decls;
    proc_decls_t proc_decls;
    stmt_t stmt;
} block_t;

// CDs ::= { CD }
typedef struct {
    file_location *file_loc;
    AST_type type_tag;
    const_decl_t *const_decls;
} const_decls_t;

// VD ::= var x
typedef struct {
    const char *name;
} var_decl_t;

// CD ::= const CDefs
typedef struct const_decl_s {
    file_location *file_loc;
    AST_type type_tag;
    struct const_decl_s *next;
    const_defs_t const_defs;
} const_decl_t;

// CDefs ::= { CDef }
typedef struct {
    file_location *file_loc;
    AST_type type_tag;
    const_def_t *const_defs;
} const_defs_t;

// CDef ::= ident number
typedef struct const_def_s {
    file_location *file_loc;
    AST_type type_tag;
    struct const_def_s *next;
    ident_t ident;
    number_t number;
} const_def_t;

/* ... */

// forward declaration for expressions
struct expr_s;

/* ... */

// AS ::= x E
typedef struct {
    file_location *file_loc;
    AST_type type_tag;
    const char *name;
    struct expr_s *expr;
} assign_stmt_t;

// BS ::= Ss
typedef struct {
    file_location *file_loc;
    AST_type type_tag;
    stmts_t stmts;
} begin_stmt_t;

// IfS ::= if C S1 S2
typedef struct {
    file_location *file_loc;
    AST_type type_tag;
    condition_t condition;
    struct stmt_s *then_stmt;
    struct stmt_s *else_stmt;
} if_stmt_t;

/* ...
*/

// empty ::=
typedef struct {
    file_location *file_loc;
    AST_type type_tag;
} empty_t;

// identifiers
typedef struct ident_s {
    file_location *file_loc;
    AST_type type_tag;
    struct ident_s *next; // for lists
    const char *name;
} ident_t;

// (possibly signed) numbers
typedef struct {
    file_location *file_loc;
    AST_type type_tag;
    const char *text;
    word_type value;
} number_t;

// tokens as ASTs
typedef struct {
    file_location *file_loc;
    AST_type type_tag;
    const char *text;
    int code;
} token_t;

STRUCTS FOR CONDITIONS AND EXPRESSIONS

// kinds of conditions
typedef enum { ck_odd, ck_rel } condition_kind_e;

typedef struct {
    file_location *file_loc;
    AST_type type_tag;
    expr_t expr;
} odd_condition_t;

typedef struct {
    file_location *file_loc;
    AST_type type_tag;
    expr_t expr1;
    token_t rel_op;
    expr_t expr2;
} rel_op_condition_t;

// kinds of expressions
typedef enum { expr_bin, expr_ident, expr_number } expr_kind_e;

// forward declaration for expressions
struct expr_s;

// BOE ::= E1 o E2
// o ::= + | - | * | /
typedef struct {
    file_location *file_loc;
    AST_type type_tag;
    struct expr_s *expr1;
    token_t arith_op;
    struct expr_s *expr2;
} binary_op_expr_t;

// E ::= BOE | x | n
typedef struct expr_s {
    file_location *file_loc;
    AST_type type_tag;
    expr_kind_e expr_kind;
    union {
        binary_op_expr_t binary;
        ident_t ident;
        number_t number;
    } data;
} expr_t;

THE AST TYPE

// file ast.h
// The AST definition used by bison
typedef union AST_u {
    generic_t generic;
    block_t block;
    const_decls_t const_decls;
    const_decl_t const_decl;
    const_defs_t const_defs;
    const_def_t const_def;
    var_decls_t var_decls;
    var_decl_t var_decl;
    idents_t idents;
    proc_decls_t proc_decls;
    proc_decl_t proc_decl;
    stmt_t stmt;
    assign_stmt_t assign_stmt;
    call_stmt_t call_stmt;
    begin_stmt_t begin_stmt;
    if_stmt_t if_stmt;
    while_stmt_t while_stmt;
    read_stmt_t read_stmt;
    write_stmt_t write_stmt;
    skip_stmt_t skip_stmt;
    stmts_t stmts;
    condition_t condition;
    rel_op_condition_t rel_op_condition;
    odd_condition_t odd_condition;
    expr_t expr;
    binary_op_expr_t binary_op_expr;
    token_t token;
    number_t number;
    ident_t ident;
    empty_t empty;
} AST;

CORRESPONDENCE BETWEEN TAGS AND UNION

Whenever the type_tag field is X_ast
then the data field's type is X_t
and the union's subfield is named X

LINKED LISTS OF ASTS

// file ast.h

// identifiers
typedef struct ident_s {
    file_location *file_loc;
    AST_type type_tag;
    struct ident_s *next; // for lists
    const char *name;
} ident_t;

// ...

// idents ::= { ident }
typedef struct {
    file_location *file_loc;
    AST_type type_tag;
    ident_t *idents;
} idents_t;

BUILDING ASTS USING RULES IN BISON

/* <if-stmt> ::= if <condition> then <stmt> else <stmt> */
ifStmt : "if" condition "then" stmt "else" stmt
           { $$ = ast_if_stmt($2, $4, $6); }
       ;

EXAMPLE WITH ALTERNATIVES: STATEMENTS

stmt : assignStmt { $$ = ast_stmt_assign($1); }
     | callStmt   { $$ = ast_stmt_call($1); }
     | beginStmt  { $$ = ast_stmt_begin($1); }
     | ifStmt     { $$ = ast_stmt_if($1); }
     | whileStmt  { $$ = ast_stmt_while($1); }
     | readStmt   { $$ = ast_stmt_read($1); }
     | writeStmt  { $$ = ast_stmt_write($1); }
     | skipStmt   { $$ = ast_stmt_skip($1); }
     ;

EXAMPLE WITH A LOOP: STMTS IN A BEGIN

// <stmts> ::= <stmt> {; <stmt>}
stmts : stmt           { $$ = ast_stmts_singleton($1); }
      | stmts ";" stmt { $$ = ast_stmts($1,$3); }
      ;
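
The two actions for stmts build a linked list from left to right: the first alternative starts a one-element list and the second appends one more statement each time the rule is reduced. The sketch below shows one way the two builder functions might work. The names ast_stmts_singleton and ast_stmts come from the actions above, but their signatures, the simplified stmt_t and stmts_t, the last pointer, and the helper stmts_length are assumptions for illustration, not the course's ast.h.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified statement node: a payload and the link used for lists.
   (Assumption: the real stmt_t also carries file_loc, type_tag, etc.) */
typedef struct stmt_s {
    int id;                  /* hypothetical payload for this sketch */
    struct stmt_s *next;     /* for lists */
} stmt_t;

/* Ss ::= { S }, kept as a linked list.
   The last pointer is an addition of this sketch; it makes appending O(1). */
typedef struct {
    stmt_t *stmts;
    stmt_t *last;
} stmts_t;

/* Action for:  stmts : stmt */
stmts_t ast_stmts_singleton(stmt_t *s)
{
    assert(s != NULL);
    s->next = NULL;
    stmts_t ret = { s, s };
    return ret;
}

/* Action for:  stmts : stmts ";" stmt
   Appends s at the end, so source order is preserved. */
stmts_t ast_stmts(stmts_t list, stmt_t *s)
{
    assert(s != NULL && list.last != NULL);
    s->next = NULL;
    list.last->next = s;
    list.last = s;
    return list;
}

/* Hypothetical helper: number of statements in the list. */
int stmts_length(stmts_t list)
{
    int n = 0;
    for (const stmt_t *p = list.stmts; p != NULL; p = p->next) {
        n++;
    }
    return n;
}
```

Because the grammar rule is left recursive, bison reduces the leftmost statement first, so appending at the tail keeps the statements in the order they were written.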