COP 3402 meeting -*- Outline -*- * ASTs ** What is an AST ------------------------------------------ ABSTRACT SYNTAX TREES (ASTs) def: an *abstract syntax tree* or AST is ------------------------------------------ ... a tree that represents the essential structure of the parse tree without: punctuation (no semicolons, commas, parentheses, etc.) extra levels (no terms, factors, etc.) The opposite is a concrete tree or parse tree, which includes all the punctuation and extra levels needed for parsing ------------------------------------------ EXAMPLE AST AST for 3*4+2: ------------------------------------------ / \ \ + / | \ | * 2 | | 3 4 Q: How does this compare to the parse tree for this we had earlier? That was: / \ / \ \ \ | / \ \ \ | | | | | 3 + | | | * 4 2 ** Designing ASTs ------------------------------------------ DESIGNING ASTS Want to: - Represent all structure in programs - avoid unnecessary levels How to design them? ------------------------------------------ ... use a grammar, the "abstract syntax" *** abstract syntax ------------------------------------------ ABSTRACT SYNTAX A language's *abstract syntax* is a grammar that generates the same language and Allows: - ambiguity but think of it as ------------------------------------------ ... has a simpler set of rules ... fully parenthesized (to eliminate ambiguity) (also left recursion but that is okay even if using a recursive descent parser, since this is not used for parsing so recursion is not a problem) *** example ------------------------------------------ EXAMPLE ABSTRACT SYNTAX FOR PL/0 B ::= CDs VDs PDs S CDs ::= { CD } CD ::= const CDefs CDefs ::= { CDef } CDef ::= x n VDs ::= { VD } VD ::= var xs xs ::= { x } PDs ::= { PD } PD ::= procedure x B S ::= AS | BS | IfS | WhS | RS | WS | SkS AS ::= assign x E BS ::= begin Ss Ss ::= { S } IfS ::= if C S1 S2 WhS ::= while C S RS ::= read x WS ::= write E SkS ::= skip C ::= OddC | ROC OddC ::= odd E ROC ::= E1 r E2 r ::= = | <> | < | <= | > | >= E ::= BOE | x | n BOE ::= E o E o ::= + | - | * | / where x in n in { X } means 0 or more X's ------------------------------------------ Capitalized names are nonterminals in the abstract syntax (this is traditional) Q: What happened to negation of numbers as expressions? We can represent that with just a number that may be negative (negate it at compile time) ** Representing ASTs *** In an OO language it's easier than in C! ------------------------------------------ ASTs IN AN OO LANGUAGE (C++/JAVA) For each rule of the form: X ::= kw1 A B | kw2 C D Use: an abstract class, named "X" subclasses of X: - kw1 with fields of types A and B - kw2 with fields of types C and D //Example (in Java): public abstract class AST { string filename; int line; } public abstract class Stmt extends AST {} public class Assign extends Stmt { Ident x; Expr e; public Assign(Ident x, Expr e) { this.x = x; this.e = e; } } public class If extends Stmt { Cond c; Stmt s1; Stmt s2; public If(Cond c, Stmt s1, Stmt s2) { this.c = c; this.s1 = s1; this.s2 = s2; } } // ... ------------------------------------------ *** In C We don't have objects or subtyping so use: - structs that share a common structure - a tagged union instead of subtypes **** type tags (AST_type) ------------------------------------------ EXAMPLE IN C // file ast.h #include "file_location.h" // types of ASTs (type tags) typedef enum { block_ast, const_decls_ast, var_decls_ast, /* ... */ number_ast, ident_ast, empty_ast } AST_type; // common AST structure typedef struct { file_location *file_loc; AST_type type_tag; void *next; } generic_t; // ... continued... ------------------------------------------ Actually the next field is only in structs that can be lists **** struct types for each kind of nonterminal ***** structs for declarations and statements The explanation starts with blocks, but in ast.h, the order is backwards (due to C requiring declaration before use) ------------------------------------------ STRUCTS FOR EACH KIND OF NONTERMINAL #include "file_location.h" // B ::= CDs VDs PDs S typedef struct block_s { file_location *file_loc; AST_type type_tag; const_decls_t const_decls; var_decls_t var_decls; proc_decls_t proc_decls; stmt_t stmt; } block_t; // CDs ::= { CD } typedef struct { file_location *file_loc; AST_type type_tag; const_decl_t *const_decls; } const_decls_t; // VD ::= var x typedef struct { const char *name; } var_decl_t; // CD ::= const CDefs typedef struct const_decl_s { file_location *file_loc; AST_type type_tag; struct const_decl_s *next; const_defs_t const_defs; } const_decl_t; // CDefs ::= { CDef } typedef struct { file_location *file_loc; AST_type type_tag; const_def_t *const_defs; } const_defs_t; // CDef ::= ident number typedef struct const_def_s { file_location *file_loc; AST_type type_tag; struct const_def_s *next; ident_t ident; number_t number; } const_def_t; /* ... */ // forward declaration for expressions struct expr_s; /* ... */ // AS ::= x E typedef struct { file_location *file_loc; AST_type type_tag; const char *name; struct expr_s *expr; } assign_stmt_t; // BS ::= Ss typedef struct { file_location *file_loc; AST_type type_tag; stmts_t stmts; } begin_stmt_t; // IfS ::= if C S1 S2 typedef struct { file_location *file_loc; AST_type type_tag; condition_t condition; struct stmt_s *then_stmt; struct stmt_s *else_stmt; } if_stmt_t; /* ... */ // empty ::= typedef struct { file_location *file_loc; AST_type type_tag; } empty_t; // identifiers typedef struct ident_s { file_location *file_loc; AST_type type_tag; struct ident_s *next; // for lists const char *name; } ident_t; // (possibly signed) numbers typedef struct { file_location *file_loc; AST_type type_tag; const char *text; word_type value; } number_t; // tokens as ASTs typedef struct { file_location *file_loc; AST_type type_tag; const char *text; int code; } token_t; ------------------------------------------ ***** structs for conditions and expressions ------------------------------------------ STRUCTS FOR CONDITIONS AND EXPRESSIONS // kinds of conditions typedef enum { ck_odd, ck_rel } condition_kind_e; typedef struct { file_location *file_loc; AST_type type_tag; expr_t expr; } odd_condition_t; typedef struct { file_location *file_loc; AST_type type_tag; expr_t expr1; token_t rel_op; expr_t expr2; } rel_op_condition_t; // kinds of expressions typedef enum { expr_bin, expr_ident, expr_number } expr_kind_e; // forward declaration for expressions struct expr_s; // BOE ::= E1 o E2 // o ::= + | - | * | / typedef struct { file_location *file_loc; AST_type type_tag; struct expr_s *expr1; token_t arith_op; struct expr_s *expr2; } binary_op_expr_t; // E ::= BOE | x | n typedef struct expr_s { file_location *file_loc; AST_type type_tag; expr_kind_e expr_kind; union { binary_op_expr_t binary; ident_t ident; number_t number; } data; } expr_t; ------------------------------------------ **** the AST type as a tagged union ------------------------------------------ THE AST TYPE // file ast.h // The AST definition used by bison typedef union AST_u { generic_t generic; block_t block; const_decls_t const_decls; const_decl_t const_decl; const_defs_t const_defs; const_def_t const_def; var_decls_t var_decls; var_decl_t var_decl; idents_t idents; proc_decls_t proc_decls; proc_decl_t proc_decl; stmt_t stmt; assign_stmt_t assign_stmt; call_stmt_t call_stmt; begin_stmt_t begin_stmt; if_stmt_t if_stmt; while_stmt_t while_stmt; read_stmt_t read_stmt; write_stmt_t write_stmt; skip_stmt_t skip_stmt; stmts_t stmts; condition_t condition; rel_op_condition_t rel_op_condition; odd_condition_t odd_condition; expr_t expr; binary_op_expr_t binary_op_expr; token_t token; number_t number; ident_t ident; empty_t empty; } AST; ------------------------------------------ ------------------------------------------ CORRESPONDENCE BETWEEN TAGS AND UNION Whenever the type_tag field is X_ast then the data field's type is X_t and the union's subfield is named X ------------------------------------------ **** AST list type ------------------------------------------ LINKED LISTS OF ASTS // file ast.h // identifiers typedef struct ident_s { file_location *file_loc; AST_type type_tag; struct ident_s *next; // for lists const char *name; } ident_t; // ... // idents ::= { ident } typedef struct { file_location *file_loc; AST_type type_tag; ident_t *idents; } idents_t; ------------------------------------------ This is used for all kinds of lists, even non-empty ones, as in stmts_t ** Building ASTs in a parser *** example: following the grammar, like parseIfStmt ------------------------------------------ BUILDING ASTS USING RULES IN BISON /* ::= if then else */ ------------------------------------------ Q: What would the rule in pl0.y look like for an if-statement? ifStmt : "if" condition "then" stmt "else" stmt { $$ = ast_if_stmt($2, $4, $6); } ; Q: What are the types of the arguments to ast_if_stmt? condition_t, stmt_t, and stmt_t Q: What is the return type of ast_if_stmt? if_stmt_t Q: Where should the file location of the AST come from? the starting token, or the first argument passed to the ast function, which in this case is the condition's AST ($2). (might look at the code for ast_if_stmt now...) *** example: alternatives, like parseStmt How to build ASTs for statements ------------------------------------------ EXAMPLE WITH ALTERNATIVES: STATEMENTS stmt : assignStmt | callStmt | beginStmt | ifStmt | whileStmt | readStmt | writeStmt | skipStmt ; ------------------------------------------ Q: What are the types of the ASTs on the right sides of these rules? assign_stmt_t, call_stmt_t, begin_stmt_t, etc. Q: What needs to be done to build the AST for the stmt itself? Need to call an ast function, like ast_stmt_assign, ast_stmt_call, etc. ... stmt : assignStmt { $$ = ast_stmt_assign($1); } | callStmt { $$ = ast_stmt_call($1); } | beginStmt { $$ = ast_stmt_begin($1); } // ... ; *** example: loops, like parseBeginStmt ------------------------------------------ EXAMPLE WITH A LOOP: STMTS IN A BEGIN // ::= {; } stmts : stmt | stmts ";" stmt { $$ = ast_stmts($1,$3); } ; ------------------------------------------ Q: How would the ASTs be constructed? stmts : stmt { $$ = ast_stmts_singleton($1); } | stmts ";" stmt { $$ = ast_stmts($1,$3); } ;