COP 3402 meeting -*- Outline -*-

* Review for the midterm exam

  Please ask questions about any concepts you are unsure of as we go...

** VMs

------------------------------------------
            FOR YOU TO DO

What happens to a VM's PC register
if no halt (HLT) instruction is executed?

 a. The VM notices that the program is done and halts.
 b. The VM interprets each address as an instruction,
    and keeps adding 1 to the PC.
 c. The VM stops when the PC reaches the program's last instruction.
 d. The VM is not supposed to stop, so it issues an error message
    when the PC reaches the program's last instruction.
------------------------------------------

     b, it goes into an infinite loop!

------------------------------------------
            FOR YOU TO DO

What is a lexical address?

 a. A page number in a dictionary
 b. A pair of numbers, (scopes, offset),
    giving the address of a name, where
       scopes = number of static links to follow
       offset = word address in that AR
 c. A pair of strings, (name, scope) where
       name = the identifier
       scope = the AR holding that name
 d. An absolute address in the VM's memory
    for a variable (or constant).
------------------------------------------

     b. a pair of (scopes, offset)...

------------------------------------------
            FOR YOU TO DO

Consider the following PL/0 program

  const bl = 32, nl = 10;
  var q, r, s, t;
  procedure F;
    var x, y, z;
    begin
      read y;   # asking about y here
      q := y
    end;
  procedure G;
    read q;
  begin
    # asking about q here
    read q;
    write q
  end.

For a word-addressed VM:

 - What is the lexical address of the variable "q"
   at the point of the second comment?

    a. (0,0)   b. (1,1)   c. (0,2)   d. (2,3)

 - What is the lexical address of "y"
   at the point indicated by the first comment?

    e. (0,1)   f. (1,1)   g. (0,7)   h. (0,0)
------------------------------------------

     c (0,2): q is declared in the current scope (0 static links),
              after the constants bl and nl at offsets 0 and 1
     e (0,1): y is declared in F's own scope (0 static links),
              after x at offset 0

** Grammars and Parsing

------------------------------------------
            FOR YOU TO DO

How many statement ASTs represent all of
the following PL/0 code fragment?
  begin
    x := 1;
    y := x+2;
    z := 3*4+y-x;
    if odd z then x := 0 else y := 0
  end

 1. one (1)
 2. two (2)
 3. three (3)
 4. four (4)
------------------------------------------

     1. there is just one statement AST, which contains the others.
        (There is a list of 4 statement ASTs
         inside the begin statement's AST)

------------------------------------------
            FOR YOU TO DO

Is the following a legal PL/0 statement (true or false)?

  begin
    x := 1;
    y := x+2;
    z := 3*4+y-x;
  end

Which tokens are included in the above PL/0 code?
(choose all that apply):

 a. identsym "y"
 b. numbersym 4
 c. beginsym "begin"
 d. identsym "end"
 e. becomessym ":="
 f. semisym ";"
 g. lparensym "("
 h. rparensym ")"
 i. leqsym "<="
------------------------------------------

     False, there is no semicolon allowed just before the "end" token

     For each, ask where?
     a, b, c, e, f

*** lexical analysis and regular grammars

------------------------------------------
            FOR YOU TO DO

What are the tokens for the following grammar?

  ::= { }
  ::= ( define )
  ::= | | ( quote ) | ( { } ) | ( if )
 where
  ::= { }
  ::= { }
  ::= a | b | c | ... | y | z | A | B | C | ... | Y | Z
     | + | - | * | / | = | ?
------------------------------------------

     reserved words: if, define, quote
     other tokens: , (, )

------------------------------------------
            FOR YOU TO DO

Consider C-style comments: /* ... */ in a lexical analyzer

What would be a regular expression for such comments?

What would a regular grammar look like?

What would a DFA look like?
------------------------------------------

     ... /[*].*[*]/
         in Unix-style notation, with
           the [*] notation meaning any character that is a *
             (so quoting the * character),
           the . meaning "any character", and
           the / being the '/' character itself
         (note: since . also matches '*' and '/', this pattern
          also accepts text that runs past the first */ up to
          a later one; the grammar and DFA are more precise)

     ... ::=
         ::= / *
         ::= { } |
         ::= "any character other than a *"
         ::= * { * }
         ::= "any character other than a /"

     Start with the state for the '/' character,
       if the next char is a '*' character go to inside state.
     Inside state: consume characters until see a '*',
       if see a '*' char, go to stars state.
     Stars state:
       if see a character other than '/' or '*', go to inside state,
       if see a '*', go back to stars state,
       if see a '/' char, finish recognizing the comment.

     Does that allow nested C-style comments?
       No! (For that, need a context-free grammar and a stack!)

** parsing

------------------------------------------
            FOR YOU TO DO

Briefly describe how an AST differs from a parse tree.
------------------------------------------

     The AST is a parse tree for the abstract syntax,
     while a parse tree is for the concrete syntax.
     So an AST does not track:
       - punctuation (like ;), or
       - extra levels of the grammar needed for precedence
         (like , , etc.)

------------------------------------------
            FOR YOU TO DO

Given this grammar:

  ::= { }
  ::= ( define )
  ::= | | ( quote ) | ( { } ) | ( if )
 where
  ::= { }
  ::= { }
  ::= a | b | c | ... | y | z | A | B | C | ... | Y | Z
     | + | - | * | / | = | ?

Write a leftmost derivation for the string

  (if (eq? x (quote y)) 3 (+ y 2))

or explain why that is not possible.

As an example: )( is not in the language of this grammar.
------------------------------------------

     ->                                   -- choosing 0 s
     -> (if )
     -> (if ( ) )                         -- choosing 2 s
     -> (if ( ) )
     -> (if (eq? ) )
     -> (if (eq? ) )
     -> (if (eq? x ) )
     -> (if (eq? x (quote )) )
     -> (if (eq? x (quote y)) )
     -> (if (eq? x (quote y)) )
     -> (if (eq? x (quote y)) )
     -> (if (eq? x (quote y)) 3 )
     -> (if (eq? x (quote y)) 3 ( ))      -- choosing 2
     -> (if (eq? x (quote y)) 3 (+ ))
     -> (if (eq? x (quote y)) 3 (+ ))
     -> (if (eq? x (quote y)) 3 (+ y ))
     -> (if (eq? x (quote y)) 3 (+ y ))
     -> (if (eq? x (quote y)) 3 (+ y 2))

     A parse tree for this would look like:
       (tree diagram: the root expands to the ( if ... ) form,
        whose three subtrees derive (eq? x (quote y)), 3,
        and (+ y 2), with leaves eq?, x, y, 3, +, y, 2)

------------------------------------------
            FOR YOU TO DO

Write a suitable abstract syntax for
the following concrete syntax

  ::= function ( )
  ::= ...
  ::= | {, }
  ::= :
  ::= | ...
------------------------------------------

     "comments about the grammar are in double quotes"

     F ::= function T x { P } E    "where { P } means zero or more P"
     T ::= ...                     "type expressions"
     P ::= x T
     E ::= x | ...                 "expressions"
     where x in

------------------------------------------
            FOR YOU TO DO

What would be a suitable C typedef for AST
for a language that includes ?
------------------------------------------

     (see the file ast.h in this directory)

     you would need type tags

     typedef enum {
         function_ast, type_ast, param_ast, /* ... */
         ident_exp_ast, /* ... AST tags for expressions */
     } AST_type;

     // forward declaration, so can use the type AST* below
     typedef struct AST_s AST;

     // lists of ASTs
     typedef AST *AST_list;

     // ... typedef for type_exp_t (must come before its use below)

     // structures for each kind of AST
     typedef struct {
         type_exp_t ret_type;
         const char *name;
         AST_list params;
         AST *stmt;
     } function_t;

     typedef struct {
         const char *name;
     } param_t;

     typedef struct {
         const char *name;
     } ident_exp_t;

     // ... other expressions, etc.

     typedef struct AST_s {
         file_location file_loc;
         AST_list next;      // for lists
         AST_type type_tag;
         union AST_u {
             // ...
             function_t function;
             // ...
             param_t param;
             ident_exp_t ident_exp;
             // ...
         } data;
     } AST;

** static analysis

------------------------------------------
            FOR YOU TO DO

What kinds of errors are checked for
during declaration checking?
------------------------------------------

     1. duplicate declarations within a scope, and
     2. undeclared identifier uses

* more review

** static analysis

------------------------------------------
            FOR YOU TO DO

Besides declaration checking,
what other kinds of checking of PL/0 programs
should be done statically?
------------------------------------------

     checking that constants are never assigned to or read into

** systems software

------------------------------------------
            FOR YOU TO DO

What does a compiler do overall?

 a. It compiles things
 b. It translates things from one language to another
 c.
    It lists many error messages
 d. First it does lexical analysis, then parsing,
    then static analysis, then code generation.
------------------------------------------

     b: it translates a programming language into some language
        closer to the machine (like machine language)

*** VMs and processors

------------------------------------------
            FOR YOU TO DO

How does a jump (JMP) instruction affect the PC in a VM?

 a. It replaces the address in the PC
    with an address from the instruction
 b. It causes the PC to rise above the floor
 c. It replaces the address in the PC
    with an address from the instruction
    which is then incremented by 1.
 d. The VM's PC is unaffected by such an instruction.
------------------------------------------

     a. is correct

*** Support for subroutines

------------------------------------------
            FOR YOU TO DO

Which of the following best describes "static scoping"?

 a. A rule that keeps telescopes hidden in C program files
 b. A rule that makes each variable name refer to
    the most recent declaration of that name
    during a program's execution
 c. A rule that says what declarations
    a static variable in a C program refers to.
 d. A rule that makes each name refer to
    the closest textually surrounding declaration of that name.
------------------------------------------

     Answer (d) is correct

------------------------------------------
            FOR YOU TO DO

Which of the following best describes "dynamic scoping"?

 a. A rule that keeps telescopes constantly flying on airplanes.
 b. A rule that makes each variable name refer to
    the most recent declaration of that name
    during a program's execution
 c. A rule that says what declarations
    a static variable in a C program refers to.
 d. A rule that makes each name refer to
    the closest textually surrounding declaration of that name.
------------------------------------------

     Answer (b) is dynamic scoping

------------------------------------------
            FOR YOU TO DO

Why is static scoping useful for programming?

 a.
    Because static scoping is used in the Unix shell.
 b. Because static scoping is easier to implement
    than dynamic scoping.
 c. Because static scoping makes names used in code
    refer to declarations that are evident in the program text.
 d. Because static scoping requires static links to be used
    when accessing local variables declared in surrounding scopes.
------------------------------------------

     Answer (c) is correct
     (The Unix shell uses dynamic scoping,
      static scoping is arguably harder to implement,
      and the use of static links is not a convenience in programming.)

------------------------------------------
            FOR YOU TO DO

What are static links used for in a VM?

 a. To make a static fence
 b. To follow static scoping
 c. To create a static URL
 d. To ensure that each variable refers to
    the most recent declaration for that variable's name
------------------------------------------

     Answer: b is correct,
     d describes dynamic scoping (which is wrong!)

------------------------------------------
            FOR YOU TO DO

How are static links used with lexical addresses in a VM?

 a. The first component of a lexical address
    indicates how many static links to traverse.
 b. The second component of a lexical address
    indicates how many static links to traverse.
 c. The first component of a lexical address indicates
    how many pairs of static and dynamic links to traverse.
 d. The second component of a lexical address indicates
    how many pairs of static and dynamic links to traverse.
------------------------------------------

     Answer: (a) is correct, dynamic links are not involved

** parsing

------------------------------------------
            FOR YOU TO DO

What parts of the grammar of PL/0 could NOT be parsed
using regular expressions?

 a. Matching parentheses and begin-end
 b. Recognizing numeric literals like 3402
 c. Recognizing reserved words
 d. Recognizing multiple-character tokens like ":=" and "<="

Could we write a parser for PL/0
without using a stack or recursion?

 a. no
 b.
    yes
------------------------------------------

     a. matching is definitely context-free
        and is not possible with regular expressions
        (this is a well-known result in formal language theory)

     a. no, note that a parser could be written without using recursion,
        but it would need a stack for doing context-free matching
        (e.g., for begin-end and parentheses)

------------------------------------------
            FOR YOU TO DO

In EBNF notation, what does ::= mean?

 a. These two things are equal.
 b. The left side is proportional to the right side.
 c. The left side can become the right side.
 d. The left side can jump to the right side.

In BNF notation, what does | mean?

 a. It means "or"; it separates alternatives
 b. It means "and"; it separates consecutive terms
 c. It means nothing, it is just used
    to make the formatting look better
 d. It means that parsing stops at that point
------------------------------------------

     ... c. is correct
     ... a. is correct

------------------------------------------
            FOR YOU TO DO

Consider the following grammar in EBNF notation:

  ::= { }
  ::= :- { } .
  ::= | ( { , } )
  ::= |
  ::=
  ::=
  ::= , | ;
 where
  ::= { }
  ::= { }
  ::= A | B | ... | Z
  ::= a | b | ... | z
  ::= |

Is the following a legal program?

  Practice :- true.

Is the following legal?

  ancestor(A,C) :- parent(B,C), ancestor(A,B).
  ancestor(C,C).
------------------------------------------

     The first is, but the second isn't,
     as the second line of the second one isn't in the grammar.

     Q: What would it take to fix it, if necessary?
        The second line of the second one
        could be changed to a , e.g.,
          ancestor(C,C) :- true.

------------------------------------------
            FOR YOU TO DO

What is an AST?

 a. The Association of Surgical Technologists
 b. Aspartate Transferase
 c. An Allocated Syntax Tree
 d. An Abstract Syntax Tree
------------------------------------------

     d. is correct

** symbol tables

------------------------------------------
            FOR YOU TO DO

What is a symbol table used for in a compiler?

 a.
    It tracks the metaphors that are compiled
 b. It stores information about identifiers
 c. It attributes meaning to identifiers
 d. It represents the deep meaning of each identifier
------------------------------------------

     b. is correct

------------------------------------------
            FOR YOU TO DO

Which kind of information is NOT something
that a compiler should store in its symbol table?

 a. The type of an identifier
 b. The age of a symbol
 c. The number of hit movies that the symbol has made
 d. The deeper meaning of the symbol
------------------------------------------

     b, c, and d are correct
     (only a. is the type of thing that a compiler should store)

------------------------------------------
            FOR YOU TO DO

Assume that there is a global variable tok, of type token,
with a field named typ of type token_type,
and that the type token_type is an enum that includes:

    identsym, definesym, numbersym, quotesym,
    eofsym, lparensym, rparensym

and that there is a global function eat(token_type t), that:
    issues an error message if tok.typ != t, and
    advances to the next token by executing
        tok = lexer_next();
and that there are functions
    parseNumber(), parseIdentExp(), and parseExp()
that parse the nonterminals , , and

Write the code in C for a recursive descent parser
that recognizes the following EBNF grammar.
(Note: you don't have to build ASTs for this.)

  ::=
      { }
  ::= | | )
  ::= (
  ::= quote | define | { }
------------------------------------------

     // forward declarations, since the functions call each other
     void parseForms();
     void parseForm();
     void parseFormCont();

     void parseProgram()
     {
         parseForms();
         eat(eofsym);
     }

     static bool isFormStart(token_type tt)
     {
         return tt == numbersym || tt == identsym || tt == lparensym;
     }

     void parseForms()
     {
         while (isFormStart(tok.typ)) {
             parseForm();
         }
     }

     void parseForm()
     {
         switch (tok.typ) {
         case numbersym:
             parseNumber();
             break;
         case identsym:
             parseIdentExp();
             break;
         case lparensym:
             eat(lparensym);
             parseFormCont();
             eat(rparensym);
             break;
         default:
             parse_error("expecting a number, identifier, or left paren.");
             break;
         }
     }

     void parseFormCont()
     {
         switch (tok.typ) {
         case quotesym:
             parseQuoteExp();
             break;
         case definesym:
             parseDefineForm();
             break;
         default:
             parseExp();
             break;
         }
     }

------------------------------------------
            FOR YOU TO DO

What should be done to change a parser
that just recognizes a language
into one that returns ASTs?

 a. Send it to the Mayo Clinic for a liver test.
 b. Change the void return types to return the type AST*,
    build and return an AST in each parsing function.
 c. Change the void return types to return the type AST*
    and return NULL in each parsing function.
 d. Build the symbol table during the traversal and
    look for duplicate and missing declarations for idents.
------------------------------------------

     b. is correct

------------------------------------------
            FOR YOU TO DO

What would a suitable abstract syntax be for the grammar:

  ::= { }
  ::= | | )
  ::= (
  ::= quote | define | { }

What would a suitable AST type be for the above grammar?
------------------------------------------

     ... P ::= { F }
         F ::= n | x | Q | D | A
         Q ::= quote x
         D ::= define x E
         A ::= app { E }
         E ::= ...
         where n in
               x in

     ... // file ast.h (in this directory also)

     // types of ASTs (type tags)
     typedef enum {
         program_ast, number_ast, ident_ast,
         quote_ast, define_ast, app_ast,
         /* ... maybe some others? ...
          */
     } AST_type;

     // forward declaration, so can use the type AST* below
     typedef struct AST_s AST;

     // lists of ASTs
     typedef AST *AST_list;

     typedef struct {
     } program_t;

     // F ::= n
     typedef struct {
         number value;
     } number_t;

     // F ::= x
     typedef struct {
         const char *name;
     } ident_t;

     // Q ::= quote x
     typedef struct {
         const char *name;
     } quote_t;

     // D ::= define x E
     typedef struct {
         const char *name;
         AST *exp;
     } define_t;

     // ...

     // The actual AST definition:
     typedef struct AST_s {
         file_location file_loc;
         AST_list next;      // for lists
         AST_type type_tag;
         union AST_u {
             program_t program;
             number_t number;
             ident_t ident;
             quote_t quote;
             define_t define;
             // ...
         } data;
     } AST;

------------------------------------------
            FOR YOU TO DO

Write, in C, a parser that returns ASTs for the above grammar
------------------------------------------

     AST_list parseProgram()
     {
         AST_list forms = parseForms();
         AST *first_form = ast_list_first(forms);
         eat(eofsym);
         return ast_program(first_form->file_loc, forms);
     }

     static bool isFormStart(token_type tt)
     {
         return tt == numbersym || tt == identsym || tt == lparensym;
     }

     // Note: unlike the recognizer, this version expects at least one form,
     // since parseProgram takes the file location from the first form.
     AST_list parseForms()
     {
         AST_list ret = ast_list_singleton(parseForm());
         AST_list last = ret;
         while (isFormStart(tok.typ)) {
             AST *fast = parseForm();
             ast_list_splice(last, ast_list_singleton(fast));
             last = ast_list_last_elem(last);
         }
         return ret;
     }

     AST *parseForm()
     {
         switch (tok.typ) {
         case numbersym:
             return parseNumber();
             break;
         case identsym:
             return parseIdentExp();
             break;
         case lparensym: {
             eat(lparensym);
             AST *ret = parseFormCont();
             eat(rparensym);
             return ret;
             break;
         }
         default:
             parse_error("expecting a number, identifier, or left paren.");
             break;
         }
         // should never execute the following
         return NULL;
     }

     AST *parseFormCont()
     {
         switch (tok.typ) {
         case quotesym:
             return parseQuoteExp();
             break;
         case definesym:
             return parseDefineForm();
             break;
         default:
             return parseExp();
             break;
         }
     }
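
     To try the recursive-descent structure above without a lexer or
     the AST helpers, here is a minimal self-contained sketch of the
     recognizer. It is not the course's code, and several things in it
     are assumptions made only for illustration: the lexer is replaced
     by a hypothetical token array that must end in eofsym, tok holds
     just a token type, eat() sets a hypothetical had_error flag instead
     of printing a message, recognizes() is a made-up entry point, and
     the rule after a left paren is assumed to be
     quote ident | define ident form | form { form }.

```c
#include <stdbool.h>
#include <stddef.h>

// Token kinds named in the exercise above.
typedef enum { identsym, definesym, numbersym, quotesym,
               eofsym, lparensym, rparensym } token_type;

// Hypothetical stand-in for the lexer: a token array ending in eofsym.
static const token_type *input;
static size_t pos;
static token_type tok;   // current token (only its type is modeled)
static bool had_error;   // set instead of issuing an error message

static void parseForm(void);

// Check the current token, then advance (but never past eofsym).
static void eat(token_type t)
{
    if (tok != t) {
        had_error = true;
    }
    if (tok != eofsym) {
        tok = input[++pos];
    }
}

static bool isFormStart(token_type tt)
{
    return tt == numbersym || tt == identsym || tt == lparensym;
}

// Assumed rule: quote ident | define ident form | form { form }
static void parseFormCont(void)
{
    switch (tok) {
    case quotesym:
        eat(quotesym);
        eat(identsym);
        break;
    case definesym:
        eat(definesym);
        eat(identsym);
        parseForm();
        break;
    default:
        parseForm();
        while (isFormStart(tok)) {
            parseForm();
        }
        break;
    }
}

static void parseForm(void)
{
    switch (tok) {
    case numbersym:
        eat(numbersym);
        break;
    case identsym:
        eat(identsym);
        break;
    case lparensym:
        eat(lparensym);
        parseFormCont();
        eat(rparensym);
        break;
    default:
        had_error = true;  // expecting a number, identifier, or left paren
        break;
    }
}

// Returns true when the whole token array is a legal program
// (zero or more forms followed by eofsym).
bool recognizes(const token_type *tokens)
{
    input = tokens;
    pos = 0;
    tok = tokens[0];
    had_error = false;
    while (isFormStart(tok)) {
        parseForm();
    }
    eat(eofsym);
    return !had_error;
}
```

     For example, the token sequence for (define x 3), that is
     lparensym definesym identsym numbersym rparensym eofsym,
     should be accepted, while the same sequence without the
     rparensym should be rejected. Each loop iteration consumes at
     least one token, so the sketch always terminates.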