**** problems that can occur in recursive-descent parsing (skip)
***** left-recursion (skip)
------------------------------------------
PROBLEM: LEFT RECURSION

Consider:
    <exp> ::= <exp> + <number> | <number>

Parsing function for <exp>:
    void parseExp() {
        parseExp();      // calls itself before eating any token
        eat(plussym);
        eat(numbersym);
    }

Is that a problem?
------------------------------------------
... yes, it's an infinite loop! parseExp() calls itself without consuming
any input, so every call makes the same recursive call again (until the
stack overflows).
------------------------------------------
ELIMINATING LEFT RECURSION

Technique: change to right recursion, using a new nonterminal

E.g., change
    <exp> ::= <exp> + <number> | <number>
to
------------------------------------------
...
    <exp> ::= <number> <exp-rest>
    <exp-rest> ::= + <number> <exp-rest> | <empty>

or (using the meta-notation from EBNF):
    <exp> ::= <number> { <plus-number> }
    <plus-number> ::= + <number>

Q: How do you know you haven't changed the language?

We should check that the new grammar generates the same set of strings...
unfortunately, deciding whether two context-free grammars generate the
same language is undecidable in general!

However, when there are productions of the form
    <X> ::= <X> g | a
where g and a stand for strings of terminals and nonterminals, and a does
not start with <X>, then <X> derives strings of the form
    a g*
(as a regular expression), so we can write this regular expression as a
grammar using right recursion:
    <X> ::= a <X-rest>
    <X-rest> ::= g <X-rest> | <empty>

***** left factoring (skip)
------------------------------------------
PROBLEM: AMBIGUOUS START OF RULES

Example:
    <stmt> ::= if <condition> then <stmt> else <stmt>
             | if <condition> then <stmt>

If we see an if token, which alternative should we parse?

SOLUTION: LEFT FACTORING

Tactic:
 - Put the common start in one nonterminal
 - Put the differences in another nonterminal

Example:


Why is this better?
------------------------------------------
...
    <stmt> ::= if <condition> then <stmt> <maybe-else>
    <maybe-else> ::= <empty> | else <stmt>

... because parseME (the parsing function for <maybe-else>) can be written
to check the next token: if it's an else, then eat it and parse the next
statement; otherwise it is done.

**** case study: PL/0 expressions
------------------------------------------
CASE STUDY: PARSING PL/0 EXPRESSIONS

Grammar for <expr> in PL/0:
    <expr> ::= <sign> <term> { <add-sub-term> }
    <add-sub-term> ::= <add-sub> <term>
    <add-sub> ::= <plus> | <minus>
    <term> ::= <factor> { <mult-div-factor> }
    <mult-div-factor> ::= <mult-div> <factor>
    <mult-div> ::= <mult> | <div>
    <factor> ::= <ident> | <number> | ( <expr> )
    <sign> ::= <add-sub> | <empty>

Examples in the language of <expr>:
    +3402
    -3402 + 4020
    3*4+2
    3*4+(0+(1+1))
------------------------------------------
Note: <sign> was missing from <expr> in the original grammar...

Q: Why isn't the grammar like
    <expr> ::= <expr> <op> <expr>
?

First, that has left recursion. Second, it's highly ambiguous: for
example, 3*4+2 could be grouped either as (3*4)+2 or as 3*(4+2).

Q: How is 3*4+2 parsed?

Like (3*4)+2, since multiplication is handled inside a <term>, and +
only combines whole <term>s.
------------------------------------------
PARSE TREE WITH THIS GRAMMAR

3 * 4 + 2
------------------------------------------
<expr>
  <sign>
    <empty>
  <term>
    <factor>
      <number> 3
    <mult-div-factor>
      <mult-div>
        <mult> *
      <factor>
        <number> 4
  <add-sub-term>
    <add-sub>
      <plus> +
    <term>
      <factor>
        <number> 2

Q: Could we have made a different parse tree?

No, there are no choices to make. (Note: we expand each { X } with as
many copies of X as we need.)

Q: What are the tokens here?

<plus>, <minus>, <mult>, <div>, <number>, ...

Q: How do we start writing this?

Start with parseExpr()...
------------------------------------------
CODE FOR PARSING EXPRESSIONS

// assume advance() and eat(), as before
void parseExpr() {

}

void parseSign() {

}
------------------------------------------
... // parseExpr body:
    parseSign();
    parseTerm();
    parseAddSubTermList();

... // parseSign body:
    switch (tok.typ) {
    case plussym:
        eat(plussym);
        break;
    case minussym:
        eat(minussym);
        break;
    default:
        // <empty>: there is no sign, so consume nothing
        break;
    }