LANGUAGES

def: A *language* is


LANGUAGE CLASSES

Languages can be classified by Venn diagram:

  |----------------------------------------------|
  | Type 0 Languages                             |
  |                                              |
  |   |--------------------------------------|   |
  |   | Context-sensitive Languages          |   |
  |   |                                      |   |
  |   |   |------------------------------|   |   |
  |   |   | Context-free Languages       |   |   |
  |   |   |                              |   |   |
  |   |   |   |----------------------|   |   |   |
  |   |   |   | Regular Languages    |   |   |   |
  |   |   |   |----------------------|   |   |   |
  |   |   |                              |   |   |
  |   |   |------------------------------|   |   |
  |   |                                      |   |
  |   |--------------------------------------|   |
  |                                              |
  |----------------------------------------------|


PHASES OF A COMPILER

Programs allowed by a compiler's:

  |----------------------------------------------|
  | Lexical Analysis (Lexer)                     |
  |                                              |
  |   |--------------------------------------|   |
  |   | Parser                               |   |
  |   |                                      |   |
  |   |   |------------------------------|   |   |
  |   |   | Static Analysis              |   |   |
  |   |   |                              |   |   |
  |   |   |   |----------------------|   |   |   |
  |   |   |   | Runtime checks       |   |   |   |
  |   |   |   |----------------------|   |   |   |
  |   |   |                              |   |   |
  |   |   |------------------------------|   |   |
  |   |                                      |   |
  |   |--------------------------------------|   |
  |                                              |
  |----------------------------------------------|


PROBLEM WRITING SYNTAX ANALYSIS

Naive way to write a compiler, etc.

   Language Def.docx  -- coding --> parser.c (v. 1)
            Def2.docx -- coding --> parser.c (v. 2)
                       -- coding --> parser.c (v. 3)
            Def3.docx -- coding --> parser.c (v. 4)
            Def4.docx -- coding --> parser.c (v. 5)
             ...                    ...
            DefN.docx -- coding --> parser.c (v. N)

Disadvantages:


COMPUTER SCIENCE SOLUTION: AUTOMATION

   high-level description    tool        generated code

   lang.y                 -- bison -->   lang.tab.c + lang.tab.h
   lang2.y                -- bison -->   lang2.tab.c + lang2.tab.h
    ...
   langN.y                -- bison -->   langN.tab.c + langN.tab.h

Advantages:


GRAMMARS DESCRIBE LANGUAGES

Grammars are high-level descriptions of languages/parsers

def: a *grammar* consists of a finite set of rules
     (called "productions") and a start symbol (a nonterminal).
     Let V = nonterminals + terminals.
     The rules have the form

         V+ -> V*

     where no symbol is in both nonterminals and terminals.

def: The language generated by a grammar G with set of productions P is

         {w | w is in terminals* and S =>* w}

     where S is the start symbol of G,

         gAd => gBd  iff  g is in V*, d is in V*,
                          and A -> B is a rule in P,

     and

         g =>* h  iff  either h = g, or g => i and i =>* h


BNF NOTATION FOR GRAMMARS

   ::=      means
   |        means
   <name>   is a

Example:

   <binary-numeral> ::= <bit> | <bit> <binary-numeral>
   <bit> ::= 0 | 1


GRAMMARS AS RULES OF GAMES

A grammar can be seen as describing two games:

 - A production game (Can you produce this string?)

 - A recognition/parsing game (Is this string in the language?)


PRODUCTION GAME

Goal: produce a string in the language from the start symbol

Example Grammar:

   <sentence> -> <noun> <verb> <adjective>
   <noun> -> Johnny | Sue | Charlie
   <verb> -> is | can be
   <adjective> -> good | difficult

Can we produce "Johnny is good"?


RECOGNITION OR PARSING GAME

Goal: determine if a string is in the language of the grammar

Example Grammar:

   <sentence> -> <noun> <verb> <adjective>
   <noun> -> Johnny | Sue | Charlie
   <verb> -> is | can be
   <adjective> -> good | difficult

Is "Johnny is good" in this grammar?
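To make the recognition game concrete, here is a minimal sketch in C
(a sketch only; the helper names one_of and is_sentence are invented for
this illustration, not part of any course code): it checks whether a
sequence of words can be produced from <sentence> in the toy grammar above.

   #include <stdio.h>
   #include <string.h>

   /* is the word w one of the n given alternatives? */
   static int one_of(const char *w, const char *alts[], int n) {
       for (int i = 0; i < n; i++)
           if (strcmp(w, alts[i]) == 0)
               return 1;
       return 0;
   }

   /* recognize <sentence> -> <noun> <verb> <adjective> over an array of
      words; returns 1 iff the words are in the language */
   static int is_sentence(const char *words[], int nwords) {
       const char *nouns[] = {"Johnny", "Sue", "Charlie"};
       const char *adjectives[] = {"good", "difficult"};
       int i = 0;
       if (nwords < 3 || !one_of(words[i++], nouns, 3))
           return 0;
       /* <verb> -> is | can be   ("can be" is two words) */
       if (strcmp(words[i], "is") == 0) {
           i++;
       } else if (i + 1 < nwords && strcmp(words[i], "can") == 0
                  && strcmp(words[i+1], "be") == 0) {
           i += 2;
       } else {
           return 0;
       }
       return i < nwords && one_of(words[i], adjectives, 2) && i + 1 == nwords;
   }

   int main(void) {
       const char *w[] = {"Johnny", "is", "good"};
       printf("%d\n", is_sentence(w, 3));   /* prints 1 */
       return 0;
   }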
DERIVATION (OR PARSE) TREES

def: a *tree* is a finite set of nodes connected by directed edges
     that is connected and has no cycles

def: a *derivation tree* for grammar G is a tree such that:

   - Every node has a label that is a symbol of G
   - The root is labeled by the start symbol of G
   - Every node with a direct descendant is labeled by a nonterminal
   - If the descendants of a node labeled by N have the following labels
     (in order): A, B, C, ..., K
     then G has a production of the form N -> A B C ... K


EXAMPLE DERIVATION TREE

Example Grammar:

   <sentence> -> <noun> <verb> <adjective>
   <noun> -> Johnny | Sue | Charlie
   <verb> -> is | can be
   <adjective> -> good | difficult

String to parse: "Johnny is good"

              <sentence>
              /    |    \
             /     |     \
            v      v      v
        <noun>  <verb>  <adjective>
           |       |        |
           v       v        v
        Johnny    is       good


EXTENSIONS TO BNF (EBNF)

Arbitrary number of repeats:

   { x }   means 0 or more repeats of x

   <xs> ::= { <x> }

      is equivalent to:

   <xs> ::= <empty> | <x> <xs>
   <empty> ::=

   { <x> } is also written as:  <x>*  or  [ <x> ] ...

One-or-more repeats:

   x+      means 1 or more repeats of x

   <xs> ::= <x> +

      is equivalent to:

   <xs> ::= <x> | <x> <xs>

   (<x>+ is sometimes written as <x> ...)

Optional element:

   [ x ]   means 0 or 1 occurrences of x

   <opt-x> ::= [ <x> ]

      is equivalent to:

   <opt-x> ::= <empty> | <x>


READING A BNF GRAMMAR

Example rules:

   <number> ::= <digit-seq>
   <nonzero-digit> ::= 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
   <digit> ::= 0 | <nonzero-digit>
   <digit-seq> ::= <digit> | <digit> <digit-seq>


EBNF GRAMMAR FOR (SUBSET OF) PL/0

   <program> ::= <block> .
   <block> ::= <const-decls> <var-decls> <proc-decls> <stmt>
   <const-decls> ::= { <const-decl> }
   <const-decl> ::= const <const-def> { <comma-const-def> } ;
   <const-def> ::= <ident> = <number>
   <comma-const-def> ::= , <const-def>
   <var-decls> ::= { <var-decl> }
   <var-decl> ::= var <idents> ;
   <idents> ::= <ident> { <comma-ident> }
   <comma-ident> ::= , <ident>
   <proc-decls> ::= { <proc-decl> }
   <proc-decl> ::= procedure <ident> ; <block> ;
   <stmt> ::= <ident> := <expr>
            | call <ident>
            | begin <stmt> { <semi-stmt> } end
            | if <condition> then <stmt> <else-part>
            | while <condition> do <stmt>
            | read <ident>
            | write <expr>
            | skip
   <semi-stmt> ::= ; <stmt>
   <else-part> ::= <empty> | else <stmt>
   <empty> ::=
   <condition> ::= odd <expr> | <expr> <rel-op> <expr>


EXAMPLES IN PL/0

Shortest program:

   skip.

Factorial program:

   var n, res;   # input and result
   procedure fact;
     begin
       read n;
       res := 1;
       while n <> 0 do
         begin
           res := res * n;
           n := n-1
         end;
       write res
     end;
   call fact.


MOTIVATION

   # $Id$\n    .text start\nstart:\tADDI ...

Want to:

Approach:


LEXICAL ANALYSIS

Lexical means relating to the words of a language


GOALS OF LEXICAL ANALYSIS

 - Simplify the parser, so it need not handle:

 - Recognize the longest match

     Why?

 - Handle every possible character of input

     Why?


CONFLICT BETWEEN RULES

Suppose that both "if" and numbers are tokens:

   What tokens should "if8" match?

Fixing such situations:


WHICH TOKEN TO RETURN?

   If the input is "<=", what token(s)?

   If the input is "<8", what token(s)?

   If the input is "if", what token(s)?

   If the input is "//", what token(s)?

Summary:


THE BIG PICTURE

                      tokens
   source --> [ Lexer ] ------> [ Parser ]
    code                            /
                                   /  abstract
                                  /   syntax
                                 v    trees
                       [ static analysis ]
                               /
                              /
                             v
                     [ code generator ]

For the Lexer we want to:

 - specify the tokens using regular expressions (REs)
 - convert REs to DFAs to execute them

   but easy conversions are:
   - REs to NFAs
   - NFAs to DFAs


HOW PARSER WORKS WITH LEXER

Coroutine structure:

   Parser calls lexer:

      tok = yylex();   // call lexer
      /* ... use yylval ... */

   Lexer function
      remembers a pointer to the input stream,
      returns the next token (an int code)

   Parser works...

   Parser calls lexer again:

      tok = yylex();   // call lexer
      /* ... use yylval ... */


BISON AND FLEX, GENERATING A PARSER

idea:

   ast.h (AST types)
      |
      |              bison
      |         /-----------> g.tab.c     yyparse function
      v        /
   g.y file --+
               \     bison
                \-----------> g.tab.h     tokens defs.
                                 ^
                                 |
                      flex       |
   g.l file ----------------> g.c         yylex function


DEFINITIONS

The lexical grammar of a language is regular, because

def: a grammar is *regular* iff all its productions
     have one of these forms:

        <A> ::= c      or      <A> ::= c <B>

     where c is a terminal symbol,
     and <A> and <B> are nonterminals

def: a language is *regular* iff it can be defined using a regular grammar.

Thm: Every regular language can be recognized by a finite automaton.

Thm: Every regular language can be specified by a regular expression.
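To make the first theorem concrete, here is a minimal hand-coded finite
automaton in C (the function name is_identifier is invented for this
sketch); it recognizes the regular language of identifiers with underbars,
described by the regular expression [_a-zA-Z][_a-zA-Z0-9]*.

   #include <ctype.h>
   #include <stdio.h>

   static int is_identifier(const char *s) {
       /* IN_ID is the only accepting state */
       enum { START, IN_ID, DEAD } state = START;
       for (; *s != '\0'; s++) {
           switch (state) {
           case START:   /* first character must be a letter or underbar */
               state = (isalpha((unsigned char)*s) || *s == '_') ? IN_ID : DEAD;
               break;
           case IN_ID:   /* later characters may also be digits */
               state = (isalnum((unsigned char)*s) || *s == '_') ? IN_ID : DEAD;
               break;
           case DEAD:    /* no transitions leave the dead state */
               return 0;
           }
       }
       return state == IN_ID;
   }

   int main(void) {
       printf("%d %d\n", is_identifier("_if8"), is_identifier("8if")); /* 1 0 */
       return 0;
   }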
REGULAR EXPRESSIONS

The language of regular expressions:

   <regexp> ::= <char>
              | <regexp> '|' <regexp>
              | <regexp> <regexp>
              | emp
              | <regexp> *
              | ( <regexp> )

   where <char> is a character

Examples:

   RE                    meaning
   ====================================================
   emp                   the empty string
   (0|1)*0               even binary numerals
   b*(abb*)*(a|emp)      a's and b's without consecutive a's


EXTENSIONS TO REGULAR EXPRESSIONS

   [abcd]   means a|b|c|d
   [h-m]    means h|i|j|k|l|m
   x?       means x|emp
   y+       means y(y*)


FOR YOU TO DO

Write a regular expression that describes:

 1. The keyword "if"

 2. The set of all (positive) decimal numbers

 3. The set of all possible identifiers with underbars (_)

(If you have time, write these as regular grammars also.)


EXAMPLE REGULAR EXPRESSIONS FOR PL/0

   PATTERN     RE
   =========================
   DECDIGIT    [0-9]
   LETTER      [_a-zA-Z]
   ...


EXAMPLE

State diagram:

              <             =
   -->[q0] ------>[[q1]]------> [[q2]]
        |
        | >
        v
     [[q3]]


PSEUDO CODE FOR THIS LEXER

   char c;
   /* case analysis on the character c ... */


DEALING WITH COMMENTS

Suppose
   /  is used for division
   // starts a comment to end of line (unlike PL/0!)

What state diagram?

How does whitespace fit in?


TACTICS FOR IGNORING WHITESPACE, COMMENTS

Goal: do not send ignored tokens to the parser

Can always get a non-ignored token:

 - Return "tokens" that include ignored stuff
   to a loop that ignores them

 - Giant DFA that goes back to the start state
   on seeing something to ignore


NONDETERMINISTIC FINITE AUTOMATA

def: A *nondeterministic finite automaton* (NFA) over an alphabet Sigma
     is a system (K, Sigma, delta, q0, F) where

        K is a finite set (of states),
        Sigma is a finite set (the input alphabet),
        delta is a map of type (K, Sigma) -> Sets(K),
        q0 in K is the initial state, &
        F is a subset of K (the final/accepting states).


TRANSITION FUNCTION AND ACCEPTANCE

p in delta(q,x) means that in state q, on input x,
                the next state can be p

p in delta*(q,s), where s in Sigma*, is defined by:

   delta*(q,emp) = {q}      (i.e., q in delta*(q,emp))

   p in delta*(q,xa)  iff  p in delta*(q2,a) for some q2 in delta(q,x),
                           where x in Sigma and a in Sigma*

Lemma: for all c in Sigma, delta*(q,c) = delta(q,c)

def: An NFA (K, Sigma, delta, q0, F) *accepts* a string s in Sigma* iff
     there is some q in delta*(q0,s) such that q in F


EXAMPLE NFA

        0,1                          0,1
       /---\                        /---\
       \   /                        \   /
        | v       0             0    | v
   -->[ q0 ] --------> [ q3 ] ----> [[ q4 ]]
         |
         | 1
         v
      [ q1 ]
         |
         | 1
         v
     [[ q2 ]]
        | ^
        \ /
        \-/  0,1

   K = {q0,q1,q2,q3,q4}
   Sigma = {0,1}
   q0 is the start state
   F = {q2,q4}

   delta(q0,0) = {q0,q3}      delta(q0,1) = {q0,q1}
   delta(q1,0) = {}           delta(q1,1) = {q2}
   delta(q2,0) = {q2}         delta(q2,1) = {q2}
   delta(q3,0) = {q4}         delta(q3,1) = {}
   delta(q4,0) = {q4}         delta(q4,1) = {q4}


EXTENDING FUNCTIONS TO SETS OF STATES

Notation extending delta and delta* to sets of states:

   d(Q,x) = union of d(q,x) for all q in Q

so

   d({},x) = {}
   d({q},x) = d(q,x)
   d({q1,q2},x) = d(q1,x) + d(q2,x)
   d({q1,q2,q3},x) = d(q1,x) + d(q2,x) + d(q3,x)
   etc.
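Here is a minimal sketch in C of this set-of-states notation, using the
example NFA above and representing a set of states as a bit mask (the
names delta_set and delta_star are invented for this sketch); the worked
example that follows traces the same input string, 010011, by hand.

   #include <stdio.h>

   /* one bit per state: bit i set means qi is in the set */
   enum { Q0 = 1<<0, Q1 = 1<<1, Q2 = 1<<2, Q3 = 1<<3, Q4 = 1<<4 };

   /* delta(q, x) for a single state q (given as its bit) and input x */
   static unsigned delta(unsigned q, char x) {
       switch (q) {
       case Q0: return x == '0' ? (Q0|Q3) : (Q0|Q1);
       case Q1: return x == '1' ? Q2 : 0;
       case Q2: return Q2;                    /* on both 0 and 1 */
       case Q3: return x == '0' ? Q4 : 0;
       case Q4: return Q4;                    /* on both 0 and 1 */
       default: return 0;
       }
   }

   /* d(Q, x): union of delta(q, x) over all q in the set Q */
   static unsigned delta_set(unsigned Q, char x) {
       unsigned result = 0;
       for (unsigned bit = 1; bit <= Q4; bit <<= 1)
           if (Q & bit)
               result |= delta(bit, x);
       return result;
   }

   /* delta*(Q, s): run the NFA on the whole string s */
   static unsigned delta_star(unsigned Q, const char *s) {
       for (; *s != '\0'; s++)
           Q = delta_set(Q, *s);
       return Q;
   }

   int main(void) {
       unsigned final = Q2 | Q4;              /* F = {q2, q4} */
       unsigned reached = delta_star(Q0, "010011");
       printf("accepted? %s\n", (reached & final) ? "yes" : "no");
       return 0;
   }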
Note that delta*({q}, x) = delta*(q, x),
and, by the Lemma, for a single character x, delta*(q, x) = delta(q, x)


EXAMPLE

   delta*(q0,010011)
   = delta*(delta(q0,0),10011)
   = delta*({q0,q3},10011)
   = delta*(q0,10011) + delta*(q3,10011)
   = delta*(delta(q0,1),0011) + delta*(delta(q3,1),0011)
   = delta*({q0,q1},0011) + delta*({},0011)
   = delta*(delta(q0,0),011) + delta*(delta(q1,0),011) + {}
   = delta*({q0,q3},011) + delta*({},011) + {}
   = delta*({q0,q3},011)
   = delta*(delta(q0,0),11) + delta*(delta(q3,0),11)
   = delta*({q0,q3},11) + delta*({q4},11)
   = delta*(delta(q0,1),1) + delta*(delta(q3,1),1) + delta*(delta(q4,1),1)
   = delta*({q0,q1},1) + delta*({},1) + delta*({q4},1)
   = delta(q0,1) + delta(q1,1) + {} + delta(q4,1)
   = {q0,q1} + {q2} + {} + {q4}
   = {q0,q1,q2,q4}


DETERMINISTIC FINITE AUTOMATA

def: a *deterministic finite automaton* (DFA) is an NFA in which
     delta(q,c) is a singleton or empty for all q in K and c in Sigma.


IMPLEMENTING DFAS

How would you represent states?

How would you implement a DFA?


PROBLEM

We want to specify the lexical grammar using

So we need to convert regular expressions into DFAs


CONVERTING REs TO NFAs

Definition based on the grammar of Regular Expressions:

Result of Convert(M) looks like this:

   -->(M q)

where the "tail", -->, goes to the start state
and q is the "head state";
assume also that Convert(N) is  --->(N q')

                        c
   Convert(c)    =  --->[ q ]

                        emp              emp
                       /--->(M q)----\
   Convert(M | N) = -->[ q ]          -->[ q2 ]
                       \--->(N q')---/
                        emp              emp

   Convert(M N)  =  -->(M q)-->(N q')
                             emp

                        emp
   Convert(emp)  =  ------> [ q ]

                             emp
                        /--------------------\
                       /              emp     v
   Convert(M*)   = ---/   /-->(M q)-------->[ q2 ]
                          ^        |
                          |  emp   |
                          \--------/

   Convert((M))  =  Convert(M)

After conversion, make the "head state" be a final state


EXAMPLE OF CONVERSION TO NFA

Regular expression:  (i|j)*

                      i
   Convert(i) = --->[ qi ]

                      j
   Convert(j) = --->[ qj ]

                         i
                        /--->(qi)---\
                    emp/             \emp
   Convert(i|j) = -->[ q ]            -->[ q2 ]
                       \             /
                    emp \--->(qj)---/ emp
                         j

                              emp
                         /--------------------\
                        /              emp     v
   Convert((i|j)*) = --/   /-->(i|j)-------->[ q2 ]
                           ^        |
                           |  emp   |
                           \--------/


CONVERTING AN NFA INTO A DFA

Idea: Convert each reachable set of NFA states
      into a single state of the DFA

How?  Use the emp-closure of each state q
      = the set of states reachable from q using emp

Closure wrt emp:

   closure(S) is the smallest set T such that

      T = S + union {delta(s, emp) | s in T}

   can compute closure(S) as

      T <- S;
      do
         T2 <- T
         T <- T2 + union {delta(s, emp) | s in T2}
      while (T != T2)

DFA Transitions:

   Let S be a set of states, then

      DFAdelta(S, c) = closure(union {delta(s,c) | s in S})


EXAMPLE CONVERSION OF NFA TO DFA

NFA for if|[a-z]([a-z]|[0-9])*

           i           f
      /-->[q2] ----> [[q3]]
     /
  -->[q1]
      \  emp         a-z          emp          emp
       \-->[q4] ----------> [q5] ------> [q6] ------> [[q8]]
                                          |  ^
                                    a-z   |  | emp
                                    0-9   v  |
                                          [q7]

Converted to DFA:

                      f
    [2,5,6,8] --------------> [3,6,7,8]
        ^                         |
      i |                         | a-z
        |                         | 0-9
        |       a-h j-z           v
  -->[1,4] --------------> [5,6,8] ----------> [6,7,8] --\
                                     a-z          ^      | a-z
                                     0-9          |      | 0-9
                                                  \------/


USING THE FLEX TOOL TO GENERATE LEXERS

Example: SRM assembler

   High-level description in asm_lexer.l
   Generated lexer: asm_lexer.c + asm_lexer.h

   Wrapper for lexer:
      lexer.h declares functions
      lexer.c does nothing
      asm_lexer.l defines functions
         e.g., lexer_print_token

   ASTs defined in ast.h

   asm.y is the Bison description file

      grammar == bison ==>

      - Declarations in asm.tab.h
           includes ast.h, machine_types.h, parser_types.h
           declares YYSTYPE
        lexer.h
           declares yytokentype
              eolsym = ...
              minussym = ...
              dottextsym = ...
              ...

      - Definitions in asm.tab.c
           defines yyparse()
           YYSTYPE yylval;
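A flex-generated lexer can also be exercised by itself with a small driver.
Here is a hypothetical sketch in C (not part of the SRM assembler sources):
it calls the generated yylex() repeatedly and prints each token code until
yylex() returns 0, which flex uses to signal end of input.

   #include <stdio.h>

   extern int yylex(void);   /* generated by flex from the .l file */

   int main(void) {
       int tok;
       while ((tok = yylex()) != 0) {   /* 0 means end of input */
           printf("token code: %d\n", tok);
       }
       return 0;
   }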
STRUCTURE OF FLEX INPUT FILE

   /* ... definitions section ... */
   %%
   /* ... rules section ... */
   %%
   /* ... user subroutines ... */


SECTIONS IN FLEX INPUT (.l file)

Definitions section:

Rules section:

User subroutine section:


WHY CONTEXT-FREE PARSING?

Can we define a regular expression to make sure
expressions have balanced parentheses?

   e.g., recognize:  (34)  and  ((12)+(789))
         but not:    (567))+82))))


WHAT IS NEEDED

Needed for checking balanced parentheses:


PARSING

Want to recognize language with

Goal:


CONTEXT-FREE GRAMMARS

def: a *context-free* grammar (CFG) (N, T, P, S) has start symbol S in N
     and each production (in P) has the form:

        <A> -> g

     where <A> is a Nonterminal symbol and g is in (N+T)*

Example:

def: For a CFG (N,T,P,S), g in (N+T)* *produces* g' in (N+T)*,
     written g =P=> g', iff
        g is e <A> f,  g' is e h f,
        <A> is a nonterminal (in N),  h is in (N+T)*,
        and the rule <A> -> h is in P

Example:   ( <expr> )  =P=>


DERIVATION

def: a *derivation* of a terminal string t from the rules P
     of a CFG (N,T,P,S) is a sequence (g0, g1, g2, ..., gm),
     where g0 = S, gm = t,
     for all i: gi is in (N+T)*,
     and for all 0 <= j < m: gj =P=> g(j+1)

Example:


LEFTMOST DERIVATION

def: a *leftmost derivation* of a string t in T* from a CFG (N,T,P,S)
     is a derivation (g0, g1, ..., gm) of t from (N,T,P,S)
     such that gm = t and for all 0 <= j < m:
     when gj =P=> g(j+1) and the nonterminal <A> is replaced in gj,
     then there are no nonterminals to the left of <A> in gj.

Example:


PARSE TREES AND DERIVATIONS

def: A *parse tree*, Tr, for a CFG (N,T,P,S) represents a derivation, D, iff:

   - Each node in Tr with children is labeled by a nonterminal (in N)
   - the root of Tr is the start symbol S
   - there is an arc from <A> to h in (N+T) iff <A> -> ... h ... is in P
   - the order of the children of a node labeled <A>
     is the order in a production <A> -> ... in P


EXAMPLES

GRAMMAR:

   <expr> ::= <number> | <expr> + <expr> | <expr> * <expr>

Derivations of 3*4+2:


EXAMPLE PARSE TREES

Corresponding to the leftmost derivation:

Corresponding to the rightmost derivation:


AMBIGUITY

def: a CFG (N,T,P,S) is *ambiguous* iff there is some t in T*
     such that there are two different parse trees for t


FIXING AMBIGUOUS GRAMMARS

Idea: Rewrite the grammar to

Example:


RECURSIVE DESCENT PARSING ALGORITHM

For each production rule, of the form:

   <N> ::= g1 | g2 | ... | gm

 1. Write a

 2. This function


EXAMPLE RECURSIVE-DESCENT PARSER

   <stmt> ::= if <cond> then <stmt> else <stmt>
            | begin <stmt> <stmt-list> end
            | write <number>
   <stmt-list> ::= ; <stmt> <stmt-list> | <empty>
   <empty> ::=
   <cond> ::= <number> = <number>

   token tok = lexer_next();

   void advance() { tok = lexer_next(); }

   void eat(token_type tt) {
       if (tok.typ == tt) {
           advance();
       } else {
           /* ... report error */
       }
   }

   void parseCond() {
       eat(numbersym); eat(eqsym); eat(numbersym);
   }

   void parseStmt() {
       switch (tok.typ) {
       case ifsym:
           eat(ifsym); parseCond(); eat(thensym);
           parseStmt(); eat(elsesym); parseStmt();
           break;
       case beginsym:
           eat(beginsym); parseStmt(); parseList(); eat(endsym);
           break;
       case writesym:
           eat(writesym); eat(numbersym);
           break;
       default:
           /* report error */
           break;
       }
   }


LL(1) GRAMMARS

A recursive-descent parser must:

 - choose between alternatives (e.g., <A> ::= g1 | g2)

def: A grammar is *LL(1)* iff


LR(1) GRAMMARS

An LR(1) parser needs to decide when to:
   shift (push token on stack) or
   reduce
 - uses a DFA based on stack + lookahead

def: A grammar is *LR(1)* iff


LALR(1) PARSING

Smaller tables than LR(1)
 - merges states of the DFA if they only differ in lookahead


PROBLEM: AMBIGUITY

Consider:

   <stmt> ::= <ident> := <expr>
            | if <condition> then <stmt> <else-part>
   <else-part> ::= <empty> | else <stmt>

and the statement:

   if b1 then if b2 then x := 2 else x := 3

Is this parsed as:

                 <stmt>
        /     |      \       \
      if  <condition> then  <stmt>
              |                \
              b1                \
                       /     |     \      \      \       \
                     if <condition> then <stmt>  else  <stmt>
                             |              |             |
                             b2          x := 2        x := 3

or as:

                 <stmt>
        /     |      \       \         \       \
      if  <condition> then  <stmt>     else   <stmt>
              |                \                  |
              b1                \              x := 3
                       /     |     \      \
                     if <condition> then <stmt>
                             |              |
                             b2          x := 2


FIXES FOR AMBIGUITY

Change the language:
 a. Always have an else clause:

       <stmt> ::= if <condition> then <stmt> else <stmt>

    (use skip if you don't want to do anything)

 b. Use an end marker:

       <stmt> ::= if <condition> then <stmt> else <stmt> fi
                | if <condition> then <stmt> fi

Give precedence to one production:

   <stmt> ::= if <condition> then <stmt> <else-part>
   <else-part> ::= else <stmt>    // priority!
                 | <empty>

So we only get the parse tree:

                 <stmt>
        /     |      \       \
      if  <condition> then  <stmt>
              |                \
              b1                \
                       /     |     \      \      \       \
                     if <condition> then <stmt>  else  <stmt>
                             |              |             |
                             b2          x := 2        x := 3
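In a recursive-descent parser, this priority amounts to consuming an else
as soon as one is seen after the then branch, so the else binds to the
nearest enclosing if. Here is a sketch in the style of the earlier
recursive-descent example (tok, eat, parseCond, and parseStmt are assumed
from there; parseIfStmt and parseElsePart are invented names):

   /* <else-part> ::= else <stmt>   // priority!
                    | <empty>                      */
   void parseElsePart() {
       if (tok.typ == elsesym) {   /* take the else if it is there */
           eat(elsesym);
           parseStmt();
       }
       /* otherwise <else-part> ::= <empty>: consume nothing */
   }

   /* <stmt> ::= if <condition> then <stmt> <else-part> */
   void parseIfStmt() {
       eat(ifsym);
       parseCond();
       eat(thensym);
       parseStmt();      /* a nested if here grabs a following else first */
       parseElsePart();
   }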