# grammar, spaces and line breaks are important
# grammar declarations are read line by line, so just don't use line breaks for long rules and comments.

nonterminal E Expr
terminal ( ) + * - ==
typedterminal num Integer
topsym E  # top level grammar symbol

# precedence and associativity declarations, precedence determined by order
# These rules must appear before the production rules.

left * 500
left + 400
left - 400
left == 300
left ) 50
left ( 900
left num 980

# Grammar production rules, order breaks reduce-reduce conflicts
# spaces must separate grammar elements

E --> num:x { return new intexp(x); }
E --> E:e1 + E:e2 { return new sumexp(e1,e2); }
E --> E:e1 * E:e2 { return new multexp(e1,e2); }
E --> E:e1 - E:e2 { return new subexp(e1,e2); }
E --> E:e1 == E:e2 { return new eqexp(e1,e2); }
E --> - E:e { return new negexp(e); }
E --> ( E:e ) { return e; }

# ends file, whatever follows EOF is ignored:
EOF


In this version of the online calculator parser, we use the inherent
   ability of the LR state machine to resolve shift-reduce conflicts
   using external declarations.  This allows us to use ambiguous grammars.
   This feature is missing from "SableCC" but is available in Java_Cup, 
   which is just a Java port of the traditional YACC tool.

   Ambiguous grammars are most natural for humans. They also correspond 
   much more directly to the abstract syntax structures we wish to build.

   In other words, which grammar would you prefer working with for
   parsing arithemtic expressions?

1. ambiguous, SLR(1) with external declarations

   E --> E+E | E-E | E*E | E/E | num | (E)

   left-associative +, -, * , /
   precedence-order (*,/)  >  (+,-)

2. non-ambiguous SLR(1) (and thus LALR(1) and LR(1))

   E --> E+T | E-T | T
   T --> T*F | T/F | F
   F --> num | (E)

3. non-ambiguous LL(1)  

   E  --> T E1
   E1 --> "" | +T E1 | -T E1
   T  --> F T1
   T1 --> "" | *F T1 | /F T1
   F  --> num | (E)
   ** E1 and T1 are introduced for left-recursion elimination


   In terms of speed, there is no question that grammar 1 is the fastest,
   since there are less reductions to make.  But even more importantly,
   ask yourself: what kind of abstract syntax tree do you want to have for
   an expression such as 2+3*4?
  
   Why don't you sketch out the trees relative to the three grammars.  For
   grammar 1, it is:

             E
          /  |  \
         E   +   E
         |     / | \
         2    E  *  E
              |     |
              3     4

   For grammar 2, it is a bit less obvious:

             E
          /  |  \
         E   +   T
         |     / | \
         T    T  *  F
         |    |     |
         F    F     4
         |    |
         2    3

  You can also try to make a tree for grammar #3 but I will not give
  extra credit :).  The point is not just that the ambiguous grammar is 
  more convenient, but also that it more directly correspond to how we 
  want to represent the expression *in the abstract*.  In such a role 
  grammar symbols such as T, F (not to mention E1, T1) are simply superflous.
  For the LL(1) grammar, even the simplest expression "3" will have quite 
  a tree:

          E
       /     \
      T       E1
    /   \     |
   F     T1   ""
   |     |
   3     ""

   What role does T1 and E1 play?  Why would I want such entities in my
   abstract syntax tree?  At the very least I will have to walk over the
   tree to "clean it up".  What's more, in actual use we usually want
   to associate some meaning and action relative to each non-terminal. That
   is, each non-terminal symbol should represent some valid program 
   component: each E represents a valid arithemtic sub-expression.  
   With the unambiguous LR grammar, at least each non-terminal still 
   corresponds to a valid subexpression.  But what kind of expression does 
   "* F T1" represent?  Clearly "* 3" is not a well-formed expression or
   subexpression.  What kind of meaning can I associate with just "* 3"?  
   Is it dereferencing a pointer?  What kind of animal is this?