/* In this version of the online calculator parser, we use the inherent ability of the LR state machine to resolve shift-reduce conflicts using external declarations. This allows us to use ambiguous grammars. This feature is missing from "SableCC" but is available in Java_Cup, which is just a Java port of the traditional YACC tool. Ambiguous grammars are most natural for humans. They also correspond much more directly to the abstract syntax structures we wish to build. In other words, which grammar would you prefer working with for parsing arithemtic expressions? 1. ambiguous LALR(1) with external declarations E --> E+E | E-E | E*E | E/E | num | (E) left-associative +, -, * , / precedence-order (*,/) > (+,-) 2. non-ambiguous LALR(1) E --> E+T | E-T | T T --> T*F | T/F | F F --> num | (E) 3. non-ambiguous LL(1) E --> T E1 E1 --> "" | +T E1 | -T E1 T --> F T1 T1 --> "" | *F T1 | /F T1 F --> num | (E) ** E1 and T1 are introduced for left-recursion elimination In terms of speed, there is no question that grammar 1 is the fastest, since there are less reductions to make. But even more importantly, ask yourself: what kind of abstract syntax tree do you want to have for an expression such as 2+3*4? Why don't you sketch out the trees relative to the three grammars. For grammar 1, it is: E / | \ E + E | / | \ 2 E * E | | 3 4 For grammar 2, it is a bit less obvious: E / | \ E + T | / | \ T T * F | | | F F 4 | | 2 3 You can also try to make a tree for grammar #3 but I will not give extra credit :). The point is not just that the ambiguous grammar is more convenient, but also that it more directly correspond to how we want to represent the expression *in the abstract*. In such a role grammar symbols such as T, F (not to mention E1, T1) are simply superflous. For the LL(1) grammar, even the simplest expression "3" will have quite a tree: E / \ T E1 / \ | F T1 "" | | 3 "" What role does T1 and E1 play? Why would I want such entities in my abstract syntax tree? At the very least I will have to walk over the tree to "clean it up". What's more, in actual use we usually want to associate some meaning and action relative to each non-terminal. That is, each non-terminal symbol should represent some valid program component: each E represents a valid arithemtic sub-expression. But What kind of expression does "* F T1" represent? Clearly "* 3" is not a well-formed expression or subexpression. What kind of meaning can I associate with just "* 3"? What kind of animal is this? */ import java_cup.runtime.*; import java.io.*; parser code {: String intext; // these vars must be declared here and instantiated below: Yylex scanner; :}; init with {: try { BufferedReader lineinput = new BufferedReader(new InputStreamReader(System.in)); intext = lineinput.readLine(); } catch (Exception ee) {} scanner = new Yylex(new StringReader(intext)); :}; scan with {: return scanner.next_token(); :}; /* grammar symbols and associated semantic value */ terminal PLUS, MINUS, TIMES, DIVIDE, LPAREN, RPAREN, DOT; terminal Double NUM; non terminal Double Line; non terminal Double Ex; non terminal Double Tx; non terminal Double Fx; /* Operator precedence and associativity declarations: The last line have higher precedence than earliear lines */ precedence left PLUS, MINUS; precedence left TIMES, DIVIDE; /* grammar production rules and semantic actions */ Line ::= Ex:v1 {: RESULT = v1; System.out.println("Result = "+v1); :}; Ex ::= Ex:v1 PLUS Ex:v2 {: RESULT = (double)v1 + (double)v2; :} | Ex:v1 MINUS Ex:v2 {: RESULT = (double)v1 - (double)v2; :} | Ex:v1 TIMES Ex:v2 {: RESULT = (double)v1 * (double)v2; :} | Ex:v1 DIVIDE Fx:v2 {: if ((double)v2 == 0) throw new Error("watch division by zero you bonehead!"); else RESULT = (double)v1 / (double)v2; :} | LPAREN Ex:v1 RPAREN {: RESULT = v1; :} | NUM:v1 {: RESULT = v1; :};