Assignment 2a : MiniJava+ Grammar and parser. 0. Go back to the minijava.lex file that you created for the lexical tokenizer. Modify it in the following ways: a. Add a token for "==", to be designated sym.EQEQ (make sure you use EQEQ when referring to it in your .cup file). b. Add a token for "Clib", to be designated sym.CLIB Remember to do java -cp /shared/csc/local JLex.Main yourfile.lex 1. Your compiler will consist of many files, and sometimes the easiest way to share information will be to create a globally accessible structure. Write a file Global.java with the following code: public class Global { public static String sourcefile; public static String targetfile; public static boolean trace = false; public static int line = 0; public static int charnum = 0; } For example, you probably want to insert print statements in your program for the purpose of tracing and debugging. You can write them as: if (Global.trace) System.out.println(...); Then to turn tracing on/off, instead of editing each line and file, you just need to change the value of Global.trace. 2. On page 485 of the text book, you'll find the "BNF" version of the MiniJava grammar. There's also a similar but less detailed grammar on the textbook website. You should use this grammar as a starting point, but several parts of it will require you to think very carefully and experiment by trial and error. You'll also have to add the MiniJava+ extensions to the grammar (see casn2.txt for details). For this assignment, you just need to write a grammar without any semantic action, except print statements for tracing. I'll start you off with the following front portion of my .cup file: import java.util.*; import java.io.*; import java_cup.runtime.*; parser code {: Yylex scanner; :}; init with {: try { scanner = new Yylex(new FileReader(Global.sourcefile)); } catch (Exception eee) {System.out.println(eee);} :}; scan with {: return scanner.next_token(); :}; /* grammer symbols: */ terminal String ID; terminal Integer INTLIT; terminal String STRINGLIT; terminal DOUBLELIT; terminal PLUS, TIMES, MINUS, SLASH, LPAREN, RPAREN, TRUE, FALSE; terminal LBRACK, RBRACK, LBRACE, RBRACE, THIS, NEW, UMINUS; terminal CLASS, PUBLIC, STATIC, VOID, MAIN, STRINGtok; terminal EXTENDS, RETURN, INTtok, BOOLEANtok, PRINTLN; terminal AND, OR, NOT, IF, PRINTF, EQUAL, DOT, SEMI, COMMA; terminal WHILE, LENGTH, ELSE, LESSTHAN, DOUBLEtok, EQEQ; terminal CLIB; non terminal PROGRAM; non terminal Statement; non terminal Expression; etc... ........... PROGRAM ::= ... ... Process the grammar with java -cp .:/shared/csc/local java_cup.Main -expect 4 < yourfile.cup The "-expect 4" means you are expecting up to 4 parser conflicts. For your assignment: YOU ARE NOT ALLOWED ANY REDUCE-REDUCE CONFLICTS Your are allowed a maximum of 4 shift-reduce conflicts. Cup will resolve these conflicts in favor of shifting, but YOU MUST PROVIDE JUSTIFICATION FOR WHY THIS IS THE CORRECT ACTION IN EACH CASE. One of these cases will most probably be between if and if-else because of the dangling-else problem. In my grammar, this is the only conflict. If you got any others, you probably did it wrong and your parser won't work properly. See notes below for additional, important hints. When finished, you can test it with the parsertest.java program on the web page (study it to understand what it's doing). Make sure you test the parser on at least Factorial.java and BinarySearch.Java as examples. When you get an error message that says something like "cannot parse past char 56", the number 56 is really the line number, not the char number. You'll probably need to insert finer tracing print statements to track down the error further. -------- Additional Hints on Creating the Grammar --------- I am sharing with you some experiences I've had while constructing my own parser. 1. Unlike what the BNF grammar suggests, do NOT distinguish variable declarations (VarDecl) with other kinds of statements (Statement). In MiniJava, all variable declarations inside a method body must preceed other statements (like in old C), however, this rule can also be enforced at a later stage, when we examine the abstract syntax structure. If you just enter the BNF rules without enough thought, you'll get shift-reduce conflicts and the default resolution in favor of shift will be WRONG! This is because if a method body contains no variable declarations, then there should be an epsilon-reduction to VarDecl, which means you should reduce, not shift. It is better to just allow a method body to contain an arbitrary number of statements. 2. The minijava grammar in the text also treats "return" statements in a special way. That is, the return statement must always be the last statement in a method (except main). However, I suggest that you make return statements just another kind of statement. Again, if we want to enforce that return can only be called at the end, we can do it later, after we've built the syntax tree. 3. You cannot use notation like Statement* in Java Cup. To do the same job, you need two or more non-terminal symbols: Statement -> if ... | while ... ... | LBRACE Stats RBRACE; Stats -> (epsilon) | Stats Statement; You can also write (Statement Stats) as the last production but I recommend the left-recursive version. This is because, in the next part of the assignment you will be associating an ArrayList structure with each Stats symbol. When you add a new element to an ArrayList, it will be added to the end, not the front. If your grammar is not written in a correct way, you'll end up building the ArrayList in reverse order. It is OK, as long as you use it consistently. Sometimes you'll need three symbols. For example, A list of parameters can be epsilon, a single expression, or a list of comma-separated expressions with no comma after the final expression. This can be parsed using the following grammar: ExpList -> epsilon | Expression ExpRest ExpRest -> epsilon | COMMA Expression ExpRest Think carefully and make sure you understand why there are two epsilon productions (under what situations are they useful). You need to be able to think in such details in order to build the parser.