Assignment 2a : MiniJava+ Grammar and parser.


0.  Go back to the minijava.lex file that you created for the lexical
    tokenizer.  Modify it in the following ways:

    a.  Add a token for "==", to be designated sym.EQEQ (make sure you
        use EQEQ when referring to it in your .cup file).

    b.  Add a token for "Clib", to be designated sym.CLIB

Remember to do java -cp /shared/csc/local JLex.Main yourfile.lex


1.  Your compiler will consist of many files, and sometimes the easiest
    way to share information will be to create a globally accessible
    structure.  Write a file Global.java with the following code:
     
     public class Global
     {
         public static String sourcefile;
	 public static String targetfile;
	 public static boolean trace = false;
         public static int line = 0;
         public static int charnum = 0;
     }

For example, you probably want to insert print statements in your program for
the purpose of tracing and debugging.  You can write them as:

   if (Global.trace) System.out.println(...);

Then to turn tracing on/off, instead of editing each line and file, you
just need to change the value of Global.trace.


2. On page 485 of the text book, you'll find the "BNF" version of the MiniJava
   grammar.  There's also a similar but less detailed grammar on the 
   textbook website.  You should use this grammar as a starting point, but
   several parts of it will require you to think very carefully and 
   experiment by trial and error.  You'll also have to add the MiniJava+
   extensions to the grammar (see casn2.txt for details).  For this 
   assignment, you just need to write a grammar without any semantic action,
   except print statements for tracing.  I'll start you off with the following
   front portion of my .cup file:

import java.util.*;
import java.io.*;
import java_cup.runtime.*;

parser code {: Yylex scanner; :};
init with {: 
   try
   {
      scanner = new Yylex(new FileReader(Global.sourcefile));
   }
   catch (Exception eee) {System.out.println(eee);}
:};

scan with {: return scanner.next_token(); :};

/* grammer symbols: */

terminal String 	ID;
terminal Integer	INTLIT;
terminal String		STRINGLIT;
terminal 		DOUBLELIT;
terminal 		PLUS, TIMES, MINUS, SLASH, LPAREN, RPAREN, TRUE, FALSE;
terminal		LBRACK, RBRACK, LBRACE, RBRACE, THIS, NEW, UMINUS;
terminal 		CLASS, PUBLIC, STATIC, VOID, MAIN, STRINGtok;
terminal		EXTENDS, RETURN, INTtok, BOOLEANtok, PRINTLN;
terminal		AND, OR, NOT, IF, PRINTF, EQUAL, DOT, SEMI, COMMA;
terminal	   	WHILE, LENGTH, ELSE, LESSTHAN, DOUBLEtok, EQEQ;
terminal		CLIB;

non terminal		PROGRAM;
non terminal		Statement;
non terminal		Expression;

 etc...  ...........

PROGRAM ::= ...
...


Process the grammar with

java -cp .:/shared/csc/local java_cup.Main -expect 4 < yourfile.cup

The "-expect 4" means you are expecting up to 4 parser conflicts.  For your
assignment:

    YOU ARE NOT ALLOWED ANY REDUCE-REDUCE CONFLICTS

    Your are allowed a maximum of 4 shift-reduce conflicts.  Cup will
    resolve these conflicts in favor of shifting, but YOU MUST
    PROVIDE JUSTIFICATION FOR WHY THIS IS THE CORRECT ACTION IN EACH CASE.
    One of these cases will most probably be between if and if-else because
    of the dangling-else problem.  In my grammar, this is the only conflict.
    If you got any others, you probably did it wrong and your parser won't 
    work properly.  See notes below for additional, important hints.

When finished, you can test it with the parsertest.java program on the
web page (study it to understand what it's doing).  Make sure you test
the parser on at least Factorial.java and BinarySearch.Java as
examples.  When you get an error message that says something like
"cannot parse past char 56", the number 56 is really the line number,
not the char number.  You'll probably need to insert finer tracing
print statements to track down the error further.


--------  Additional Hints on Creating the Grammar ---------

I am sharing with you some experiences I've had while constructing my
own parser.  

  1.  Unlike what the BNF grammar suggests, do NOT distinguish variable
      declarations (VarDecl) with other kinds of statements (Statement).
      In MiniJava, all variable declarations inside a method body must
      preceed other statements (like in old C), however, this rule
      can also be enforced at a later stage, when we examine the abstract
      syntax structure.  

      If you just enter the BNF rules without enough thought, you'll 
      get shift-reduce conflicts and the default resolution in favor of 
      shift will be WRONG!  This is because if a method body contains
      no variable declarations, then there should be an epsilon-reduction
      to VarDecl, which means you should reduce, not shift.  It is better
      to just allow a method body to contain an arbitrary number of statements.

  2.  The minijava grammar in the text also treats "return" statements in
      a special way.  That is, the return statement must always be the
      last statement in a method (except main).  However, I suggest that
      you make return statements just another kind of statement.  Again,
      if we want to enforce that return can only be called at the end,
      we can do it later, after we've built the syntax tree.

  3.  You cannot use notation like Statement* in Java Cup.  To do the same
      job, you need two or more non-terminal symbols:

      Statement  ->  if ...
                  |  while ... 
                  ...
                  | LBRACE Stats RBRACE;

      Stats      ->       (epsilon)
                  | Stats Statement;

      You can also write (Statement Stats) as the last production but I 
      recommend the left-recursive version.  This is because, in the next
      part of the assignment you will be associating an ArrayList structure
      with each Stats symbol.  When you add a new element to an ArrayList,
      it will be added to the end, not the front. If your grammar is
      not written in a correct way, you'll end up building the ArrayList in
      reverse order.  It is OK, as long as you use it consistently.

      Sometimes you'll need three symbols. For example, A list of
      parameters can be epsilon, a single expression, or a list of
      comma-separated expressions with no comma after the final expression.
      This can be parsed using the following grammar:

      ExpList -> epsilon
              |  Expression ExpRest

      ExpRest -> epsilon
              |  COMMA Expression ExpRest

      Think carefully and make sure you understand why there are two 
      epsilon productions (under what situations are they useful).  You 
      need to be able to think in such details in order to build the parser.