MiniJava+: Lexer and Parser 1. You need to construct a lexical analyzer for MiniJava+ if you do not have the ability to generate one using your parser generator. Use jbl.flex as a guideline. For those of you who will be using Java_Cup, I have some additional examples to help you. If you're using some other parser generator or lexer, you're on your own. 2. On page 485 of the text book, you'll find the "BNF" version of the MiniJava grammar. You may use this grammar as a starting point, but several parts of it will require you to think very carefully and experiment by trial and error. You'll also have to add the MiniJava+ extensions to the grammar (see mjp1.txt for details). For this assignment, you just need to write a grammar without any semantic action, except print statements for tracing. I'll start you off with the following front portion of my .grammar file: nonterminal MethodDec Object nonterminal FormalLst Object nonterminal FormalRst Object nonterminal Type Object nonterminal Stat Object nonterminal Stats Object nonterminal Exp Object nonterminal ExpLst Object nonterminal ExpRst Object topsym Program left + 500 left - 500 left * 700 left && 400 left || 350 left ! 450 left == 300 left = 800 left < 300 left > 300 left ( 900 left ) 50 left LBR 890 left RBR 60 left [ 880 left ] 70 left DOT 810 left ; 805 flexname LBR { flexname RBR } flexname DOT . // ... You need to write the complete grammar. Although shift-reduce conflicts can be resolved by operator precedence, **YOU ARE NOT ALLOWED ANY REDUCE-REDUCE CONFLICTS** When finished, you can test it with the sample minijava+ program that I gave you. -------- Additional Hints on Creating the Grammar --------- I am sharing with you some experiences I've had while constructing my own parser. 1. Unlike what the BNF grammar suggests, do NOT distinguish variable declarations (VarDecl) with other kinds of statements (Statement). In MiniJava, all variable declarations inside a method body must preceed other statements (like in old C), however, this rule can also be enforced at a later stage, when we examine the abstract syntax structure. If you just enter the BNF rules without enough thought, you'll get shift-reduce conflicts and the default resolution in favor of shift will be WRONG! This is because if a method body contains no variable declarations, then there should be an epsilon-reduction to VarDecl, which means you should reduce, not shift. It is better to just allow a method body to contain an arbitrary number of statements. 2. The minijava grammar in the text also treats "return" statements in a special way. That is, the return statement must always be the last statement in a method (except main). However, I suggest that you make return statements just another kind of statement. Again, if we want to enforce that return can only be called at the end, we can do it later, after we've built the syntax tree. 3. You cannot use notation like Statement* in Java Cup. To do the same job, you need two or more non-terminal symbols: Statement --> if ... | while ... ... | LBRACE Stats RBRACE; Stats --> (epsilon) | Stats Statement; You can also write (Statement Stats) as the last production but I recommend the left-recursive version. In general, LR parsing prefers left to right-recursion. Also, in the next part of the assignment you will be associating an ArrayList structure with each Stats symbol. When you add a new element to an ArrayList, it will be added to the end, not the front. If your grammar is not written in a correct way, you'll end up building the ArrayList in reverse order. It is OK, as long as you use it consistently. Sometimes you'll need three symbols. For example, A list of parameters can be epsilon, a single expression, or a list of comma-separated expressions with no comma after the final expression. This can be parsed using the following grammar: ExpList --> epsilon | Expression ExpRest ExpRest --> epsilon | COMMA Expression ExpRest Think carefully and make sure you understand why there are two epsilon productions (under what situations are they useful). You need to be able to think in such details in order to build the parser. -------------------------------- Additional Hints: The "grammar" on page 485 of the text is quite limited. In particular, there is a rule Statement --> id [ Exp ] = Exp ; This rule indicates that minijava only allows a variable to refer to an array in an array assignment statement. although it allows for arbitrary expressions to represent the array in an expression. This restriction prevents us from having 2D arrays, for example. But a more pressing problem is that we need to allow for stand-alone function calls (that discards the return value), which means that we'll need a rule such as Statement --> Exp DOT id ( Explist ) ; These rules will lead to a shift-reduce conflict. To resolve, use Statement --> Exp [ Exp ] = Exp ; You can still check that the leading Exp is a variable (left-value) in the abstract syntax.