Parser Generator Stage 2: Now the hard part begins By now you should have written a meta parser that parses grammars that are consitent with my syntax. You should have also computed the nullible, first and follows sets of the grammar. I'm also assuming that you have read chapter 3 of the text. Now it's time to generate the state machine. Let's choose our data structures first. These are my recommendations: Each state consists of a set of items: class Gitem implements Comparable { public short ri; // rule index into metaparser.Rules public short pi; // position of dot //public String la; // lookahead for LR(1) items, not needed for SLR public Gitem(short a, short b) {ri=a; pi = b;} public Gitem(int a, int b) {ri=(short) a; pi = (short)b;} public int compareTo(Gitem I) { return (ri*65536/2 + pi) - (I.ri*65536/2 + I.pi); } public boolean equals(Object b) // equals should be consistent with compare { return compareTo((Gitem)b) == 0; } public String toString() { return ri+":"+pi; } // may want to write prettyprint function for debugging public boolean processed =false; // used by various procedures. }//Gitem Each item consists of two 16 bit integers. In my program, ri indexes an ArrayList that stores the rules, and pi is the position of the dot on the right-hand side of that rule. pi==0 means that it's at the left end, and pi==rule.rhs.size() means that it's at the right end. If you choose to implement the full LR(1) option, then each item will also have a single lookahead. By adding the EOF symbol to the start production, we are always guaranteed to have a lookahead. Rewrite compareTo accordingly. Like with Gsym, I made the items sortable so you can insert them into a fast data structure (TreeSet). A state can then be a TreeSet. Study the TreeSet class to understand how to use it. For example: Given TreeSet state, you can do for(Gitem item:state) ... // iterate through all items in sorted order To use a while loop with better control: Iterator Itemiter = state.iterator(); while(Itemiter.hasNext() && ...) { ... do something with Itemiter.next() ... } ********* BE CAREFUL: Do not CHANGE state while using an Iterator or a foreach loop that implicitly uses an iterator: you will get a runtime error that concerns concurrency violations. For example, if you need to add new items to the state while running such a loop, DO NOT CALL state.add(..) Instead, create a new TreeSet additions, then after the loop ends, do state.addAll(additions); ********* Since the states are added one by one, I maintain a ArrayList> States; Whenever you generate a new state as defined by the SLR or LR(1) algorithm, you need to check if the state already exists in your list of States (use TreeSet.equals method). However, checking it against every state in States is O(n*n) - very slow. To speed up this step (especially for the full LR(1) option), I use an additional Hashtable> StateLookup; Each state is hashed by the number of items that it contains: the hash key is easily computed with state.size(). For example, if StateLookup.get(4) returns a TreeSet that contains the integers 3, 7, 9, it means that the states at indices 3, 7 and 9 in the ArrayList States are the ones that contain 4 items. Using this hashtable therefore speeds up determining whether a state has already been generated. This optimization may not be necessary if you choose the SLR option, which requires fewer states. //////////////////////////////////////////////////////// Now as you generate the states, you will also build the finite state machine that tells us whether to shift, reduce, accept, as well as which state to go to after a reduce. There are therefore 4 types of "state actions". Here is an opportunity to use OOP techniques to good effect, and I recommend the following setup (in outline): public interface stateaction { void doit(absparser apar); // may need some other things too. } Here, absparser is my abstract class that contains code common to all parsers. Then I define: class reduce implements stateaction { int rulei; // rule index (for an arraylist called Rules) to reduce to ... } class shift implements stateaction { int nextstate; // the state index (for an arraylist of states) to go to. ... } class gotostate implements stateaction {... } class accept implements stateaction {... } The abstract "doit" method should actually perform the action necessary, in terms of looking up the state machine, performing a reduce action, etc. My absparser method just calls doit and let dynamic dispatch do its work. AT FIRST, YOU CAN LEAVE THE IMPLEMENTATIONS OF doit EMPTY ({}), and fill them in later when you write code to use the state machine to parse. Alternatively, you can just use a string to represent the stateaction, so "s4" means shift and jump to state 4, etc. But then your code is less object oriented and all the code will need to be moved to the abstract class. But if that's what makes you comfortable, fine. The data structure for my finite state machine is ArrayList> FSM; FSM = new ArrayList>(8000); // set large initial capacity Each index i of this ArrayList cooresponds to state i in the ArrayList States. Each Hashtable maps a terminal or non-terminal symbol to a stateaction. ******** BE CAREFUL: if you use a Hashtable instead, you should override the hash code function that Gsym automatically inherits from Object. Also, when creating an ArrayList or Hashtable, be sure to set the initial capacity to be large enough to at least likely hold what you'll need. The default initial capacity is only ten! Incrementing the capacity dynamically results in inefficiency. ******** //////////////////////////////////////////////////////////// IMPORTANT: //////////////////////////////////////////////////////////// As you generate each new state, you need to check and report shift-reduce and reduce-reduce conflicts. This is an absolutely essential feature of an LR parser generator. To make things simple, don't worry about operator precedence and associativity at first. Whenever you create a new stateaction object and insert it into your FSM for some state, you need to check if some entry is already present (Hashtable.get will return null if no entry is present). That sounds easy, but here's what's tricky: reduce actions are generated as you form the items closure for a state. However, shift actions are formed when you generate a new state from an existing one. The new state may be genuinely new, or already exists in your list of states. You should not try to squeez everything into one method: it'll get ugly. My own code is organized around the following methods: stateclosure: forms the items closure for a state setactions: checks a state for reduce-reduce conflicts while setting reduce (and accept) actions. addstate: adds a new state formed by stateclosure. This method also takes an integer indexing the state that spawned it (the previous state) because we then need to set the shift and gotonext actions of the previous state, while checking for shift-reduce conflicts. gentable: overall method that constructs the FSM. Generating the states is similar to generating a spanning tree. Each state can spawn several new states, but after it's done spawning, it can be considered "closed". The FSM is complete when all states are closed. You should just have to maintain closed as a counter and use a while (closed state) { // keeps track of indices of items already processed: ArrayList done = new ArrayList(); int i = 0; for(i=0;i rs = rulesof.get(Ntind.get(rule.rhs.get(pi).sym)); //rs contains indices of all rules that has this nonterminal on the lhs for(Integer j:rs) { Gitem newitem = new Gitem(j,0); int pgi = SortedInsert(newitem,state); // SortedInsert inserts using compareTo order if (pgi>=0) done.add(pgi,false); }// for each rule beginning with nonterminal }//if }// !done.get(i) }// for each item i in state }// while }//stateclosure The overall procedure to generate the entire FSM is structurally similar to this procedure, except instead of items, you'll be processing states (you'll have a while (closed