Compiler Final Prep At the scheduled time of the final exam, you will be asked to demonstrate your compiler on a variety of programs, some of which will only be made available when the session starts. You should try to compile the various .mj programs on the homepage. The code you generate will also be graded, and should not contain obvious redundancies such as movl %eax, %ebx movl %ebx, %eax or push %eax pop %ebx # replace with movl %eax,%ebx Only a compiler that has satisfied all requirements will receive the coveted, extremely rare ***Compiler Jedi*** certificate. /////////////////////////////////////////////////////////////////////////// Hints for code generation: Unless you're using the stack-based as opposed to register-based solutions, you need to understand how to use the am7c classes that I have defined. You can create a new codeblock with: codeblock cb = new codeblock("comment"); To add a new operation such as "move ebx, 8(ebp)": cb.add(new move(new register(1), new mem(am7c.bp,8))) A codeblock itself is also an operation and can be added to another codeblock. Each codeblock has an operand .result To compile an expression such as subexp(A,B) (representing A-B): Let A be the destination and B the source. First determine which subexpression A or B is larger and and compile that first. When compiling the destination, be sure that the code generated will store the result in a register. You may find the need to pass additional parameters to visit and you can do so as follows (for example): within compiler visitor class: boolean destination = false; // simulated extra parameter to visit codeblock daccept(construct c, boolean dest) { boolean saved = destination; destination = dest; codeblock cd = c.accept(this); destination = saved; return cd; }//daccept, passing extra parameter as destination. Here, destination tells the visitor whether to generate a source operand or a destination operand. For example, in compiling (2-3), only the source operand (3) may be an immediate, the "destination" 2 must be loaded into a register. Thus to compile an integer constant intexp(val), you would do something like: pbulic codeblock visit(intexp ie) { codeblock cb = new codeblock("int"); if (!destination) return cb.finish(new immediate(ie.val)); register r = register.findreg(); if (r==null) throw new Error("not enough registers - should never happen"); cb.add(new move(new immediate(ie.val), r)); cb.finish(r); // sets cb.result to register r return cb; } Now to compile a subexp(A,B), assuming that A is the larger expression, you would do something similar to: codeblock cb = new codeblock("subtraction"); codeblock cA = daccept(A,true); codeblock cB = daccept(B,false); cb.add(cA); cb.add(cB); cb.add(new aluop("-",cB.result,cA.result)); cB.freereg(); // CRUCIAL cb.finish(cA.result); return cb; Clearly you do not want to free any registers associated with with the destination operand, since that may be needed to compile a larger expression or statment. Here's how you might compile an assignment statement: public codeblock visit(assignstat ast) { // start with symbol table lookup varentry vinfo = (varentry)(currentst.lookup(ast.ve.vname)); int offset = vinfo.position; if (offset>=0) offset += argadjust; // skip ebp, ret. addr offset *= am7c.wordsize/8; mem dst = new mem(am7c.bp,offset); if (vinfo.instance) // var is an instance var of an object { offset = am7c.wordsize/8 * vinfo.position; dst = new mem(am7c.thisp,offset); } codeblock cd = new codeblock(ast,"="); codeblock value = daccept(ast.val,false); cd.merge(value); // or cd.add(value); cd.add(new move(value.result,dst)); value.freereg(); // CRITICAL return cd; } My abstract syntax structure assignstat contains a variable expression ve with string ve.vname representing the string for the variable. assignstat also has a val field which is the value being assigned to (some expression). The code proceeds by consulting the symbol table, which contain information that allows us to form the memory address that corresponds to the variable. Instance variables are offsets from esi (am7c.thisp), parameters are positive offsets from ebp (am7c.bp) and local vars are negative offsets from ebp. Then a codeblock is generated for the value. And value.result will be the operand (probably a register since we asked for one) that will be moved to the memory location. Note that when compiling the value to be assigned, value.result must be in a register. Otherwise x = y+1; will become x = ++y; My version of codeblock also stores a pointer to the relevant absyn struct. /////////// To compile a method, follow the C calling convention: new push(am7c.bp) // push %ebp new move(am7c.sp,am7c.bp) // movl %esp,%ebp ... now move(returnval.result, am7c.rtp) // mov returnval, %eax new move(am7c.bp,am7c.sp) // movl %ebp,%esp new pop(am7c.bp) // pop %ebp new freturn() // ret To compile a method call n.f(x,y), hopefully your typechecker would have stored the type of the expression n in its abstract syntax to that it's readily available (this is easy to add to your exsiting typechecker). This will allow you to form the label to call: classname_f // generate code to push all active registers on stack, especially am7c.thisp // (%esi). // allocate stack space for parameters, or use push: codeblock arg2 = daccept(y,false); new push(arg2.result); arg2.freereg(); codeblock arg1 = daccept(x,false); new push(arg1.result); arg2.freereg(); codeblock targetobj = daccept(n,false); // compile the object expression n targetobj.freereg(); // calling freereg whenever possible is crucial new move(targetobj.result,am7c.thisp); // move target address to %esi new call(new label("classname_f")); // deallocate stack space after return: new aluop("+",new immediate(8), am7c.sp); // because 2 args were pushed register r = register.findreg(); new move(am7c.rtp,r) // don't leave return value in eax // pop saved registers from stack, including previous %esi finish(r) // set result of method call to r ***NOTE: the order that the instructions are generated is crucial. If you overwrite the %esi (thisp) register too early, the arguments (arg1, arg2) might not be compiled correctly. *** Also, freereg() must be called at just the right time: too late and you might run out of registers when compiling the other expressions and too early it'll be overwritten by the time you use it. If you calculated what registers need to be saved correctly, then the final pops should not override r. If you're not confident, then do the move(am7c.rtp,r) after the pops. If the return type of the method is void, then no result need to be set. printf and malloc need to be treated as special cases. For each new class, create an implicit constructor that returns the address of the object constructed. You can choose to call malloc from the calling site or from within the constructor. Again you will need to consult the symbol table to determine how many instance variables are there in a class in order to pass the correct parameter to malloc. ----- Array index expressions and assignment to arrays: In my abstract syntax there is a class indexexp that represents A[i]. There's another class arrayassignstat, which extends assignstat for assigning to an array (A[i] = E;). Let's call A the 'base' and i the 'index'. Both can be compound expressions and should therefore be recursively compiled (into codeblocks). As the Sethi-Ullman algorithm suggests, you should again choose the larger of the two to compile first, but in my code I just always compile the index first, since that's more likely to be compound (A[i+k-1]). Ideally, the result operands of both the base and index expressions should be in registers, which will allow you to form a memory operand (base.result,index.result,4) where 4 is am7c.wordsize/8. This is the only place where you might have problems finding enough registers. In that case, require only the base (or only the index) to have a register operand, then add the value of the index*4 (or base) to the base. To compile A[i] as an expression, you can reuse the index or base register as the resulting operand of the entire operand. In other words, you would have something like move (baseregister,indexregister,4), baseregister. This means also that you would freereg() the index register but not the base. ============ More hints may be posted later ... ============= Now to put everything together, the front end of your compiler can look something like mine: // front end of minijava+ compiler import java.io.FileReader; public class mjc32 { public static void main(String[] args) throws Exception { int optlevel = 0; // optimization level as determined by args for (int i = 1;i