Informal Notes August 8th, 2001 Chuck Liang This directory contains all the files for an experimental compiler written in Teyjus Lambda Prolog (please use most recent version - b31). The source language being compiled is called "bobcat", because it syntactically resembles the "tiger" language described in Andrew Appel's compiler text. Bobcat however, has a rudimentary form of C++ style object orientation. The target language is Sparc assembly code. Executables are created using cc (or gcc) in its role as an assembler. The purpose of this project is to demonstrate and study the use of a higher order logic programming language in compiler construction. Optimizing compilers take years to develop, and it's not my purpose to perfect a product. I want to develop a set of techniques, that combined with previous work by various researchers in higher order specification, can be used in realistic compilation. For this purpose I've chosen to compile a very typical, imperative style language. I've also decided to forgo simple abstract machines in favor of a real RISC architecture, namely Sparc. The get a sense of the bobcat language being compiled, look at "test3.bob" and "test7.bob". The generated assembly codes are in "test3.s" and "test7.s" respectively. Type "gcc test3.s" or "gcc test7.s" on a Sparc machine to generate a normal executable a.out, then type "./a.out" to run the program. (Use "cc" if gcc is not available). Using the C linker has the beneficial effect of allowing C functions (printf) to be called from bobcat programs. :) ** Some of the code presented here slightly varies from those presented in the paper "Compiler Construction in Higher Order Logic Programming." The compiler is meant to be experimental and will undergo refinements and even major overhauls over time. ** The structure of the modules are as follows. "bobcatgram" accumulates "bobabsyn" and "lambdayacc". The other modules all accumulate the one listed before it: module "bobabsyn": Defines the higher order abstract syntax for the bobcat language. The .mod file contains copy clauses for substitution. The file also has first-order alternatives (unused) for certain constructs. It may seem strange that this module have abs constructors of type (string -> texp) -> texp. This was a somewhat arbitrary decision. I could have used (texp -> texp) -> texp, but the string form was just more convenient at the time, and it stuck. Teyjus allows the introduction of eigenvariables of type string without any problems, and I don't think it makes that much of a difference. After all, the NAMES are what we want to abstract over. module "lambdayacc" A deterministic, bottom-up parser generator. Grammar symbols have type "gs". This module also contains a semi-customizable lexical scanner. module "bobcatgram" The specification of the bobcat grammar and semantic actions of the parser. The sig file contains the declarations of the grammar symbols used. module "bobparser" The parser that's generated automatically from bobcatgram by lambdayacc. bobparser produces a lambda syntax tree (type texp). To get a good idea of what kind of structure is produced, parse one of the simpler bobcat programs with: [bobparser] ?- parsefile "test4.bob" A. There is no need to regenerate the parser unless changes are made to the "cfg" clause in "bobcatgram". Changes to semantic actions in "bobcatgram", as long as they retain the same types, will not affect the parser. module "kitty" Contains the definition of a continuation passing style intermediate representation (type kexp) and the transformation from the lambda syntax tree to CPS form. This module is probably the most interesting component of the present project. To see a sample cps representation of a program, do [kitty] ?- parsefile "test4.bob" (program A), formcps kret A B. Variable B will be instantiated with the cps form of the program. Be aware that the construct "kfix" doesn't exactly define functions, but is used to abstract over names for local functions as well as variable and class definitions. "kfunc" is the construct that defines functions. "kfix" should probably be called "klet". Although if I were to do it over I would have cleaned up much of the syntax, essentially this is correct, and relatively quite elegant (especially when compared to the next module). The style of CPS I use is not traditional in the sense that every function is given an extra argument, the continuation function. Instead, I represent every function call f(x) with (basically) a pair (f(x), K), where K is the continuation function that the return value of f will be passed to. In other words, (let v = f(x) in (K v)). There is literature that argues that this way of doing things is better. To be more precise, a function call f(a+b) is represented as (opexp "+" a b (kabs u\ (karg u 0 \ (kcall f CN 1 K)))) where K is the overall continuation and CN is the class that f belongs. to. CN="." indicates a global function. (in this way type information is carried in my CPS representation; it's necessitated by object orientation). module "absparc" Contains another, very low level intermediate language "Abstract Sparc," which is machine dependent, but represents Sparc machine instructions in abstract syntax. For example, moving the contents of register "o1" to register "l2" is represented by movop (reg "o" 1) (reg "l" 2) This gives me another level of flexibility to manipulate the final code output. I perform last-minute optimizations on this representation, such as elementating obviously redundant operations like in [movop A B, movop B A,...]. The mod file contains the code generation (gencode) clauses that transform the cps language to the Abstract Sparc language. These clauses were by far the trickest to write, largely due to the specific characteristics of the Sparc architecture. In some ways CPS can be thought of as a fancy form of reverse-Polish notation, and naturally implemented on a simple stack-based abstract machine. But I decided to skip that and generate machine-dependent code directly. This module makes heavy use of cut (!) to model state. This module can be cleaned up surely. The most important type of this module is "store". A store can be a register, a memory location ("indirect"), a constant, or a label. The most important predicates are gencode and genloadop. The constructor "assoces" associates expressions with stores containing them. genloadop (generate-load-operand) "loads" an expression into a register, emitting the necessary instructions to do so at the same time. module "realsparc" Contains the straight-forward translation from the abstract Sparc language to real Sparc assembly language instructions as strings, which are then written to a file. This module also contains the toplevel predicate "tigcompile", which is used as in: [realsparc] ?- tigcompile "test3.bob" "test3.s". module "typecat" (added 8/4/2001) Type checking module for the lambda-syntax tree. This module is currently NOT hooked into the rest of the compiler (doesn't deal with oop yet). In accumulates the bobparser module, and can be used independently as in [typecat] ?- parsefile "test3.bob" (program A), typeexp A B. --- List of sample bobcat program sucessfully compiled: test3.bob: contains demonstration of basic imperative programming techniques, including recursive functions and loops. test4.bob: demonstrates mutually recursive functions test5.bob: demonstrates arrays (static arrays). Arrays need to be redone. test6.bob: demonstrates static scoping with nested lets test7.bob: demonstrates object orientation with two classes. Known limitation of bobcat: Being experimental and under development, there are many restrictions in what can and can't be compiled: Type checking is not implemented, though some type information is used during compilation. There is thus little error reportage. Functions can have only a limited number of local variables, since stack frames of fixed size (200 bytes) are created. This can be solved of course, by counting the number of local variables of a function. I regarged this as mundane and thus didn't implement it (at least not yet). Be aware: I have not tested functions that take more than 5 parameters: I'm not sure if they'll work. The Sparc architecture has only a limited number of registers for parameter passing; more would require using the stack. A let statement can't just define a function without defining any variables first (stupid I know. Sorry). A function's parameters can't be changed. This can be called a "feature" since it also appears in other languages (Ada, Eiffel). Functions can be compiled more efficiently with this restriction. Be aware however, that the restriction is not enforced. no error is reported: it'll just crash! A class can only contain simple variables (no arrays, other objects). This restrictions was actually due to not enough type information used during compilation. An array of objects also can not be created yet. All variables in a class need to be defined inside the "private" section, and all functions need to be in the "public" section. The only way to access an object's internal state is through the functions. This restriction at least, is not necessarily a bad one.