Informal Notes
August 8th, 2001
Chuck Liang


This directory contains all the files for an experimental compiler
written in Teyjus Lambda Prolog (please use most recent version -
b31).  The source language being compiled is called "bobcat", because
it syntactically resembles the "tiger" language described in Andrew
Appel's compiler text.  Bobcat however, has a rudimentary form of C++
style object orientation.  The target language is Sparc assembly code.
Executables are created using cc (or gcc) in its role as an assembler.

  The purpose of this project is to demonstrate and study the use of 
a higher order logic programming language in compiler construction.
Optimizing compilers take years to develop, and it's not my purpose to
perfect a product.  I want to develop a set of techniques, that
combined with previous work by various researchers in higher order
specification, can be used in realistic compilation.  For this purpose
I've chosen to compile a very typical, imperative style language.
I've also decided to forgo simple abstract machines in favor of a real
RISC architecture, namely Sparc.

The get a sense of the bobcat language being compiled, look at
"test3.bob" and "test7.bob".  The generated assembly codes are in
"test3.s" and "test7.s" respectively.  Type "gcc test3.s" or "gcc
test7.s" on a Sparc machine to generate a normal executable a.out,
then type "./a.out" to run the program.  (Use "cc" if gcc is not
available).  Using the C linker has the beneficial effect of allowing
C functions (printf) to be called from bobcat programs. :)


** 
Some of the code presented here slightly varies from those presented
in the paper "Compiler Construction in Higher Order Logic Programming."
The compiler is meant to be experimental and will undergo refinements
and even major overhauls over time.
**


The structure of the modules are as follows.  "bobcatgram" accumulates
"bobabsyn" and "lambdayacc".  The other modules all accumulate the one
listed before it:


module "bobabsyn": 
  Defines the higher order abstract syntax for the bobcat language.  The 
  .mod file contains copy clauses for substitution.  The file also has 
  first-order alternatives (unused) for certain constructs.

  It may seem strange that this module have abs constructors of type 
  (string -> texp) -> texp.  This was a somewhat arbitrary decision.  
  I could have used (texp -> texp) -> texp, but the string form 
  was just more convenient at the time, and it stuck.  Teyjus allows
  the introduction of eigenvariables of type string without any problems,
  and I don't think it makes that much of a difference.  After all, the
  NAMES are what we want to abstract over.


module "lambdayacc"
  A deterministic, bottom-up parser generator.  Grammar symbols have type 
  "gs".  This module also contains a semi-customizable lexical scanner.


module "bobcatgram"  
  The specification of the bobcat grammar and semantic actions of the parser.
  The sig file contains the declarations of the grammar symbols used.


module "bobparser"
  The parser that's generated automatically from bobcatgram by lambdayacc.
  bobparser produces a lambda syntax tree (type texp).  To get a good idea 
  of what kind of structure is produced, parse one of the simpler bobcat 
  programs with:

     [bobparser] ?- parsefile "test4.bob" A.

  There is no need to regenerate the parser unless changes are made to the
  "cfg" clause in "bobcatgram".  Changes to semantic actions in "bobcatgram",
  as long as they retain the same types, will not affect the parser.


module "kitty"
  Contains the definition of a continuation passing style intermediate 
  representation (type kexp) and the transformation from the lambda syntax 
  tree to CPS form.   This module is probably the most interesting component
  of the present project.

  To see a sample cps representation of a program, do

  [kitty] ?- parsefile "test4.bob" (program A), formcps kret A B.

  Variable B will be instantiated with the cps form of the program.
  
  Be aware that the construct "kfix" doesn't exactly define functions,
  but is used to abstract over names for local functions as well as variable
  and class definitions.  "kfunc" is the construct that defines functions.
  "kfix" should probably be called "klet".

  Although if I were to do it over I would have cleaned up much of the 
  syntax, essentially this is correct, and relatively quite elegant 
  (especially when compared to the next module).  The style of CPS I use
  is not traditional in the sense that every function is given an extra
  argument, the continuation function.  Instead, I represent every function
  call f(x) with (basically) a pair (f(x), K), where K is the continuation 
  function that the return value of f will be passed to.  In other words, 
  (let v = f(x) in (K v)).  There is literature that argues that this way 
  of doing things is better.

  To be more precise, a function call f(a+b) is represented as
  (opexp "+" a b (kabs u\ (karg u 0 \ (kcall f CN 1 K))))
   where K is the overall continuation and CN is the class that f belongs.
   to. CN="." indicates a global function.  (in this way type information is
   carried in my CPS representation; it's necessitated by object orientation).


module "absparc"
  Contains another, very low level intermediate language "Abstract Sparc,"
  which is machine dependent, but represents Sparc machine instructions in
  abstract syntax.  For example, moving the contents of register "o1" to 
  register "l2" is represented by

  movop (reg "o" 1) (reg "l" 2)
  
  This gives me another level of flexibility to manipulate
  the final code output.  I perform last-minute optimizations on this
  representation, such as elementating obviously redundant operations like
  in [movop A B, movop B A,...].  The mod file contains the code generation
  (gencode) clauses that transform the cps language to the Abstract
  Sparc language.  These clauses were by far the trickest to write,
  largely due to the specific characteristics of the Sparc
  architecture.  In some ways CPS can be thought of as a fancy form of
  reverse-Polish notation, and naturally implemented on a simple 
  stack-based abstract machine.  But I decided to skip that and generate 
  machine-dependent code directly.

  This module makes heavy use of cut (!) to model state.  
  This module can be cleaned up surely.  

  The most important type of this module is "store".  A store can be
  a register, a memory location ("indirect"), a constant, or a label.
  The most important predicates are gencode and genloadop.  The constructor
  "assoces" associates expressions with stores containing them.  genloadop
  (generate-load-operand) "loads" an expression into a register, 
  emitting the necessary instructions to do so at the same time.  


module "realsparc"
  Contains the straight-forward translation from the abstract Sparc language
  to real Sparc assembly language instructions as strings, which are 
  then written to a file.  This module also contains the toplevel predicate
  "tigcompile", which is used as in:

  [realsparc] ?- tigcompile "test3.bob" "test3.s".


module "typecat"  (added 8/4/2001)
  Type checking module for the lambda-syntax tree.  This module is currently
  NOT hooked into the rest of the compiler (doesn't deal with oop yet).  
  In accumulates the bobparser module, and can be used independently as in

  [typecat] ?- parsefile "test3.bob" (program A), typeexp A B.


---

List of sample bobcat program sucessfully compiled:

test3.bob: contains demonstration of basic imperative programming techniques,
  including recursive functions and loops.
test4.bob: demonstrates mutually recursive functions
test5.bob: demonstrates arrays (static arrays).  Arrays need to be redone.
test6.bob: demonstrates static scoping with nested lets
test7.bob: demonstrates object orientation with two classes.


Known limitation of bobcat:

Being experimental and under development, there are many restrictions in
what can and can't be compiled:

Type checking is not implemented, though some type information is used 
during compilation.  There is thus little error reportage.

Functions can have only a limited number of local variables, since stack
frames of fixed size (200 bytes) are created.  This can be solved of course,
by counting the number of local variables of a function.  I regarged this
as mundane and thus didn't implement it (at least not yet).

Be aware: I have not tested functions that take more than 5 parameters: I'm
not sure if they'll work.  The Sparc architecture has only a limited number
of registers for parameter passing; more would require using the stack.

A let statement can't just define a function without defining any variables
first (stupid I know. Sorry).

A function's parameters can't be changed.  This can be called a "feature" 
since it also appears in other languages (Ada, Eiffel).  Functions can be
compiled more efficiently with this restriction.  Be aware however, that the
restriction is not enforced.  no error is reported: it'll just crash!

A class can only contain simple variables (no arrays, other objects).
This restrictions was actually due to not enough type information used 
during compilation.  An array of objects also can not be created yet.

All variables in a class need to be defined inside the "private" section,
and all functions need to be in the "public" section.  The only way to
access an object's internal state is through the functions.  
This restriction at least, is not necessarily a bad one.