Higher Order Compilation Project Homepage
This page describes an ongoing project to use higher order
abstract syntax and higher order logic programming for compiler
implementation. The code and examples of an experimental compiler are
contained here. The meta-language is the Teyjus implementation of Lambda Prolog.
There is a corresponding
research paper. Some additional notes
are also available. These notes are informal and may change.
The purpose of this project is to demonstrate and study the use of
a higher order logic programming language in compiler construction.
Optimizing compilers take years to develop, and it's not my purpose to
perfect a product. I want to develop a set of techniques, that
combined with previous work by various researchers in higher order
specification, can be used in realistic compilation. For this purpose
I've chosen to compile a very typical, imperative style language.
I've also decided to forgo abstract machines in favor of a real
RISC architecture.
The source language being compiled is called "bobcat", because
it syntactically resembles the "tiger" language described in Andrew
Appel's compiler text. Bobcat however, has a rudimentary form of Java
style object orientation (without inheritance). The target language is Sparc assembly code.
Executables are created using cc (or gcc) in its role as an assembler.
The get a sense of the bobcat language being compiled, look at test3.bob and test7.bob.
The generated assembly codes are in test3.s and
test7.s
respectively. Type "gcc test3.s" or "gcc test7.s" on a Sparc machine
to generate an executable a.out, then type "./a.out" to run the
program. (Use "cc" if gcc is not available). Using the C linker has
the beneficial effect of allowing C functions (printf) to be called
from bobcat programs.
**
Some of the code presented here slightly varies from those presented
in the paper "Compiler Construction in Higher Order Logic Programming."
The compiler is meant to be experimental and will undergo refinements
and even major overhauls over time.
**
The structure of the modules that make up the compiler are as follows:
- module "bobabsyn": sig,
mod
Defines the higher order abstract syntax for the bobcat language. The
.mod file contains copy clauses for substitution. The file also has
first-order alternatives (unused) for certain constructs.
- module "lambdayacc": sig,
mod
A deterministic, bottom-up parser generator.
This module also contains a semi-customizable lexical scanner.
This sub-project has its own homepage, and
there's also an associated research paper.
- module "bobcatgram": sig,
mod
The specification of the bobcat grammar and semantic actions of the parser.
The sig file contains the declarations of the grammar symbols used. The
mod file contains the grammar, operator precedence and associativity
declarations, and semantic action clauses to generate lambda-syntax trees.
- module "bobparser": sig,
mod
The parser that's generated automatically from bobcatgram by lambdayacc.
bobparser produces a lambda syntax tree (type texp). To get a good idea
of what kind of structure is produced, parse one of the simpler bobcat
programs with:
[bobparser] ?- parsefile "test4.bob" A.
- module "kitty": sig, mod
Contains the definition of a continuation passing style intermediate
representation (type kexp) and the transformation from the lambda syntax
tree to CPS form.
To see a sample cps representation of a program, do
[kitty] ?- parsefile "test4.bob" (program A), formcps kret A B.
Variable B will be instantiated with the cps form of the program.
A function call f(a+b) is represented as
(cps (opexp "+" a b) (kabs u\ (karg u 0 \ (kcall f CN 1 K))))
where K is the overall continuation and CN is the class that f belongs.
to. CN="." indicates a global function. (in this way type information is
carried in my CPS representation; it's necessitated by object orientation).
- module "absparc": sig,
mod
Contains another, very low level intermediate language "Abstract Sparc,"
which is machine dependent, but represents Sparc machine instructions in
abstract syntax. For example, moving the contents of register "o1" to
register "l2" is represented by
movop (reg "o" 1) (reg "l" 2)
The mod file contains the code generation
(gencode) clauses that transform the cps language to the Abstract
Sparc language.
The most important type of this module is "store". A store can be
a register, a memory location ("indirect"), a constant, or a label.
The most important predicates are gencode and genloadop. The constructor
"assoces" associates expressions with stores containing them. genloadop
(generate-load-operand) "loads" an expression into a register,
emitting the necessary instructions to do so at the same time.
- module "realsparc": sig,
mod
Contains the straight-forward translation from the abstract Sparc language
to real Sparc assembly language instructions as strings, which are
then written to a file. This module also contains the toplevel predicate
"tigcompile", which is used as in:
[realsparc] ?- tigcompile "test3.bob" "test3.s".
- module "typecat": sig,
mod
Type checking module for the lambda-syntax tree. This module is currently
NOT hooked into the rest of the compiler (doesn't deal with oop yet).
In accumulates the bobparser module, and can be used independently as in
[typecat] ?- parsefile "test3.bob" (program A), typeexp A B.
List of sample bobcat programs and their compiled assembly codes:
- test3.bob, test3.s:
contains demonstration of basic imperative programming techniques,
including recursive functions and loops.
- test4.bob, test4.s:
demonstrates mutually recursive functions
- test5.bob, test5.s: demonstrates arrays (static arrays). Arrays need to be redone.
- test6.bob, test6.s: demonstrates static scoping with nested lets
- test7.bob, test7.s: demonstrates object orientation with two classes.
All relevant files are in bobcat.tar.gz
Please contact Chuck Liang at liang@cs.umn.edu for questions.