\documentstyle[11pt,proof]{article}
\bibliographystyle{plain}
\parindent=0.0cm \parskip=0.4cm
\setlength{\textwidth}{15cm} \setlength{\oddsidemargin}{+0.4cm}
\setlength{\evensidemargin}{0.4cm} \setlength{\topmargin}{+0.0cm}
\setlength{\headheight}{0cm} \setlength{\textheight}{22cm}
\setlength{\headsep}{1cm} \renewcommand{\baselinestretch}{1.0}
\def\nvec#1,#2{#1_{1}, \ldots, #1_{#2}}
\pagestyle{plain}
%\marginparwidth 0pt \oddsidemargin 0pt
%\evensidemargin 0pt \marginparsep 0pt
%\topmargin 0pt
%\textwidth 6.5in \textheight 8.5in
%\setlength{\parskip}{6pt}
%\pagestyle{plain}
%\newcommand\Pscr {{\cal P}}
\newcommand\rrr {\rightarrow}
\newcommand\lrr {\longrightarrow}
\newcommand\lrrs {\stackrel{\ast}{\longrightarrow}}
\newcommand\lrrb {\stackrel{\ast}{\longleftrightarrow}}

\begin{document}

\begin{center}
{\large\bf Class Notes on the Untyped Lambda Calculus}\\[5pt]
{\normalsize\bf Chuck Liang}\\[5pt]
{\normalsize\bf Hofstra University Computer Science}
\end{center}

\normalsize \bigskip

\section{Background and Introduction}

The Lambda Calculus is an abstract, mathematical basis for studying properties of
programming languages, just as integral calculus is used to study physics and other
sciences. It was invented by Alonzo Church, who along with Alan Turing is considered
one of the founders of computer science.
%Although Church
%originally created the $\lambda$-calculus to study the foundations of mathematics,
%its impact on computer science has been much greater.
We will study the basics of the lambda calculus for two reasons. First, most modern
programming languages have direct support for ``lambda expressions,'' which are
increasingly being used in advanced programming. Secondly, and {\em far more
importantly,\/} the lambda calculus is a theoretical foundation for most of computer
science: all computer algorithms can be expressed in the lambda calculus (this is part
of what is called the {\em ``Church-Turing Thesis''\/}). It explains the behaviors of
most programs and languages even when they don't use lambda expressions explicitly.
There are two versions of the lambda calculus: typed and untyped. They roughly
correspond to statically typed (compiler-based) and dynamically typed
(interpreter-based) languages. This document will focus on the untyped lambda calculus.

\medskip

The syntax of the lambda calculus will look strange at first. Essentially it is trying
to answer the question: {\em how do we apply a program to input?\/} We can study this
problem in the form of {\em how to apply a function to an argument.\/} In many languages
we write this as $f(t)$, where $f$ is the function and $t$ is the argument. In the
lambda calculus we write this as $(f~ t)$. Essentially, applying a program (function)
to an input (argument) requires ``substitution,'' which is to just {\em replace each
occurrence of the formal argument with the actual argument.\/} The lambda calculus is
the smallest and the purest programming language. It is purely mathematical in that it
is not concerned with how computation is to be implemented physically, only with how
symbols are to be transformed.

Each language has a syntax for specifying functions. In the lambda calculus functions
are represented using the $\lambda$ binder. So \verb+'def f(x): return x*x'+ in Python,
for example, becomes $f = \lambda x.(*~x~x)$. The $\lambda x.$ ``binder'' declares the
(formal) argument of the function. The $\lambda$-term itself is a value, sometimes
called a {\em nameless function,\/} and we are just using the symbol $f$ to refer to it.
When we apply the function to an (actual) argument, as in $(f~4)$, we replace each $x$
in the body of the function with $4$, which results in $4*4$, or $(*~4~4)$: this
expression must be evaluated further using the built-in function for $*$. The
evaluation continues until we reach a {\em ``normal form,''\/} such as $16$, which
can't be evaluated any further. Sounds simple enough, but there are some complications.
Actually, in Python you can also just write \verb+'f=lambda x:x*x'+, but even in
languages that do not support lambda syntax directly, the lambda calculus is still
significant. Consider the following C program:
\begin{verbatim}
int f(int x)
{
  int y = 1;
  {
    {  // starts inner scope
      int x = 2;
      y = y+x;
    }
  }
  return y+x;   // which x is this referring to?
}
\end{verbatim}
What should happen if we applied $f(4)$? We can't just replace all $x$'s with $4$: we
need to recognize that there are two $x$'s in the body, and only the outer one
corresponds to the formal argument. $f(4)$ should return 7, but only if we respect the
{\em scope\/} of variables. Consider another example:
\begin{verbatim}
int y = 1;   // variable declared external to f (such as a global)

int f(int x)
{
  int y = 8;
  return x+y;
}
\end{verbatim}
What should happen if we applied $f(y+1)$? You should know that it will return 10, but
only because you understand that there are two $y$'s and which $y$ is being referred to
in $x+y$. If we just blindly replaced $x$ with $y+1$, the function would return 17.
Finally, consider the complete C program:
\begin{verbatim}
#include <stdio.h>

int x = 1;

int f(int y) { return x+y; }

int main()
{
  int x = 2;
  printf("%d\n", f(0) );  // What will this print?  What SHOULD it print?
  return 0;
}
\end{verbatim}
Which $x$ is the $x$ in the body of $f$ referring to? Why? To handle these situations
correctly, we need to refine our strategy of replacing formal arguments with actual
arguments. First, we need to distinguish between local variables (called ``bound
variables'' in the lambda calculus) and non-local variables (called ``free
variables'').

\section{Formal Definitions}

An expression (term) in the lambda calculus is defined inductively as follows:
\begin{enumerate}
\item a variable: $x$, $y$, $z$, etc.
\item $(A ~B)$, where $A$ and $B$ are both lambda terms. This is called an
``application''.
\item $\lambda x.A$, where $x$ is a variable and $A$ is a lambda term. This is called
an ``abstraction.''
\item A known constant such as $3$. Constants are not strictly necessary but we will
include them in order to show more interesting examples. Constants can also be
functions: for example, $*$ is used to represent the multiplication function in most
languages.
\end{enumerate}
For example, $\lambda x.(y~x)$ is a lambda term because $x$ and $y$ are variables, so
$(y~x)$ is a term and therefore $\lambda x.(y~x)$ is also a term. According to this
definition, $(3~3)$ is a valid lambda term, and is read as ``three applied to itself.''
That's fine. The lambda calculus is a purely symbolic, that is to say purely syntactic,
system. Do not try to attach any {\em meaning\/} to the symbols and rules just yet.

{\em Syntactic Conventions:} Application in lambda terms associates to the left, and
application binds tighter than abstraction. This means that $(a~b~c)$ is equivalent to
$((a~b)~c)$ and {\bf not} to $(a~(b~c))$. Parentheses are dropped when not needed.
The term $\lambda x.x ~y$ should be understood to be equivalent to $\lambda x.(x~y)$
and {\em not\/} to $(\lambda x.x)~y$, because application binds tighter than
abstraction: in other words, the scope of the $\lambda x$ binder extends to the right
until it's delimited by enclosing $()$s. Thus in \ $(\lambda x.a)~(b~c)$ \ both pairs
of parentheses are needed, but in \ $\lambda x.((a~ b)~c)$ \ both are superfluous.

Here's another way to understand the syntactic conventions. Write applications $A~B$ as
$app(A,B)$ and write abstractions $\lambda x.A$ as $abs(x,A)$. Then {\em application
associates to the left\/} means that $A~B~C$ represents $app(app(A,B),C)$ and not
$app(A,app(B,C))$. {\em Application binds tighter than abstraction\/} means that
$\lambda x.A~B$ is $abs(x,app(A,B))$ and not $app(abs(x,A),B)$.

Given an abstraction $\lambda x.M$ (where $M$ is an arbitrary term), $M$ is called the
``body'' (or {\em scope\/}) of the $\lambda$-abstraction and $\lambda x$ is the
``binder.'' We can think of such an abstraction as a function that takes $x$, then
evaluates and returns $M$.

\subsection*{Free and Bound Variables}

Inside a given term, variables can be local or non-local. Bound (local) variables are
those that occur inside the scope of a $\lambda$-binder, and free variables are those
that do not. For example, in $\lambda x.(y~x~x)$, $x$ is bound and $y$ is free.
Whether a variable is free or bound depends on the exact term in question: in the
entire term $\lambda x.(x~y)$, $x$ is bound, but in the body $(x~y)$, $x$ is free. The
variable $x$ that forms a part of the binder $\lambda x$ can also be considered bound,
but it's not part of the body of the term (it declares that a variable is bound in the
body). Consider a more interesting example:
$$ \lambda x. y~ (\lambda x.x)~ x $$
Observe that by the syntactic conventions, this term should be read as
$\lambda x. ((y~ (\lambda x.x))~ x) $. In the entire term, all occurrences of $x$ are
bound and $y$ is free. Inside the body $y~ (\lambda x.x)~ x$, however, the right-most
occurrence of $x$ is free but the one inside the parentheses is bound. The right-most
$x$ refers to the outer $\lambda x$ binder.

\subsection*{Substitution and Beta-Reduction}

Now we can more formally define how to apply a function in the form of a
$\lambda$-abstraction to an argument. First, we define {\em substitution.\/} Given a
term $M$, let $M[N/x]$ denote the term obtained by replacing {\em all free occurrences
of $x$ in $M$ with $N$\/}. Furthermore, we require the following condition: {\em the
free variables of $N$ cannot appear bound in $M$.\/} For example, the substitution
$((\lambda x.x)~ x) [N/x] $ yields the term $(\lambda x.x)~ N$: only the {\em free\/}
occurrence of $x$ in the term is replaced with $N$. The bound instance of $x$ in the
term represents a local variable, and should not be replaced: in other words,
$\lambda x.x$ is self-contained (the proverbial ``black box'').

Now we define the most critical rule of the Lambda Calculus, called
{\bf ``beta reduction''}:
$$ (\lambda x.M)~N ~~\Rightarrow_\beta~~ M[N/x]$$
That is, to apply a $\lambda$-abstraction (function) to an argument, substitute $N$ for
$x$ in $M$ under the following two conditions:
\begin{enumerate}
\item Only the free occurrences of $x$ in $M$ are to be replaced ($x$ is bound in the
entire term $(\lambda x.M)$ but it may be free in the body $M$).
\item The free variables contained in $N$ may not be bound in $M$.
\end{enumerate}
A term in the form $(\lambda x.M)~N$ is called a {\em beta-redex:\/} a place where
beta-reduction can be applied. Consider some examples:
\begin{itemize}
\item $(\lambda x.x+x)~ y$ beta-reduces ($\Rightarrow$) to $y+y$. Here, $M$ is $x+x$
and $N$ is $y$, and we replace both free occurrences of $x$ in $M$ with $y$.
\item $(\lambda x.y)~z ~~\Rightarrow~~ y$ . Here, $M$ is $y$ and $N$ is $z$. There are
no free $x$'s in the body, so there's nothing to replace: $y[z/x]$ is still $y$. Such
lambda terms correspond to functions that do not use their arguments. The $\lambda x$
binder here is {\em vacuous.\/}
\item $(\lambda x\lambda y.y~ x)~ u ~(\lambda v.v) ~~\Rightarrow ~~ u$. It is tempting
to think of a term that begins with $\lambda x\lambda y\ldots$ as a ``function that
takes two arguments,'' but it is actually a function that takes one argument and
returns another function, which takes the other argument. Such functions are called
``Curried'' functions (named after the logician Haskell Curry). In this example, we
first replaced $x$ with $u$, resulting in the intermediate term
$(\lambda y.y~u)~(\lambda v.v)$. For this beta-redex, the argument ($N$) is itself a
lambda term, and we replace $y$ with $\lambda v.v$. This results in the term
$(\lambda v.v)~u$. One final reduction reduces this to $u$. In other words, the
reduction steps are:
$$(\lambda x\lambda y.y~ x)~ u~ (\lambda v.v) ~\Rightarrow ~ (\lambda y.y~u)~(\lambda v.v)
~\Rightarrow ~ (\lambda v.v)~u \Rightarrow u$$
Note: although the parentheses around $\lambda v.v$ are not required in
$(\lambda y.y~u)~(\lambda v.v)$, it's better to include them so that the scope of the
binder remains clear after substitution.
\item The next term corresponds to the second C program given in the introduction:
$$(\lambda y\lambda x.(\lambda y.x+y)~8)~4$$
Please read the parentheses carefully: there's an inner beta-redex, which is
$(\lambda y.x+y)~8$ where $M$ is $x+y$ and $N$ is 8, and an outer beta-redex, where $M$
is $\lambda x.(\lambda y.x+y)~8$ and $N$ is $4$. It does not matter which redex we
reduce first (this is called the {\em confluence property\/}). Let us reduce the outer
redex first: this means substituting all free occurrences of $y$ in the body with $4$;
but there are actually no free occurrences of the outer $y$ in the body: the $y$ inside
the body is bound locally. The reduction sequence is therefore:
$$(\lambda y\lambda x.(\lambda y.x+y)~8)~4 ~\Rightarrow~ \lambda x.(\lambda y.x+y)~8
~\Rightarrow~ \lambda x.x+8 $$
In the last step, we reduced the beta-redex inside the outer $\lambda x$, which is
$(\lambda y.x+y)~8$.
\end{itemize}
In each case above we reduced the term until there are no beta-redexes anywhere in the
term, resulting in a {\em beta normal form.\/}

{\em Are all terms reducible to normal forms?\/} {\bf NO.} The standard example is
$$ (\lambda x.x~x) ~(\lambda x.x~x) ~\Rightarrow~ (\lambda x.x~x) ~(\lambda x.x~x) $$
%$$
%(\lambda x.x~x) ~(\lambda x.x~x)
%$$
Yes, this is {\em a term that applies its argument to itself and is itself applied to
itself, and such a term reduces to itself.\/} Got it? Don't even try to understand it
intuitively: just apply the rules of beta reduction. Both free occurrences of $x$ in
the body $x~x$ are replaced by the argument, which is $(\lambda x.x~x)$ (which does not
contain any free variables, so the restrictions on substitution do not apply). The term
reduces back to itself and therefore will never be in normal form.
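If you prefer to see these rules as running code, the following is a minimal sketch in
Python (all names here, such as \verb+subst+ and \verb+beta+, are my own invention, not
any standard library): terms are nested tuples, \verb+subst+ implements $M[N/x]$, and
the renaming it performs to respect the second condition is the {\em alpha conversion\/}
described in the next section.
\begin{verbatim}
# A minimal sketch of beta reduction (hypothetical helper names).
# Terms are nested tuples:  ('var',x) | ('app',A,B) | ('abs',x,A)

counter = 0
def fresh():                      # generate a brand new variable name
    global counter
    counter += 1
    return '_v' + str(counter)

def free_vars(t):                 # the set of free variables of term t
    if t[0] == 'var': return {t[1]}
    if t[0] == 'app': return free_vars(t[1]) | free_vars(t[2])
    return free_vars(t[2]) - {t[1]}     # ('abs',x,A): x is bound in A

def subst(M, N, x):               # M[N/x]: replace only free occurrences
    if M[0] == 'var':
        return N if M[1] == x else M
    if M[0] == 'app':
        return ('app', subst(M[1],N,x), subst(M[2],N,x))
    y, body = M[1], M[2]          # ('abs',y,body)
    if y == x: return M           # x is rebound here: nothing free below
    if y in free_vars(N):         # y would capture a free variable of N:
        z = fresh()               # rename the binder first (alpha conversion)
        body = subst(body, ('var',z), y)
        y = z
    return ('abs', y, subst(body, N, x))

def beta(t):                      # if the whole term is a redex, contract it
    if t[0] == 'app' and t[1][0] == 'abs':
        return subst(t[1][2], t[2], t[1][1])
    return t

# the self-application term reduces to itself:
omega = ('abs','x',('app',('var','x'),('var','x')))
print(beta(('app',omega,omega)) == ('app',omega,omega))   ### prints True
\end{verbatim}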
The existence of such terms is important if we are to have repetition (loops,
recursion) in our programs.
%The above term is also used in the proof of the undecidability
%of the Halting Problem (in later section).

{\flushleft\bf Bad Examples of Beta Reduction:}
\begin{itemize}
\item $(\lambda y.\lambda x.x~y)~ v ~\Rightarrow~ (\lambda x.x)~ v ~\Rightarrow~ v$. \
This is {\bf wrong} because we misused the syntactic conventions of the lambda
calculus. The first reduction results in $\lambda x.x~v$, which is not the same as
$(\lambda x.x)~v$: because application binds tighter than abstraction, it is equivalent
to $\lambda x.(x~v)$, which is already in normal form.
\item $ (\lambda x.y~ x)~ (\lambda u.u) ~z ~\Rightarrow (\lambda x.y~ x)~z ~\Rightarrow
y~z$. \ This is wrong because application should associate to the left: the correct
reduction sequence is
$ (\lambda x.y~ x)~ (\lambda u.u) ~z ~\Rightarrow y~ (\lambda u.u)~z$, which is already
in normal form.
\item $(\lambda y.\lambda y.y~y)~ v ~\Rightarrow~ (\lambda y.v~v)$. \ We didn't make
the same mistakes as above, but we made a worse one. We can only replace free
occurrences of $y$ inside the body of the outer $\lambda$, which is $\lambda y.y~y$.
There are no {\em free\/} occurrences of $y$ in this term! The correct normal form is
$\lambda y.y~y$.
\item $(\lambda y.\lambda x.x~y)~ x ~\Rightarrow~ (\lambda x.x~x)$. \ In the lambda
calculus, this is as bad a mistake as you could possibly make. You violated the second
restriction on beta reduction: {\em the free variables of $N$ cannot appear bound in
$M$.\/} $N$ in this case is the free variable $x$, but inside the body of the
$\lambda$-term, $x$ is bound (a local variable). From a programmer's perspective, this
restriction should be intuitive: the local (bound) variable $x$ should not be confused
with externally defined (free) variables of the same name. The wrong reduction performs
what's called ``free variable capture.'' Free variables must stay free after reduction.
\end{itemize}
So how do we reduce terms like the one in the last example? We just {\em rename\/} the
bound variable to separate it from the free variable.

\subsection*{Alpha Conversion}

Surely \verb+int f(int y) {return y;}+ and \verb+int f(int x) {return x;}+ \ are the
same program. Not only do they behave the same, they are essentially the same program
syntactically: the only difference is the choice of name for the local (bound)
variable. In the lambda calculus, terms such as $\lambda x.x$ and $\lambda y.y$ are
called {\em alpha-equivalent.\/} Each term can be {\em alpha-converted\/} to the other
by consistently renaming bound (local) variables. Formally, a variable $y$ is called
{\em fresh\/} with respect to a term $M$ if $y$ does not appear anywhere in $M$: it's a
new variable\footnote{technically, it can appear in some places, but to simplify
things, let's assume that it's a completely new variable not found anywhere in $M$.}.
Alpha conversion is defined by:
$$\lambda x.M ~~\Rightarrow_\alpha~~ \lambda y.M[y/x] $$
where $y$ is {\em fresh\/} with respect to $M$.

It is important to remember that alpha-equivalence is a purely syntactic equivalence
and does not assume anything about the {\em meaning\/} of symbols. For example,
$\lambda x\lambda y.x+y$ and $\lambda y.\lambda x.y+x$ are alpha-equivalent (one can be
converted to the other by first replacing one of the variables with an intermediate
fresh variable $z$). However, $\lambda x\lambda y.x+y$ and $\lambda x.\lambda y.y+x$
are {\bf not} alpha-equivalent.
First, you cannot assume that the symbol $+$ represents numerical addition, which is
commutative. In many languages $+$ is used for string concatenation, which is not
commutative (``ab'' $\not=$ ``ba''). Secondly, even if $+$ is known to be commutative,
the two terms are still not alpha-equivalent because they are structurally different.
Alpha-equivalence is a purely syntactic property.

We need alpha-conversion to allow beta reduction to proceed correctly when the names of
free and bound variables clash. In the example $(\lambda y.\lambda x.x~y)~ x$, we can
apply alpha conversion to the lambda term and rewrite it as
$(\lambda y.\lambda z.z~y)~ x$, which then reduces to $\lambda z.z~x$: the free
variable $x$ is not illegally captured by the local binder. This example roughly
corresponds to the situation found in the first C program of the introduction. We can
only change the names of {\bf bound\/} variables, not free variables. It would be
illegal to change the outer (rightmost) $x$ to something else. Free variables
correspond to external variables which may be in use elsewhere, not just
{\em locally.\/}

Now let's look at the term
$$ \lambda x.(\lambda y.(\lambda x.y+x)~3)~(x+2)$$
This example also roughly corresponds to the first C program in the introduction.
There are two beta-redexes: the inner $(\lambda x.y+x)~3$ and the outer
$(\lambda y.(\lambda x.y+x)~3)~(x+2)$. There are two correct ways to reduce this term.
The first is to reduce the inner redex first, which turns the body into $y+3$; then
$\lambda x.(\lambda y.y+3)~(x+2)$ reduces to $\lambda x.x+2+3$. The second way is to
first apply alpha conversion to the inner abstraction, renaming its bound $x$ to a
fresh variable $z$: the term becomes $\lambda x.(\lambda y.(\lambda z.y+z)~3)~(x+2)$,
and reducing the outer redex now gives $\lambda x.(\lambda z.x+2+z)~3$, which reduces
to the same normal form $\lambda x.x+2+3$. Without the renaming, reducing the outer
redex first would have illegally captured the free occurrence of $x$ in the argument
$x+2$.

\section{Basic Combinators}

A lambda term without any free variables (completely self-contained) is called a
{\em combinator.} There are three famous ones:
\begin{enumerate}
\item $I ~=~ \lambda x.x$
\item $K ~=~ \lambda x\lambda y.x$
\item $S ~=~ \lambda x\lambda y\lambda z.x~z~(y~ z)$
\end{enumerate}
It is known that all other combinators can be generated by combining these three in
some fashion. Let's look at how they behave. First, it should be clear, if you
understood the above, that $((\lambda x.x)~ A)$ beta-reduces to $A$, regardless of what
kind of term $A$ is. So we can write down an {\bf axiom:} $IA = A$.

Now consider $K$: this term takes two arguments (actually one after the other), and
ignores the second argument. Given any two terms $A$ and $B$ where $y$ does not appear
free in $A$ (alpha-convert $K$ otherwise), $(K~A~B)$ reduces to $A$:
$$(K~A~B) = (\lambda x\lambda y.x)~A~B ~\Rightarrow~ (\lambda y.A)~B ~\Rightarrow~ A$$
In the last step, because $y$ is not free in $A$, $A$ is not affected by the
substitution $A[B/y]$. Since $K$ can always be alpha-converted away from the free
variables of $A$, we can write down another {\bf axiom:} $KAB = A$.

Now consider $(K~ I~ A~ B)$ where $I$ is (alpha-equivalent to) $\lambda u.u$ and $A$
and $B$ are arbitrary. Recall that by syntactic convention, this is the term
$((K~ I)~A)~B$. First we reduce the innermost redex $(K~I)$:
$$KI = (\lambda x\lambda y.x)~I ~\Rightarrow~ (\lambda y.I) = \lambda y.\lambda v.v$$
This term takes two arguments and this time it ignores the first argument. We should
see that $KIAB = (\lambda y\lambda v.v)~A~B ~\Rightarrow~ (\lambda v.v)~B ~\Rightarrow~
B$. So we have another {\bf axiom:} $KIAB=B$. We can alpha-convert $KI$ to
$\lambda x\lambda y.y$.

It may appear that terms (such as $K$ and $KI$) which ignore some of their arguments
are pointless. But in fact, they correspond to something you're already very familiar
with as programmers. Given two choices $A$ and $B$, $K$ {\em chooses\/} $A$ and $KI$
chooses $B$.
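This choosing behavior can be observed directly in any language with lambda
expressions. Here is a quick check in Python, anticipating the full implementation
given near the end of these notes (applying the combinators to ordinary Python strings
is purely for illustration):
\begin{verbatim}
I = lambda x: x
K = lambda x: lambda y: x
print(K("A")("B"))       ### prints A: K chooses the first of two choices
print(K(I)("A")("B"))    ### prints B: KI chooses the second
\end{verbatim}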
We are now beginning to see how lambda terms correspond to constructs found in every
programming language, in this case Boolean expressions and {\bf if-else.\/} But let's
first look at an example involving $S$:
$$ SKI = (\lambda x\lambda y\lambda z.x~z~(y~ z))~K~I ~\Rightarrow
(\lambda y\lambda z.K~z~(y~z))~ I ~\Rightarrow \lambda z.K~z~(I~z) ~\Rightarrow
\lambda z.z$$
In the last step, we applied the axiom $KAB=A$ where $A$ is $z$ and $B$ is $(I~z)$.
Alternatively, we can first reduce $(I~z)$ to $z$ with the axiom $IA=A$, but the result
will be the same.

\section{More Powerful Combinators}

When Church invented the lambda calculus, he wasn't just thinking about computing.
There was a hypothesis (called ``Hilbert's Program'') that all mathematical problems
could be algorithmically solved. Eventually, this hypothesis was proven to be false by
Kurt G\"odel and his {\em incompleteness theorems.\/} But many (though not all)
mathematical properties can be described by the lambda calculus. The most basic
computational operations are adding and multiplying numbers ...

\subsection{Church Numerals}

Church formulated a representation of the natural numbers as lambda terms: zero is
$\lambda f\lambda x.x$, one is $\lambda f\lambda x.f~x$, two is
$\lambda f\lambda x. f~(f~x)$, three is $\lambda f\lambda x. f~(f~(f~x))$, and so on.
Why did he write numbers this way? Because he wanted to study the {\em algorithmic\/}
properties of numbers. The ``meaning'' of a number is, to him, defined by what we do
with it, i.e., arithmetic. With this representation of numbers, basic arithmetic
operations can be defined as lambda terms:
\begin{itemize}
\item $PLUS = \lambda m\lambda n\lambda f\lambda x.m~f~(n~f~x)$
\item $TIMES = \lambda m\lambda n\lambda f\lambda x.m~(n~f)~x$
\end{itemize}
Other operations are also possible (the one for subtraction is pretty hard). For
example, we can show that $1+2=3$ using beta-reduction:
$$(PLUS~1~2) = \lambda f\lambda x. 1~ f~(2 ~ f~x) ~\Rightarrow \lambda f\lambda x.
1~f~(f~(f~x)) ~\Rightarrow \lambda f\lambda x. f~(f~(f~x)) = 3$$
I first substituted $1$ and $2$ for $m$ and $n$ respectively, then reduced $(2 ~ f~x)$
to $f~(f~x)$. Plus works by substituting one number onto the end of the other number.
Times works by replacing each $f$ with more $f$'s. As an exercise, you can verify that
$(TIMES~2~3)=6$. We won't go further into Church numerals except to say that all basic
arithmetic operations can be represented in pure lambda calculus. Even though in
actual languages we don't use lambda terms to represent numbers, the lambda calculus
provides a theoretical unity to different kinds of computation.

\subsection{Booleans and If-Else}

A programming language needs more than just numbers and arithmetic. Church's
philosophical approach to thinking about numbers extends to truth and falsehood. What
is the significance of true and false from a purely computational standpoint? It
depends on what we can do with true and false. As programmers, you certainly know what
to do with these booleans: we use them to make decisions. From this algorithmic
standpoint, true and false just represent two opposing choices:
\begin{itemize}
\item $TRUE = K = \lambda x\lambda y.x$
\item $FALSE = KI = \lambda x\lambda y.y$
\end{itemize}
Then the familiar {\em if-else\/} construct can be defined by:
\begin{itemize}
\item $IFELSE = \lambda c\lambda a\lambda b.(c~a~b)$
\end{itemize}
$IFELSE$ takes three (Curried) arguments: a boolean expression $c$ that should reduce
to $TRUE$ or $FALSE$, and two choices $a$ and $b$.
Since $Kab=a$ and $KIab=b$, $TRUE$ {\em chooses\/} $a$ and $FALSE$ {\em chooses\/} $b$.
Thus $(IFELSE~ TRUE ~1 ~2)$ will beta-reduce to $1$ and $(IFELSE~ FALSE ~1 ~2)$ will
beta-reduce to $2$. The lambda calculus should start to look less strange to you now
...

What about the boolean operations {\em and,\/} {\em or\/} and {\em not\/}? All of these
can be defined in terms of if-else. For example:
\begin{itemize}
\item $NOT = \lambda a.(IFELSE~ a ~FALSE ~TRUE)$
\item $AND = \lambda a\lambda b.(IFELSE~a~b~FALSE)$
\end{itemize}
That is, $NOT$ is defined as a function that takes $a$ as an argument and has body
\verb+if (a) return false; else return true+. $AND$ is a function that takes $a$ and
then $b$ with body \verb+if (a) return b; else return false+. For example,
$(AND~ TRUE ~ FALSE)$ beta-reduces to $(IFELSE ~TRUE~ FALSE~FALSE)$, which reduces to
$FALSE$. If you're not convinced, construct truth tables to see that these pure lambda
terms do indeed have the expected behaviors of the boolean operations. As an exercise,
define $OR$.

\subsection{Data Structures and Encapsulation}

How do we {\em construct\/} a single structure from multiple components, and how do we
{\em destruct\/} (break down) such a structure into its components? This sounds like a
challenge to lambda calculus, but all we need to do is to represent a pair of values,
which we normally write as $(a,b)$. Then, using nested pairs, we can construct data
structures of any complexity. For example, a linked list is just a sequence of nested
pairs ending in some designated value representing the empty list:
$(a,(b,(c,(d,null))))$. For historical reasons the three operations we define are
called {\em cons,\/} {\em car\/} and {\em cdr:}
\begin{itemize}
\item $CONS = \lambda a\lambda b\lambda c.(c~a~b)$
\item $CAR = \lambda p.(p~TRUE)$
\item $CDR = \lambda p.(p~FALSE)$
\end{itemize}
Given two terms $A$ and $B$, $(CONS~A~B)$ returns (beta-reduces to) a pair represented
by $\lambda c.(c~ A~ B)$. {\em ``This is not a pair, it's a function!''\/} says you.
Yes, but this function has all the behaviors we want from a pair. The argument $c$ is
expected to be a boolean value, and recall that $(c~a~b)$ is equivalent to
$(IFELSE~c~a~b)$: true selects $a$ and false selects $b$. Given such a ``pair''
represented by $P=\lambda c.(c~ A~ B)$, \ $(CAR~P)$ reduces to $(P~TRUE)$, which then
reduces to $(TRUE~A~B)$. Thus $(CAR~P)$ returns $A$. $(CDR~P)$ will similarly return
$B$. $CONS$ constructs the structure and $CAR$, $CDR$ destruct it.

To represent linked lists, we also need to choose a term for the empty list (typically
called {\em null\/} or {\em nil\/}), as well as a way to determine whether a pair is
empty or not.
\begin{itemize}
\item $NIL = \lambda x.\lambda y.y$ \ (same as $FALSE$ and zero).
\item $ISNIL = \lambda p.p ~(\lambda x\lambda y\lambda z.FALSE)~ TRUE$
\end{itemize}
$(ISNIL ~NIL)$ reduces to $NIL~(\lambda x\lambda y\lambda z.FALSE)~ TRUE$, which
reduces to $TRUE$ because $NIL$ ignores the first argument and returns the second
($KIAB=B$). But given a non-empty pair $P=\lambda c.(c~A~B)$, $(ISNIL~P)$ reduces to:
$$P~(\lambda x\lambda y\lambda z.FALSE)~TRUE ~\Rightarrow (\lambda x\lambda y\lambda
z.FALSE)~A~B~TRUE ~\Rightarrow FALSE.$$
Now we can define a linked list as in
$$M = (CONS ~2 ~(CONS~3~(CONS~5~(CONS~ 7~ NIL))))$$
and write expressions such as $(CAR~(CDR~ M))$, which extracts the second value from
the list (verify that it reduces to 3). We can also write $IFELSE ~(ISNIL~ M) ~A~ B$.
\ Add ``syntactic sugar'' to make it look like your favorite language.
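To summarize how the pair encoding works, here is the reduction sequence showing that
$CAR$ extracts the first component of a pair built by $CONS$ (each $\Rightarrow$ below
may compress several beta steps, and we assume the free variables of $A$ and $B$ do not
clash with the bound variables shown, alpha-converting otherwise):
$$CAR~(CONS~A~B) ~\Rightarrow~ CAR~(\lambda c.c~A~B) ~\Rightarrow~
(\lambda c.c~A~B)~TRUE ~\Rightarrow~ TRUE~A~B ~\Rightarrow~ A$$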
Once we have pairs we can build other structures of arbitrary complexity. For example,
binary tree nodes can be represented by $(CONS~X~(CONS ~LEFT~RIGHT))$.

\medskip

Representing data structures using functions is not so strange once you realize that
this is exactly what we want when constructing an {\em ``abstract data type.''} For
example, we often construct a {\em class\/} that contains private values and public
access functions. The public functions form the {\em interface\/} to the data type:
the values that make up the data structure are thus {\em encapsulated.}

\subsection{Repetition (Recursion)}

We need another component to create a basic programming language: recursion. The usual
loop constructs of ordinary languages can be seen as special cases of recursion. The
idea is to create a {\em fixed point operator\/} $fix$ with the behavior that $fix~M$
reduces to $M~(fix~M)$. This will allow us to repeat arbitrarily many $M$s, or form
loops that repeatedly evaluate $M$. Combined with $IFELSE$, we can terminate the loop
when some condition is reached.
\begin{itemize}
\item $FIX = \lambda m.(\lambda x.m~(x~x))~(\lambda x.m~(x~x))$
\end{itemize}
This term is similar to the term we saw earlier that reduces to itself and has no
normal form. We verify that:
$$ (FIX~M) \Rightarrow (\lambda x.M~(x~x))~(\lambda x.M~(x~x)) ~\Rightarrow~
M~((\lambda x.M~(x~x))~(\lambda x.M~(x~x))) = M~(FIX~M) $$
If it ever seemed strange to you that you can define a function that refers to itself,
the $FIX$ operator doesn't actually do that: it's applied to a function that
{\em names\/} the recursive function using a $\lambda$-binder. The following recursive
program computes the sum of all numbers in a linked list:
$$SUM = FIX~ (\lambda f.\lambda n.IFELSE~~ (ISNIL~ n)~~ ZERO ~~(PLUS~ (CAR~ n)~ (f~
(CDR~n))))$$
The recursive function is called via the $\lambda$-bound variable $f$. To see how
recursion works in more detail, here $M$ is $\lambda f.\lambda n.IFELSE~ (ISNIL~ n)~
ZERO ~(PLUS~ (CAR~ n)~ (f~ (CDR~n)))$ and $SUM = FIX~M \Rightarrow M~(FIX ~M)$. For
example,
$$SUM~ (CONS~2~NIL) \Rightarrow M~(FIX~M)~ (CONS~2~NIL) \Rightarrow$$
$$IFELSE~(ISNIL~ (CONS~2~NIL))~ZERO ~ (PLUS~ 2~(FIX~M ~NIL)) \Rightarrow$$
$$ (PLUS~ 2~(FIX~M ~NIL)) \Rightarrow (PLUS~ 2~(M~(FIX~M) ~NIL)) \Rightarrow
(PLUS~2~0)\Rightarrow 2$$
The recursion stopped and returned $0$ when $(ISNIL~NIL)$ reduced to $TRUE$. We note
that all elements of this program are definable in pure lambda calculus.

\section{Order of Evaluation}

As examples such as $SKI$ show, there are often multiple beta-redexes inside a term,
and there are different strategies for selecting which one to reduce first. Most
popular programming languages use {\em call-by-value\/} evaluation, which means that
the actual arguments to a function are fully evaluated before they're passed to the
function. The alternative evaluation method is {\em call-by-name,\/} which always
reduces the outermost beta-redex first. Consider $S(KI)K$. If we reduced $(KI)$ first,
it would be call-by-value, and if we first substituted $(KI)$ into $S$, it would be
call-by-name. Either strategy will give us the same normal form $\lambda z.\lambda y.z$
(which is alpha-equivalent to $K$ again; verify this normal form as an exercise).

{\flushleft\bf The Church-Rosser Theorem:} {\em If $A\Rightarrow_\beta B$ and
$A\Rightarrow_\beta C$ (in zero or more steps), then there is a term $D$ such that
$C\Rightarrow_\beta D$ and $B\Rightarrow_\beta D$.} In particular, if a term can be
reduced to a normal form, then the normal form must be unique.
This property is also called {\em confluence\/} and it says that all reduction
strategies are valid, though not equally efficient: some strategies may require more
steps than others. Consider $(\lambda x.(x~x~x~x~x~x~x))~ (SKI)$. Using call-by-value,
we only have to reduce $SKI$ to a normal form once, but using call-by-name, we will
have to reduce each copy of $SKI$ separately. This example suggests that call-by-value
is more efficient. But that's not always true: consider
$$(\lambda y.z)~ ((\lambda x.x~x)~(\lambda x.x~x))$$
Using call-by-name, the term reduces in one step to $z$, because $\lambda y$ is a
vacuous binder. How many steps will call-by-value take? Find the answer yourself, but
don't take forever :).

{\flushleft\bf Theorem: } If a term can be reduced to a normal form, then the normal
form can be reached using the call-by-name strategy.

Redundant computations in call-by-name can also be avoided with more sophisticated
implementations: if we used directed acyclic graphs {\em (DAGs)\/} instead of
{\em trees\/} to represent terms, then redundancies such as in
$\lambda x.(x~x~x~x~x~x~x)$ can be avoided. The main difficulty in implementing
call-by-name is that we must {\em delay\/} computation until it's actually needed.
This means we can't just pass a value to a function, but must pass unexecuted
{\em code\/} to a function. Furthermore, the code must carry the environment under
which it's defined in order to be executed correctly: such a package of code plus
environment is called a {\em closure.\/} Most conventional languages use
call-by-value. A notable exception is Haskell. Some newer languages (Scala) now allow
functions to be defined to use either call-by-value or call-by-name.

Most conventional languages also do not reduce terms that are hidden inside
$\lambda$-abstractions: that is, in $(\lambda x. (\lambda y.y)~x)~A$, the beta-redex
inside the $\lambda x$ term, $(\lambda y.y)~x$, is not evaluated until the outer redex
is reduced first. In other words, we don't evaluate a function until it's been passed
all of its arguments.

Also, certain built-in constructs must use call-by-name. If we think of $ifelse$ as a
function of three arguments, then under call-by-value $(ifelse ~c~a~b)$ will evaluate
both $a$ and $b$ regardless of the truth value of $c$, which is clearly not how we want
if-else to behave. Thus, from a practical perspective, the $IFELSE$ combinator we
defined earlier can only be used with call-by-name. This is why in most conventional
languages, constructs such as if-else (and short-circuited booleans) are implemented as
special constructs and not just as built-in functions. To simulate the effect of
call-by-name in the call-by-value setting of conventional languages, we have to
encapsulate $a$ and $b$, the terms that cannot be evaluated eagerly, inside vacuous
$\lambda$-abstractions:
\begin{itemize}
\item $ifelse = \lambda c\lambda a\lambda b.(c~ a ~ b) ~I$
\end{itemize}
Instead of calling $(IFELSE ~ C~A~B)$, we call
$(ifelse~ C ~ (\lambda x.A) ~ (\lambda x.B))$. Here, $x$ is assumed to not appear free
in $A$ or $B$ (alpha-convert otherwise). The dummy (vacuous) $\lambda x$ delays the
evaluation of $A$ and $B$: they are not evaluated until $(c~a~b)$ is applied to $I$.
The FIX combinator requires a similar treatment.
The FIX combinator is also called the $Y$ combinator:
\begin{itemize}
\item {\em Call-by-value FIX combinator:\/}
$Y = \lambda m.(\lambda x.m~(\lambda y.x~x~y))~(\lambda x.m~(\lambda y.x~x~y))$
\end{itemize}
We will need these versions of the combinators in our implementation of pure lambda
calculus in real programming languages, such as Python and Perl.

\section{Static Scoping}

A critical subject that we now address concerns the third C program in the
introduction. This program shows the difference between static (or lexical) scoping
and dynamic scoping. C, like most languages, is {\em statically\/} scoped. The third C
program prints $1$ because the free variable $x$ in the body of the function $f$ refers
to the static context in which $f$ was {\em defined.\/} With dynamic scoping, $f$ would
refer to the nearest declared $x$ in its {\em runtime\/} environment, which would be
the $x$ declared in \verb+main+.

Why is C (and why are most languages) statically scoped? There are theoretical and
practical ways to explain this. Theoretically, the lambda calculus shows that static
scoping is the correct behavior. When we reduce $(\lambda x.M)~N$, $x$ is ``bound'' to
$N$ while $M$ is evaluated. We can define a {\em let\/} construct that binds variables
to values in this way:
\begin{itemize}
\item $(let~ x=A~ in~ B) ~=~ (\lambda x.B)~A$
\end{itemize}
We can now rewrite the third C program as follows:
$$ let~x=1 ~in~ (let~ f=(\lambda y.x+y) ~in~ (let ~x=2~ in~ (f~0))) $$
This translates into the $\lambda$-term
$$ (\lambda x. (\lambda f.(\lambda x.f~0)~ 2)~ (\lambda y.x+y))~ 1 $$
Clearly, the $x$ inside $\lambda y.x+y$ refers to the outermost $\lambda x$. The inner
$\lambda x$ is a vacuous abstraction: this $x$ is not used anywhere. This term
beta-reduces to $1$ (verify this yourself). The scoping rules of the lambda calculus
are those of static scoping.

\section{The Undecidability of the Halting Problem}

To complete this introduction, we prove in pure lambda calculus the most important
theoretical result in computer science: the undecidability of the Halting Problem.
Programs can take other programs as input and attempt to analyze their behavior: this
is called {\em static analysis.} Examples of such programs include compilers and
malware detectors. The Halting Problem asks: does there exist a program that, given the
source code of any program {\em as well as its input,\/} returns true if that program
terminates (halts) on that input, and false if it does not terminate? The problem is
{\em undecidable\/} because there cannot be such a program: the very existence of such
a program would lead to a logical contradiction. The Halting Problem can be formulated
in terms of Turing Machines as well as the (untyped) lambda calculus and several other
equivalent formal systems. The {\em Church-Turing Thesis\/} is that all algorithms can
be formulated as Turing machines, or as pure lambda terms. Therefore, the following
proof is valid for all languages that claim to be ``Turing complete.''

The proof is by contradiction ({\em reductio ad absurdum\/}): we make an assumption and
show that the assumption leads to an absurdity or paradox.
\begin{enumerate}
\item Assume that there is a pure lambda term $HALT$ such that, given any lambda terms
$P$ and $A$, $(HALT ~P~A)$ reduces to $TRUE$ if $(P~A)$ is reducible to a beta normal
form, and $(HALT ~P~A)$ reduces to $FALSE$ if $(P~A)$ cannot be reduced to a normal
form.
\item Let $INF = (\lambda x.x~x)~(\lambda x.x~x)$.
\ Recall that this term has no normal form: it does not halt.
\item Let $Q = \lambda p. (IFELSE~(HALT~p~p) ~~INF~~ I)$. \ \ That is, if $p$ halts
when applied to itself as input, go into an infinite loop with $INF$; otherwise, return
a normal form with $I$.
\item Consider the question: {\em DOES $(Q ~Q)$ HALT?\/} That is, is $(Q~Q)$ reducible
to a normal form? Observe that $(Q~Q)$ reduces to $IFELSE~(HALT~Q~Q)~~INF~~I$. Based on
the assumption that $(HALT~ Q~Q)$ returns $TRUE$ if $(Q~Q)$ halts and $FALSE$
otherwise, we see from the definition of $Q$ that
\begin{quote}
{\em if $(Q~Q)$ halts then $(Q~Q)$ does not halt, and if $(Q~Q)$ does not halt then it
halts.\/}
\end{quote}
\item We've reached a contradiction: $(Q~Q)$ halts if and only if it does not halt. The
terms for $IFELSE$, $TRUE$, $FALSE$, $I$ and $INF$ are all pure lambda terms. The only
other term we used was $HALT$ and its assumed behavior. Therefore, the assumption that
$HALT$ exists must be false.
\end{enumerate}

The Halting Problem defines the limits of computation based on the Church-Turing model.
Many other problems are proved undecidable by reducing the Halting Problem to them. In
practical terms, the undecidability of this problem means that we cannot always
determine how a program will behave {\em even if we know its input\/} without actually
running the program. But running the program could result in an infinite loop. This
means we can't always experimentally determine if there are errors in our programs.
Obviously, these results are relevant to the design and use of programming languages.

\section{Pure Untyped Lambda Calculus in Python}

Most untyped scripting languages support untyped lambda terms directly. We summarize
the construction of a basic programming language from pure lambda calculus by
implementing it with pure lambda terms in Python.
\begin{verbatim}
# Pure Lambda Calculus in Python (works in both Python2.2+ and Python3)
## The syntax of Python differs from traditional lambda calculus in that
#  application is written f(x) instead of (f x), and a : is used for .

I = lambda x:x
K = lambda x:lambda y:x
S = lambda x:lambda y:lambda z:x(z)(y(z))

# Note: it would not be correct to define K as lambda x,y:x, because that's
# using python's built-in pairs.  Every lambda-abstraction takes exactly
# one argument, although that argument can be a tuple.  To be faithful to
# the build-from-scratch approach, we should use Curried functions and
# not rely on built-in features as much as possible: f(a,b) is not the same
# as f(a)(b).
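# A quick sanity check of the axioms (applying the combinators to ordinary
# Python values like 42 is just for illustration):  S K K x = K x (K x) = x,
# so S(K)(K) behaves like the identity combinator I.
print(S(K)(K)(42))    ### prints 42

# The let-term from the Static Scoping section also runs directly in Python,
# confirming that Python's lambdas are statically scoped:
print((lambda x:(lambda f:(lambda x:f(0))(2))(lambda y:x+y))(1))  ### prints 1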
true = K
false = K(I)
ifelse = lambda c:lambda a:lambda b:c(a)(b)
if_else = lambda c:lambda a:lambda b:c(a)(b)(I)   # simulates call-by-name
NOT = lambda a:a(false)(true)
print(ifelse(NOT(true))(1)(2))    ### prints 2

# Church numerals
ZERO = false
ONE = lambda f:lambda x:f(x)
PLUS = lambda m:lambda n:lambda f:lambda x:m(f)(n(f)(x))
TIMES = lambda m:lambda n:lambda f:lambda x:m(n(f))(x)

# linked lists
CONS = lambda a:lambda b:lambda c:c(a)(b)
CAR = lambda p:p(true)
CDR = lambda p:p(false)
NIL = ZERO
ISNIL = lambda p:p(lambda x:lambda y:lambda z:false)(true)

# applicative order FIX combinator for recursive definitions
FIX = lambda m:(lambda x:m(lambda y:x(x)(y)))(lambda x:(m(lambda y:x(x)(y))))

Primes = CONS(2)(CONS(3)(CONS(5)(CONS(7)(CONS(11)(NIL)))))
print(CAR(CDR(Primes)))    ### prints 3, the second prime

### functions to convert Python numerals to Church numerals:
def Church(n):     # converts non-neg integer n to Church numeral
    if (n==0): return ZERO
    elif n>0: return PLUS(ONE)(Church(n-1))
#
def Unchurch(c):   # converts Church numeral c to Python integer:
    return c(lambda y:1+y)(0)
#
## for example, if TWO is the Church numeral for 2, Unchurch(TWO)
#  beta-reduces to (lambda f:lambda x:f(f(x)))(lambda y:1+y)(0),
#  which reduces to 1+1+0=2.  Unfortunately, Python cannot print
#  intermediate lambda terms: it does not support
#  "higher order abstract syntax".

C25 = Church(25)   # pure lambda representation of 25
print(Unchurch(C25))    #### prints 25

## pure lambda function to add all numbers in a linked list:
SUM = FIX(lambda f:lambda n:if_else(ISNIL(n))(lambda y:ZERO)(
                                    lambda y:PLUS(CAR(n))(f(CDR(n)))))
L = CONS(Church(1))(CONS(Church(2))(CONS(Church(4))(CONS(Church(8))(NIL))))
ANSWER = SUM(L)
print(Unchurch(ANSWER))    ### prints 15
# end of program
\end{verbatim}

\section{Loose Ends and Typed Lambda Calculus}

There are some important items about the lambda calculus that I did not cover,
including {\em ``eta-reduction'':\/} this rule just states that $A = \lambda x.A~x$,
provided $x$ does not occur free in $A$. For example, the term for Church
multiplication, $\lambda m\lambda n\lambda f\lambda x.m~(n~f)~x$, can also be written
equivalently as ($\eta$-reduces to) $\lambda m\lambda n\lambda f.m~(n~f)$.

Most importantly, we did not cover {\em typed\/} lambda calculus. A typed program is a
logically consistent program: a value cannot have conflicting types. Types can also be
abstracted over (made generic). The type of a function is written $a\rightarrow b$
where $a$ is the type of the argument ($\lambda$-bound variable) and $b$ is the type of
the body. $I=\lambda x.x$, for example, has generic type $a\rightarrow a$ and
$K=\lambda x\lambda y.x$ has generic type $a\rightarrow b \rightarrow a$, with the
$\rightarrow$ operator on types associating to the right. Variables are associated with
types that they're assumed to have, using notation such as $x:int$. From a set of
assumptions we derive a type on the right of the $\vdash$ symbol using the following
rules:
$$
\infer{x:a,\ldots \vdash x:a}{} \qquad
\infer{\ldots \vdash \lambda x.M: a\rightarrow b} {x:a,\ldots \vdash M:b} \qquad
\infer{\ldots \vdash (M~N): b}
 {\ldots \vdash M: a\rightarrow b &\quad \ldots \vdash N:a}
$$
You should be able to derive the types for $I$ and $K$ using the first two rules. The
last rule corresponds to the classical syllogism {\em modus ponens\/} and establishes
the correspondence between types and logic.
That is, if $\rightarrow$ can be read as {\em implies\/} then the type of every
combinator must be a propositional {\em tautology\/} because the rules of typing are
isomorphic to the rules of logic (this is called the {\em Curry-Howard
Isomorphism\/}). The following is a sample type derivation using the rules:
$$
\infer{\vdash \lambda x\lambda y.y~x : a \rightarrow (a\rightarrow b) \rightarrow b}
 {\infer{x:a \vdash \lambda y.y~x : (a\rightarrow b) \rightarrow b}
   {\infer{x:a, y:a\rightarrow b\vdash (y~x) : b}
     {\infer{x:a, y:a\rightarrow b\vdash y:a\rightarrow b}{}
      &\quad \infer{x:a, y:a\rightarrow b \vdash x:a}{}}}}
$$
You can verify that the derived type is a tautology by constructing a truth table. An
important consequence of introducing types is that beta-reduction (and alpha/eta
conversion) must preserve the types of terms:

{\flushleft\bf Theorem: Type Soundness.} {\em If $\vdash s:A$ is a valid type
derivation for lambda term $s$ and $s\Rightarrow_\beta t$, then $\vdash t:A$ is also a
valid type derivation.}

This core result (also referred to as {\em subject reduction\/}) is what we mean by a
computation being {\em type safe:\/} $3+1$ is an expression of type {\em int,\/} and
$4$, the normal form, is also an {\em int.\/} So type checking can be done before,
during, or after beta-reduction. If it's done before reduction, it's called
{\em static typing.\/}

Another consequence of enforcing typing rules is that certain terms, such as
$\lambda x.(x~x)$, are not typable (do not have a type). To encode a programming
language that allows for recursion and repetition, the fixed point operator, with the
property $FIX~M = M~(FIX ~M)$, must be imported as an extra constant. If we are to give
$FIX$ a type, it would have to be $(a\rightarrow a)\rightarrow a$. This is {\em not a
tautology,\/} so no pure lambda term can have this type.

\section{Static Single Assignment Form}

The lambda calculus is {\em stateless,\/} which means that the same term will always
reduce to the same normal form. But what if we allowed {\em destructive assignment\/}
in our language: the infamous $x=x+1$? Certainly computing this expression twice will
have a different effect than computing it once. Such an operation is not stateless.
However, this does not necessarily mean that programs containing such assignments are
not {\em admissible\/} in lambda calculus. In fact, many modern compilers will first
translate a program into an intermediate language called {\em Static Single Assignment
Form\/} (SSA). A new variable is introduced for each destructive assignment, so
$x=x+1$ is translated into something like $x_2=x+1$. Loops containing such statements
are translated into {\em tail\/} recursive functions where $x$ is an argument, as
sketched below. The popular {\em LLVM\/} intermediate language is based on SSA.
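As a small illustration (this translation is my own sketch of the idea, not actual
compiler output), a counting loop built from the destructive assignment $x=x+1$ can be
rewritten as a tail recursive function in which each ``updated'' $x$ is really a fresh
$\lambda$-bound argument:
\begin{verbatim}
# destructive version:            # stateless, tail-recursive version:
# x = 0                           # every "update" of x becomes a new
# while x < 5: x = x + 1          # binding of the argument x
def loop(x):
    return loop(x + 1) if x < 5 else x

print(loop(0))    ### prints 5
\end{verbatim}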
The first C program in the introduction, which contains the destructive assignment
$y=y+x$, is equivalent to the following pure lambda term:
$$ f= \lambda x. (\lambda y_2.y_2+x)~ ((\lambda y.(\lambda x.y+x)~2)~1) $$
The destructive assignment to $y$ has been replaced by a {\em single\/} binding to the
new variable $y_2$. We see that $(f~4)$ reduces to
$(\lambda y_2.y_2+4)~ ((\lambda y.(\lambda x.y+x)~2)~1) ~\Rightarrow~$
$(\lambda y_2.y_2+4)~ ((\lambda y.y+2)~1) ~\Rightarrow~$
$(\lambda y_2.y_2+4)~(1+2) ~\Rightarrow~ 7$.

\medskip

The lambda calculus is the universal language of the programming language research
community. Hopefully this document has introduced you to a new way to think about
programming and programming languages.

\end{document}