Baking the Y Combinator from scratch, Part 1
It's a pedagogical pet peeve of mine when technical concepts are simply listed in their fully developed state, with little in the way of motivation, derivation, or historical context. This, I think, is like publishing a cookbook filled with beautiful pictures of cakes, but no recipes.
My aim with the "Baking <x> from scratch" series is to describe not only what, but how and why, with the hope that this will lead to a richer and more durable understanding.
The Famous Y Combinator
This post is about the Y combinator. Not Y Combinator the accelerator, but the mathematical construct that it's named after - the Y combinator.
The Y combinator looks like this:
$$ Y = \lambda f. \enspace (\lambda x. \enspace f(x \enspace x)) \enspace (\lambda x. \enspace f(x \enspace x)) $$
Confronted with this definition, you might ask yourself:
1. Why does it look like that?
2. What is it used for?
3. Why do people still talk about it?
4. (What the heck is a combinator?)
The one-line answer to all of these is: "the Y combinator implements recursion in functional languages that don't allow for explicit self-reference", but it's worth going a little deeper than that. In my opinion, the easiest way to understand what the Y combinator is, at a fundamental level, is to focus on how it produces fixed points. The easiest way to understand why it's important is to focus on how it can be used to implement recursion. In Part 1 we'll do the former (answering 1. and half of 2. in the process) and in Part 2 we'll do the latter (answering 3. and the other half of 2.)
(We'll answer bonus question 4 later in this article!)
The Y combinator was devised by Haskell Curry, an influential logician who has the unique distinction of lending his name to both a programming technique and a programming language. It is also known as the "Fixpoint Combinator" (for reasons we'll soon discuss) and the "Paradoxical Combinator" (for reasons we'll investigate in the second half of this post.)
The Lambda Calculus
This post presumes that you're at least familiar with the lambda calculus - you don't have to be an expert, but if it's completely new to you, you may have some trouble following along. This is a good introduction if you need it (this article only draws on the material in Section 1.)
Two things about the lambda calculus that you should understand before reading this:
1) Everything in the lambda calculus is a function (I'll also call these functions 'terms'.) When you "run" a program in the lambda calculus, you're basically plugging functions into other functions over and over until you can't do it anymore. This may seem like it severely limits the expressive power of the lambda calculus, but consider that we can encode different kinds of data and operators in the lambda calculus by representing them using suitable functions (the linked paper shows how to represent booleans and natural numbers in the lambda calculus, and demonstrates how to do arithmetic and some boolean algebra; there's also a short sketch of this kind of encoding just after this list.) The lambda calculus is Turing complete, so using this type of encoding you can replicate the behavior of any program written in any language.
2) There are no names besides variable names in the lambda calculus. You can't assign a name to a function and you can't call any functions by name. For the sake of readability I'll sometimes cheat and give a name to a function just to make it easier to explain. For example, I might write
$$id = \lambda x. x$$
and then later:
$$myCoolTerm = \lambda x. x \enspace id(x)$$
Just remember that you technically can't do \(name = term\) or \(name(argument)\) in the real lambda calculus; you have to write everything out. For example, \(myCoolTerm\) can only legally be written as:
$$ \lambda x. x \enspace ((\lambda x. x) (x)) $$
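To make point 1 less abstract, here's a minimal sketch of the standard Church encoding of booleans, written as Python lambdas (my own illustration, not taken from the linked paper - and per point 2, the names are purely readability sugar that the real lambda calculus doesn't have):

```python
# Church booleans: data represented as plain functions (names are just sugar)
TRUE  = lambda t: lambda f: t            # λt. λf. t  -- picks its first argument
FALSE = lambda t: lambda f: f            # λt. λf. f  -- picks its second argument
AND   = lambda p: lambda q: p(q)(FALSE)  # λp. λq. p q FALSE
NOT   = lambda p: p(FALSE)(TRUE)         # λp. p FALSE TRUE

# "Decode" a Church boolean into a Python bool, just so we can print it
to_bool = lambda b: b(True)(False)
print(to_bool(AND(TRUE)(NOT(FALSE))))    # True
```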
Y - the Fixpoint Combinator
A fixed point (or fixpoint) of a function \(f\) is a point that doesn't change when you apply \(f\) to it. In other words, \(x\) is a fixpoint of \(f\) if and only if
$$f(x) = x$$
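As a quick concrete illustration (an ordinary numeric example of my own, before we return to the lambda calculus): the squaring function has exactly two fixed points, 0 and 1.

```python
# Fixed points of f(x) = x^2: the inputs that squaring leaves unchanged
f = lambda x: x * x
for x in [0, 1, 2, 5]:
    print(x, "is a fixed point of f:", f(x) == x)   # True only for 0 and 1
```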
Y is a fixpoint combinator (notice I didn't say the fixpoint combinator - the Y combinator is one of the oldest and simplest fixpoint combinators, but there are many others!) A fixpoint combinator takes a (one-argument) function and returns some fixpoint of that function. Note that some functions have more than one fixpoint - for example, every value returned by an idempotent function is a fixpoint - and if the function has no fixpoint at all, \(Y(f)\) simply never terminates. So we know \(Y\) must satisfy:
$$ f(Y(f)) = Y(f)$$
but that also means:
$$Y(f) = f(Y(f))$$
From a computational perspective, this second equation says that the action of \(Y\) is to take a function \(f\) and produce a term, \(Y(f)\), that nests itself inside a call to \(f\).
Because the two equations above are equivalent, finding a \(Y(f)\) that satisfies the second equation means that - by definition - \(Y(f)\) is a fixed point of \(f\).
Now is a good time to remember that everything in the lambda calculus is a function. So \(Y\) is a function that takes another function \(f\) and returns a fixed-point, which is also a function itself.
At first glance \(Y\) looks trivially easy to implement. Can't we just say that
$$Y = \lambda f. f \enspace (Y\enspace f)$$
? Well, if we were using another language, we could, but in the lambda calculus we can't refer to \(Y\) by name. Remember, functions technically don't have names at all in the lambda calculus; the formal calculus uses only lambdas and variables. So, if we want \(Y\) to be a valid term, we have to write everything out. Let's substitute that occurrence of \(Y\) in the function body with its explicit definition:
$$Y = \lambda f. f \enspace (Y\enspace f)$$
$$Y = \lambda f. f \enspace ((\lambda f. f \enspace (Y \enspace f))\enspace f)$$
Hmm....let's try again:
$$Y = \lambda f. f \enspace ((\lambda f. f \enspace ((\lambda f. f \enspace (Y\enspace f))\enspace f))\enspace f)$$
Okay, it's becoming clear we can't write out \(Y\) explicitly: this process will continue forever, and terms in the lambda calculus are not allowed to be infinitely large.
Since we can't get away with having \(Y\) refer to itself in its own definition, we need it to be somehow self-replicating. \(Y(f)\) must manufacture a copy of itself at runtime and then wrap that copy in another \(f\). But now we have a nasty chicken-and-egg problem: it's not clear how to replicate \(Y(f)\) without knowing its structure, and we won't know its structure until we figure out how to do the replication!
Let's forget about the outer \(f\) for now and consider how we could make the simplest possible self-replicating term (call it \(R\)). The first thing to notice is that we want something that reduces - an \(R\) that just sits there will always be trivially equal to itself, but that's not interesting or useful to us. We want a term that changes, but somehow ends up right back where it started. To make any progress, we need to make some claim about the structure of \(R\) (even if it ends up being wrong.) Let's start with the simplest structure that will reduce - a single function application:
$$R = M N$$
Let's see if we can make this work. We need to figure out what \(M\) and \(N\) should be. We're trying to replicate \(R\) by applying \(M\) to \(N\), so if \(R\) is a single function application, the inner structure of \(M\) should probably be a single application too. In other words, \(M\) should look something like:
$$M = \lambda x. ((f_1\space x) (f_2\space x))$$
where \(f_1\) and \(f_2\) are...something. What should they be? What should \(N\) be, for that matter?
We're ready to try and crack the nut of the problem. At this point you should try Messing Around for a while and see what you come up with; experiment with different terms for \(f_1\), \(f_2\), and \(N\) and see what the results are. Remember that everything in the lambda calculus is a function - there are no other types of values! When you've figured it out (or you're just ready to move on), keep reading.
The key to this one is noticing that \(f_1\space x\) must equal \(M\) if we want \(M N\) to reduce to itself. We can't set \(f_1\) itself to \(M\); we'd end up with the infinite nesting we saw earlier and we'd never be able to write \(M\) down at all. Our other option is to pass \(M\) in as an argument, i.e. \(x = M\), which means \(N = M\), which means \(f_1\) and \(f_2\) are simply the identity function (and can be omitted altogether.) Put this all together and you get:
$$M = N = \lambda x.(x x)$$
$$R = (\lambda x.(x x))\enspace(\lambda x.(x x))$$
This works! It only takes a single step for \(R\) to replicate (you can do it in your head if you want - just substitute for \(x\) on the left-hand side.)
We have rediscovered the Omega combinator, or \(\Omega\), which is one of the simplest terms that can reproduce itself like this (you'll notice that it will keep reducing forever!)
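If you want to watch that endless reduction happen, here's a rough transcription of \(\Omega\) into Python (my own aside - under Python's eager evaluation every reduction step becomes another function call, so "reducing forever" shows up as a blown call stack):

```python
# Ω = (λx. x x)(λx. x x): each "reduction" is another round of self-application
omega_half = lambda x: x(x)          # λx. x x
try:
    omega_half(omega_half)           # Ω -- reduces to itself, forever
except RecursionError:
    print("Omega never reaches a normal form; Python's call stack gave out first")
```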
Looking back at how we derived the Omega combinator, we can see what the trick to self-replication is. We got things working with \(\Omega\) because we passed a copy of \(M\) in as an argument, instead of directly nesting it inside of its own definition. This may just seem like semantics, but it allows us to do the nesting one step at a time during reduction instead of having to write down everything at once - which means we don't end up with infinitely large terms!
Time to put the pieces together. Let's tweak our Omega combinator to turn it into the Y combinator.
Last time, we structured \(M\) based on the structural assumptions we made about \(\Omega\). The only difference between \(\Omega\) and \(Y f\) is that \(\Omega\) turns into \(\Omega\), and we want \(Y(f)\) to turn into \(f(Y(f))\). So we'll update the structure of \(M\) - instead of a single function application, it'll be a single function application wrapped in a call to \(f\):
$$ M = \lambda x.f (x x) $$
Setting \(M = N\) as before, we get:
$$ Y(f) = M N = (\lambda x.f (x x)) \enspace (\lambda x.f (x x)) $$
$$ Y = \lambda f. \enspace (\lambda x. \enspace f(x x)) \enspace (\lambda x. \enspace f(x x)) $$
Does this work? Let's see:
$$ Y(f) = (\lambda x.f (x x)) \enspace (\lambda x.f (x x)) $$
$$ Y(f) = f ((\lambda x.f (x x)) \enspace (\lambda x.f (x x)))$$
Yes! It works. In a single substitution step, \(Y(f)\) nests itself inside of \(f\). That bizarre structure we saw at the beginning of this post is indeed a fixpoint combinator!
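If you'd like to poke at the fixpoint equation in code, here's a small Python sketch. One caveat (my addition, not part of the derivation above): Python evaluates arguments eagerly, so the literal \(Y\) loops forever before it can return anything; the usual workaround is the eta-expanded "Z" variant, which hides the self-application behind an extra lambda so it's only unfolded on demand.

```python
# Z: a strict-evaluation-friendly variant of Y (eta-expanded self-application)
Z = lambda f: (lambda x: f(lambda v: x(x)(v)))(lambda x: f(lambda v: x(x)(v)))

# A toy f: it takes "the rest of the computation" and returns one more step of it
countdown = lambda rec: lambda n: "done" if n == 0 else rec(n - 1)

fix = Z(countdown)
print(fix(3))               # "done"
print(countdown(fix)(3))    # "done" -- wrapping the fixpoint in f changes nothing,
                            # i.e. Z(f) behaves like f(Z(f)), just as the equations say
```

We'll see in Part 2 why this unfolding behavior is exactly what's needed to implement recursion.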
Y - the Paradoxical Combinator
So far we've learned what the Y combinator does, and we've kind of learned why it's useful, but perhaps not in a very convincing way. So the Y combinator self-replicates without referring to itself by name...why do we care? What do we get by prohibiting self-reference, or by writing programs in such an unwieldy style? Isn't this all a bit abstruse (and while we're at it, how on earth is the Y combinator "paradoxical"?) I can give you a more satisfying answer, but you'll have to allow me a brief detour into the history of math.
The lambda calculus was created in the early 20th century, when mathematics was re-orienting itself around the idea of formalism. Formalism, in a nutshell, is a philosophy that says mathematics is "just" the act of manipulating symbols according to well-defined rules. This may seem tautological, but it's not - it represented a significant shift in attention towards the symbols and rules themselves and away from the things mathematics is "about" (numbers, shapes, functions, etc.)
This new formalist perspective led to a proliferation of formal systems. At its most basic, a formal system is a set of symbols that can be arranged into formulas, a set of rules for manipulating symbols to make new formulas, and a set of axioms - formulas that are assumed to be true. You're probably familiar with a couple of formal systems already, such as the predicate calculus, set theory, or Peano arithmetic.
Formal systems featured prominently in 20th century mathematics, and new ones are still being created today. A well-constructed formal system can serve a wide variety of purposes. For example:
- Many formal systems have been proposed as potential foundations of mathematics - systems in which every area of mathematics can, in theory, be re-derived. A suitable foundational system a) unifies separate areas of mathematics, making it easier to port insights and techniques between them, and b) provides a rock-solid underpinning for concepts that may have been accepted as true without being fully understood. Today it is widely accepted that Zermelo-Fraenkel set theory is the most suitable foundation for mainstream mathematics, but other systems could potentially work as foundations as well, such as (some variants of) category theory, and even (some variants of) the lambda calculus!
- Formal systems can be used to learn things about a particular style of reasoning, deduction, or calculation (a formal system might be proposed simply so that mathematicians can study its capabilities and limitations - not because it's ever intended to be used in practice.)
- On the flip side, some formal systems are just easier to do mathematics in (this is a matter of taste, of course.) For example, many proponents of category theory maintain that certain results are much more natural to derive categorically than in other systems.
- As computer science matured into its own field of inquiry, there was a lot of work done on functional programming languages (functional languages you may have heard of include LISP, Haskell, and OCaml.) While an imperative program is (loosely) a series of instructions, a functional program is (loosely) a collection of terms and a set of rules for turning terms into other terms. Existing formal systems like the lambda calculus provided a natural jumping-off point for the designers of these new functional languages.
(Some systems are referred to as formal systems and others are referred to as formal systems of logic (or formal logics, for short), but the criteria that distinguish these are blurry and a point of contention among some mathematicians. They also aren't relevant to the aims of this article, so for now we'll treat both of these terms as equivalent.)
All of this is to say that Haskell Curry was (probably!) thinking about the lambda calculus as it was first presented - that is, as a formal system, and not as a programming language.
A few other things that are important to understand about the lambda calculus before we move on:
- it was created to be small. It has variables, abstraction, application, and two reduction rules. That's it - that is the complete language. This minimality serves an aesthetic goal (to capture the "essence" of computation), but it also makes the lambda calculus easy to reason about (take, for example, the Church-Rosser Theorem, which says that regardless of the order in which you reduce terms in an expression you'll always end up with the same value at the end. You would not want to try and prove this fact about a more complicated language.)
- it was first described by Alonzo Church in the early 1930s, several years before Alan Turing (who would later become Church's doctoral student) published his work on Turing machines. At the time there was a broad and ongoing investigation into the nature of computation, but there were no physical computers to speak of and certainly no programming languages. The closest analogues to Church's system were systems of formal logic - it's doubtful anyone imagined or assumed that the lambda calculus itself would be directly used to give instructions to a computer.
- A big theme in 20th century computer science is that computation and formal logic are two sides of the same coin, in a few important ways (we'll look at this more in a bit.)
That said - whence the Y combinator? Well, Curry needed it in order to create a paradox within the lambda calculus.
In the parlance of formal logic, a paradox is a proof that doesn't break any rules, but still ends in a contradiction (i.e. it proves a proposition and its opposite at the same time.) You may be familiar with the famous Russell's Paradox; the informal "barber" version goes like this: "there is a barber who shaves all those, and those only, who do not shave themselves. Does this barber shave himself?" On the one hand, the barber must shave himself - if he didn't, he'd fit the criteria for people he shaves, so he'd end up shaving himself anyway. On the other hand, the barber cannot shave himself - if he did, he would no longer fit his own criteria, and would be unable to take himself on as a customer.
Proving a contradiction is usually a knife through the heart for a system of logic. If you can prove a proposition and its opposite are both true, you can often find a way to prove any statement - and a system that can prove anything is not very useful (logicians refer to this kind of system as trivial.) You're only supposed to be able to prove things that are actually true!
Paradoxes are important tools to probe the flaws and limitations of formal systems. The discovery of Russell's Paradox, for example, led to many augmented systems of logic containing various mechanisms to ensure the paradox could no longer occur (fun fact: among these efforts were the first incarnations of type theory!) In particular, since Russell's Paradox has negation at its core (the barber shaves people who don't shave themselves), many logicians hoped they could dodge it by constructing systems that had a modified concept of negation, or no concept of negation at all.
Curry's Paradox shot those hopes down. It's a variant of Russell's Paradox that doesn't depend on negation. Curry constructed his paradox to prove the triviality of a number of different logics - among them, the famous lambda calculus and Moses Schönfinkel's combinatory logic.
Combinatory logic was invented by Moses Schönfinkel in a 1924 paper (it was one of the two papers he published in his whole life.) Schönfinkel's goal was to create a form of logic that did not include the notion of a variable. Instead, it uses a fixed set of combinators - which are functions that take other combinators and apply them to each other in various ways. Just as everything in the lambda calculus is a function, an expression in combinatory logic only consists of combinators - it's combinators all the way down.
Combinatory logic got a boost when Haskell Curry began studying it in 1927, but the lambda calculus was always more popular. Until a resurgence of interest in the 60s, Curry basically carried the torch alone, doing the bulk of the work on the subject either himself or with his students. Curry created the Y combinator to construct his paradox in combinatory logic, which is why it's called the Y combinator and why it's named after a single letter. In combinatory logic, the Y combinator is implemented like this:
$$Y = SSI(S(S(KS)K)(K(SII)))$$
Anyway, enough preamble - let's examine Curry's Paradox in all its glory.
Take a look at this:
"If this sentence - the very one I'm speaking now - is true, then all birds are government spies."
It turns out that using this sentence alone you can prove that all birds are government spies - or anything else you wish to prove, including a contradiction like "1 = 1 and also 1 = 0". This paradox is not quite as intuitive as Russell's Paradox, so we probably need to look at it more formally before it starts to make sense.
Consider the following statement:
\(S\): If statement \(S\) is true, proposition \(P\) is true.
- The standard way to prove an implication \(X => Y\) is to assume that the first part is true and then deduce that the second part must be true, given that the first part is. So, let's take "\(S\) is true" as a hypothesis.
- \(S\) is true, so \(S => P\) is true, since \(S\) states directly that \(S => P\).
(This is where the problems start, by the way. If \(P\) were something provably false, like \(1 + 1 = 3\), \(S => P\) should also be false, but the rules of logic compel us to permit \(S => P\) anyway, since by hypothesis we're assuming \(S\) is true. As with many paradoxes, the issue is that \(S\) references itself.)
- \(S\) is true by hypothesis, and \(S\) says \(S => P\) is true. Taken together, this means \(P\) must be true.
- We just proved that if you assume \(S\) is true, \(P\) must be true as well. In other words, we proved \(S => P\).
- \(S\) states directly that \(S => P\), and we just proved that \(S => P\) is true. So, \(S\) is true as well!
- Finally, since \(S\) is true and \(S => P\) is true, \(P\) is true as well - no matter what \(P\) is. Paradox!!
As mentioned, the key here is that \(S => P\) refers to itself via its own antecedent, \(S\). As we saw before, we'll need the Y combinator to make this happen in systems that don't permit explicit self-reference.
Curry's Paradox in the Lambda Calculus
In this section we're going to put together everything we've learned so far to implement Curry's Paradox in the lambda calculus. This might seem nonsensical to you - what does it mean to implement a paradox? How could a logical paradox be put into a system of computation?
In the last section we learned about the concept of a formal system, and I gave a few examples. Some of these are logical systems, in which you manipulate symbols to make logical deductions, and some are computational systems, in which you manipulate symbols to perform computations and come up with a result. Using a computational system to prove things or do logical deduction doesn't seem to make any sense - it wasn't built for that, right?
The Curry-Howard Correspondence was a slow-motion realization by Haskell Curry and the logician William Alvin Howard (and a few other mathematicians, at various times) that these two "types" of systems are actually, in a deep and important sense, intertwined! In other words, we are writing proofs when we write programs - they're just constructive proofs.
The term "constructive proof" takes its name from constructivism, which, like formalism, is a philosophy of mathematics that was the topic of some debate in the early 20th century. There's a lot to say about constructivism, but the part that's relevant to us is the belief that the only way to prove a certain class of mathematical objects exists is to construct an example and present it.
In particular, this means that constructivists reject proof by contradiction, which is a technique used in "classical" mathematics. A proof by contradiction says "this type of object must exist, because if it didn't, that would lead to a contradiction." You'll notice that the object itself is never described, even though it's been proven to exist; i.e. we know it's "out there" but we don't know what it looks like! For constructivists, this is impermissible.
Proof by contradiction is an important and widespread technique, and outlawing it makes it - frankly - pretty difficult to do mathematics (David Hilbert compared it to "prohibiting the boxer the use of his fists.") For this reason constructivism never quite caught on with mainstream mathematicians the way that formalism did.
To understand what it means for a program to represent a constructive proof, consider the value "hello". This is a value of type \(String\), which means it is an example of a \(String\), which means it is a constructive proof that at least one value of type \(String\) exists.
Now consider the following (pseudocode) function:
$$ firstChar(myString: String) = myString[0] $$
This accepts a \(String\) and returns a \(Char\). In other words, it accepts a constructive proof that a \(String\) exists and returns a constructive proof that a \(Char\) exists. In other words, it is a constructive proof that says "if at least one value of type \(String\) exists, at least one value of type \(Char\) exists".
This extends to higher order functions as well. For example, this function:
$$ firstDigit(firstChar: (String) => Char, toString: (Number) => String, num: Number) = firstChar(toString(num))$$
is a constructive proof that says "if the existence of a \(String\) implies the existence of a \(Char\), and the existence of a \(Number\) implies the existence of a \(String\), and at least one \(Number\) exists, then at least one \(Char\) exists".
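Here's the same pair of functions written as (very loosely) typed Python, just to make the reading concrete - a hedged sketch on my part, since Python has no separate Char type (a one-character str stands in for it), Number becomes float, and Python's type hints aren't checked proofs of anything:

```python
from typing import Callable

def first_char(my_string: str) -> str:
    # a term of type String -> Char: "if a String exists, then a Char exists"
    return my_string[0]

def first_digit(first_char: Callable[[str], str],   # a proof of String => Char
                to_string: Callable[[float], str],  # a proof of Number => String
                num: float) -> str:                 # a proof that a Number exists
    # ...and the body constructs the promised Char, proving the combined implication
    return first_char(to_string(num))

print(first_digit(first_char, str, 3.14))   # "3"
```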
For typed languages, the Curry-Howard correspondence implies that types can be seen as propositions, and terms of a particular type can be seen as proofs of the proposition that type represents. This is often referred to as the "propositions as types" perspective.
(The "proofs" embodied by a lot of business software are usually not very interesting - e.g. "if we have a proof that a list of numbers exists, we can prove that a list of strings exists", etc. People can and do write full, mathematically useful proofs using programs, but these programs generally look very different than the ones written by professional software developers.)
The full statement and context of the Curry-Howard Correspondence is more technical than this, but this is the general idea, and enough of a start for us to work with.
Now let's think of another term - a number that is both odd and even at the same time. We'll refer to this kind of number as an \(Evil\) number, i.e. our number is a term of type \(Evil\). This term functions as a constructive proof that there exists at least one \(Evil\) number (which, of course, we know isn't true!) We'll refer to this term as \(evil\) - and since it proves something that isn't true, it can't be constructed directly. But is there a way to express it nonetheless? (spoiler: in the naive lambda calculus, there is)
We can't create \(evil\) directly, but we could return it from a function, if the function had some way to get an \(evil\) from its arguments. It doesn't make sense to directly accept \(evil\) as an argument (how would the caller construct it?) but we could accept a function that returns it. But what should that function accept? Well...another function that also returns \(evil\)! The naive lambda calculus doesn't have types, but if it did, the type of our new function would (informally) look like:
$$ createEvil: createEvil \rightarrow Evil $$
To get \(evil\) from \(createEvil\), we supply it with a function that creates \(evil\) - and our only option is \(createEvil\) itself. The non-lambda-calculus version of this looks like:
$$ createEvil(f) = f(f) $$
and we call it like so:
$$ evil = createEvil(createEvil) $$
(Note that what we've done is to express a term that represents \(evil\). Were we to actually try to run this function, it would run forever. This is not unexpected - paradoxes and nontermination are two sides of the same "things are breaking" phenomenon.)
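For the curious, here's that non-termination made concrete - a Python sketch of the pseudocode above (my own illustration; Python's strict evaluation turns the endless unfolding into a stack overflow rather than a quiet infinite loop):

```python
# createEvil(f) = f(f); "calling it on itself" never actually produces a value
create_evil = lambda f: f(f)
try:
    evil = create_evil(create_evil)   # would be our "proof" of Evil, if it ever returned
except RecursionError:
    print("create_evil(create_evil) never terminates -- no Evil value is produced")
```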
Let's stop and consider for a second. If you screw up your eyes a little you can see that this weird function is analogous to Curry's Paradox. Consider:
- a function that takes a proof object for \(A\) and turns it into a proof object for \(B\) is a constructive proof of \(A \Rightarrow B\)
- a function that takes that function and turns it into \(evil\) is a constructive proof of \((A \Rightarrow B) \Rightarrow evil\)
- consider a function that takes itself and turns it into \(evil\). If we refer to the proposition this function proves as \(S\), it directly follows that this function also proves \(S \Rightarrow evil\)
This should look familiar to you - it's exactly the structure that caused so much trouble in our earlier example! Let's go through the rest of the paradox to see how it lines up with \(createEvil\):
Logic World | Program World |
---|---|
Consider \(S = (S \Rightarrow evil\)) | Consider \(createEvil\) |
First, we must prove \(S \Rightarrow evil\) | First, we must implement \(createEvil\) |
Assume \(S\) is true, as a hypothesis. Since \(S\) is true, \(S \Rightarrow evil\) is true, because \(S\) states directly that \(S \Rightarrow evil\) | \(createEvil\) takes \(createEvil\) as a parameter, so as we implement \(createEvil\) we can assume we have an instance of \(createEvil\) available to us - i.e. we have a constructive proof of \(createEvil \Rightarrow evil\) |
Since we have assumed \(S\) is true and shown \(S \Rightarrow evil\) to be true as a consequence, that proves \(evil\) is true if \(S\) is true - i.e. we have proven \(S \Rightarrow evil\) | Since we have an instance of \(createEvil\) available to us, we can simply pass it to itself to produce an instance of \(evil\). We have implemented \(createEvil\) and completed our constructive proof of \(createEvil \Rightarrow evil\) |
We have proved \(S \Rightarrow evil\), which also means we have proved \(S\). By combining \(S \Rightarrow evil\) and \(S\), we can deduce \(evil\) is true as well. Paradox!! | \(createEvil\) is a constructive proof of both \(createEvil\) and \(createEvil \Rightarrow evil\). Now that we have implemented \(createEvil\), we can simply write \(createEvil(createEvil)\) to (in theory) produce an instance of the inconstructable \(evil\). Paradox!! |
Hopefully now it's clear that \(createEvil\) is the computational manifestation of Curry's Paradox. We're exploiting self-reference here to produce a function that "proves" \(evil\) exists (it can also prove that any other value exists, just like the "regular" version of Curry's paradox can prove any \(P\).) Yet, since this function never terminates, running it will break your program, just as proving a contradiction will break your logical system.
Now for the coup de grace - let's rewrite \(createEvil\) in the lambda calculus. Remember, we can't do \(createEvil = createEvil(createEvil)\) directly because the lambda calculus technically doesn't allow for named functions. We need some way to pass \(createEvil\) to itself at runtime. This is where the Y combinator (finally) enters the picture. Recall that:
$$ Y(f) = f(Y(f)) $$
if we let \(f = \lambda x. (x x)\), we get:
$$ Y(f) = f(Y(f))$$
$$ Y(f) = (\lambda x.(x x)) \space Y(f) $$
$$ Y(f) = Y(f) \space Y(f) $$
We can see that if we let \(createEvil = Y \enspace \lambda x. (x x)\), we end up with \(createEvil = createEvil \space createEvil\), like we wanted. Thus, the final form of \(createEvil\), fully written out in the lambda calculus, is:
$$ (\lambda f. \enspace (\lambda x. \enspace f(x x)) \enspace (\lambda x. \enspace f(x x))) (\lambda x. \enspace (x x)) $$
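To see that this term really does embody the non-terminating self-application, here are its first couple of reduction steps (renaming the argument's bound variable to \(y\) so the substitution is easier to follow):

$$ \rightarrow (\lambda x. \enspace (\lambda y. \enspace y y)(x x)) \enspace (\lambda x. \enspace (\lambda y. \enspace y y)(x x)) $$
$$ \rightarrow (\lambda y. \enspace y y) \enspace \Big( (\lambda x. \enspace (\lambda y. \enspace y y)(x x)) \enspace (\lambda x. \enspace (\lambda y. \enspace y y)(x x)) \Big) $$
$$ \rightarrow \enspace \cdots $$

Each step just rebuilds a copy of the previous term inside more applications, so reduction never reaches a normal form - the "paradox" shows up computationally as a program that never halts.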
What does it mean that we've implemented Curry's Paradox in the lambda calculus? It means that the (naive) lambda calculus has some pretty notable issues as a formal system. How do we deal with that? By adding types, thus upgrading the naive lambda calculus to the simply typed lambda calculus. We'll cover this topic more in Part 2!