An Elementary Introduction to Curry Howard de Bruijn Correspondence

Part ITypes as Propositions

This is an account of how I make sense of the Curry Howard de Bruijn Correspondence. It is a deep correspondence between logical systems and computer programs that was first identified close to 1940s. It is meant to be a primer where I integrate the underlying ideas I have gleaned together. I have kept it as a casual treatment that uses only as much technical jargon as needed to explain the concepts.

It is structured by and large as an account that I wish I had when I first learnt about the fascinating idea that programs are proofs in disguise (and vice versa!). As a user interface designer / developer at the time I didn’t have the capability to understand the dense formal material that were foundational to grasp these ideas.

On my first brush with this correspondence, I was left puzzled on how to start parsing it. From my limited mathematical experience, I couldn’t make sense of it. The whole symbolic tradition in which the ideas were couched and communicated made it opaque for me to see through what was being said.

After some perseverance, I think I have now gotten to a point where I can dive deeper into these materials and start understanding them. I intend to turn these into a journal series marking my explorations. I want to share some of the things I have learned so far in the most accessible way to fellow learners on this journey. I have kept the presentation close to traditional idioms in programming as I think it helps in emphasizing the practical advantages or implications this idea has for the workaday programmer. Although, I have to add that the article is not positioned towards a novice programmer. It assumes a passing familiarity with programming in a typed language, and some basic exposure to logic. I have tried to link to some of the interesting works I found during my spelunking and learning phase in these areas.

With these preliminaries out of the way, let us begin our explorations in unfolding the ideas constituting the Curry Howard de Bruijn correspondence!

What is this? Why should I care?

A natural question you might have on hearing a lofty sounding concept like the Curry Howard de Bruijn correspondence for the first time is, why should I learn about this? And especially something so abstract and theoretical when you might have a ton on your plate to learn already. I hope by the end of this article, you will get to see the implications of why this might be a useful weapon up your arsenal as a programmer who focusses on learning theories with a high bang for the buck ratio.

Curry Howard deBruijn is a deep correspondence between formal systems in mathematics which were previously thought to be unrelated. It can be stated simply, at the loss of some formal precision, as proofs about mathematical objects are dual to computations with data. Proofs in this sense act as evidence / witness for the existence of mathematical objects we compute with and programs are the way to “realize” these objects through concrete data. When reading explanatory sources, it is usually stated as a correspondence between types as propositions, and proofs as programs. It is also underlined that simplification of proofs are equivalent to evaluation of computer programs. That is a dense set of ideas if that’s your first time brushing on them! Let us unpack what is going on in here.

A triangle with Type, Logic, and Computation on its three vertices (clockwise from top) with the accompanying illustration of a semilattice, graph, and monotonic function composition tree respectively. The connection between type and logic is labelled Types as Propositions, the one between type and computation is labelled Type Inhabitation, and the one between logic and computation is labelled Proofs as Programs.

It can be explained using a tripartite correspondence between the three fields involved. As can be seen in the diagram, three different frameworks employed towards three different ends are brought together in this nexus as a mutually complementing triad. This allows for a back and forth travel between three pairs of two worlds!

I Type ↔ Computation

In the world between Type and Computation, in the direction of Computation to Type, it enables inference of types from code base. In the other direction, we get code synthesis via the information in types.

II Computation ↔ Logic

In the world among Computation and Logic, it allows you to prove your programs and program your proofs!

III Logic ↔ Type

In the world between Logic and Type, we get the machinery of logic to work with types and types enable logic to escape from certain antinomies.

Type, Data, Computation, and Logic (with Type on top) arranged as a diamond in clockwise fashion. Type is represented as a semilattice. Data is represented as a tree, Computation is represented as a monotonic function composition tree, and Logic is represented as a graph. There are arrows like in the Klein-4 group between these entities.

For the purpose of this essay, I am introducing a fourth pair that shines light on the features of interaction between these frameworks. That is the category of data which illuminates how different logical formulations of types determine distinctive organizations of structures/processes of computer programs. I will elucidate this connection as we progress.

This way of organizing the diagram allows us to decompose it into two pairs: 1) Type ↔ Computation pair and 2) Logic ↔ Data pair. Note that there’s a rough complementarity between type/logic and computation/data which I have marked out using dual colors above.

These are only two among the six possible pairs of interaction in the above diagram. I single these out because foremost, they are what motivated this post. Secondly, I felt that the insights granted by these pairs help clarify the practical implications this idea has for a workaday programmer. Learning theory and working out their implications is a huge investment of time. I wanted to portray a perspective that might resonate with a practice oriented programmer. I find it to motivate how learning this theory might have an influence on their daily practice. Thirdly, these pairs give a handle to grasp how a duality along one dimension reflects out in the interaction in another one. That is, how type ↔ computation interaction reflects out in logic ↔ data interaction.

Type ↔ Computation Interaction

A semilattice labelled Type on the top and the monotonic function composition tree on the bottom labelled Computation with two arrows going between them. The one from Computation to Type is labelled Type Inference and the one back is labelled Code Synthesis.

In the upwards direction, you get type inference, where algorithms such as Algorithm W enable inferring what types are being used by functions that perform the computation and in the downward direction, it gives us code synthesis from our description of the types.

Logic ↔ Data Interaction

A graph labelled Logic on the left and a tree labelled Data on the right with two arrows going between them. The one from Logic to Data is labelled Determines the computational complexity and the one back is labelled Determines the form of propositions.

In the direction from logic to data, you get different logical representations of the same computation determining distinctive data structures as its content. This can be thought of as shifting the ontology in which computational processes run. And in the direction from data to logic, us choosing an alternative data structure, say a trie instead of a tree for text search, determines the form of proofs that unfold out of the program. This is directly reflected as the types of our programs.

A Leitmotif

I have condensed the above four concepts as four circular pegs and represented it on a board which will be used as a motif throughout this essay. It will aid in designating which part of the correspondence is being detailed in a particular section of the essay.

A diamond shaped dice with four dots in four divisions colored violet with orange dots

Here are the four different configurations corresponding to the four distinct topics that we will address in this essay. I will use the appropriate icon when touching upon that aspect of the CHD correspondence.

The diamond shaped dark purple dice with upper partition having the orange dot

Type

The diamond shaped dark purple dice with the left partition having the orange dot

Logic

The diamond shaped dark purple dice with the lower partition having the orange dot

Computation

The diamond shaped dark purple dice with the right partition having the orange dot

Data

We will cycle through these combinations four times in the course of this post. By the fourth time, if I did my job well, the significance and interplay between these concepts would be clarified.

By the end of this post, at the very least, I hope to have revealed why such a correspondence across multiple formalisms is fascinating for someone trying to understand how disconnected looking strange math/computer science ideas have some deep hidden unity among them!

Synopsis

As a brief panopticon of the whole post, in this section, I will recollect the story behind what spurred me to create this post. I hope to encapsulate the crux of this post in this retrospect of me making these connections.

After this brief look back in this section, we will see how types of computing functions can be seen as logical propositions and how this logical organization can be concretely understood as combinatorial objects in the second and third sections of this essay. In the summary part, we will take a converse approach and show how choosing different data structures can be understood as different refactoring styles for programs, and how this gets reflected as logic of types.

So here is how I stumbled on the idea for this post:

Logic

Hans Halvorson

As a part of honing my logical skills, I was working through the introductory book How Logic Works by Harold Halvorson when I noticed the equivalence between the following formulae:

P -> (Q -> R) <=> (P ∧ Q) -> R

This in logical terms means that we have two equivalent propositions. On the left hand side, we have a proposition P implying another implication (Q -> R) and on the right we have a conjunction of two propositions P and Q implying R.

Type

I recalled from my earlier exposure to Curry Howard de Bruijn correspondence, and the talk and its accompanying blog series called “The Algebra of Algebraic Data Types” by Chris Taylor that X imply Y can be considered in type theory as the exponential type and X and Y as the product type.

Chris Taylor

What the post outlines is that these types denote the count of their inhabitants. Whenever you have P -> Q, it means that it is an exponential type and whenever you have P * Q, it means that it is a product type.

P -> (Q -> R) <=> (P * Q) -> R

This in terms of concrete objects can be thought of as counting the number of possible functions in the case of exponential type and co-occurences of objects in the product type. What does this mean? It can be clarified, if we translate this to its arithmetic counterpart. If we denote exponential type with the raising operation and multiplication for the product type, we can see these closely mirroring the following arithmetic identity about raising numbers to powers:

(X^M)^N = X^(M×N) Example (2¹)³ = 2^1×3

That might be easy to understand in the case of numbers, but how does that relate to the case of computation? Consider the case of two types X and Y with 3 and 2 elements each, with the mapping X -> Y this can be something like 3 cars of 2 colors, 3 plants of 2 species, or even 3 stories of 2 dreams.

Exponential Type

X with 2 elements {a, b} → Y with 3 elements {1, 2, 3}

A tree organized with two leaves X and Y with subscript 2 and 3 respectively, conjoined by → as the parent node. It is labelled Type Level (External View). This is compared against another equivalent visualization with the internal contents of X and Y, {a, b} and {1, 2, 3} visualized as nested capsules contained inside a box with the operator label →. It is labelled Inhabitant Level (Internal View).

This scenario can be seen from two different perspectives. From the outer/external view, we can think of exponentials just like how exponentiation of numbers work. 3 -> 2 mirrors as 2³. That is, externally, the types can be thought of as tracking number of inhabitants as an exponentiation. But if we take the inner/internal view, we witness something really interesting happening!

Inhabitants of Exponential Type

Visualization of possible functions between { a, b } → {1, 2, 3}

{<a,1>,<b,3>}
{<a,1>,<b,2>}
{<a,2>,<b,3>}
{<a,3>,<b,1>}
{<a,2>,<b,1>}
{<a,3>,<b,2>}
{<a,1>,<b,1>}
{<a,2>,<b,2>}
{<a,3>,<b,3>}

What happens internally with exponentiation of types is that this mapping operation tracks potential functions between these two domains! We are counting out the number of functions, that is the number of many-to-one names that can be given for the things we have on the left side (domain) when mapped to a thing on the right (codomain). This is what exponentials in the form of X -> Y denote.

Now we will take a look at what happens with product types. With products whevenever we multiply two types, we track the number of possible co-occurrences of their inhabitants in the outer level!

Here is the visualization of what they result in when we compute the cartesian product operation. The conjunction of types in the external view results in what can be thought of as the collection of all possible pairs of elements that can co-occur internally.

Inhabitants of Product Type

Visualization of product of X := {a, b} * Y := {1, 2, 3}

[a, 1]
[a, 2]
[a, 3]
[b, 1]
[b, 2]
[b, 3]

Computation

Types act as the outer view of internal computational content. Exponential types are how we describe functions, and product types capture the Cartesian product of all possible elements in the conjunct types. Using this line of thinking, what is captured by the logic expressed in types can be thought of as two different forms of descriptions of a computer program.

This is the types as propositions interpretation. Types compose to form propositional statements. In order to prove these propositions, we need to bring in evaluation rules that are harmonious with the properties of our proof system. If we turn our view around and look at this from the angle of computation, the process of proving these propositions parallels how we write computer programs that respect the types! In this manner, roughly, one can think of types as the form and computation as its content.

curriedFunction :: P -> (Q -> R)
                    curriedFunction = pThing => qThing => rThing

<=>

uncurriedFunction :: (P * Q) -> R
                    uncurriedFunction = (pThing, qThing) => rThing

In terms of computation, the exponential type denote a way of organizing programs called currying. In this mode of expression, we pass in parameters one by one to the function. That is, we pass in a value of type P first, which unlocks a new function where you have to pass in a value of type Q to unlock R. Whereas the product type P * Q means a conjunction of parameters, that is both P and Q together as (P, Q) is now being passed as the parameter instead of passing in a chain of types/propositions of P first and then a Q to yield an R.

There is this very nice talk by Sergei on how logic of types can be leveraged to extract their computational content. It was very helpful in clarifying my understanding of how types map to computation for me!

First slide of the talk Introduction to the Curry-Howard Correspondence: https://github.com/winitzki/talks/blob/master/ftt-fp/curryhoward-2018-tutorial.pdf

Sergei Winitzki

Data

All these pieces clicked together and spurred me to write this post as I had been investigating some discrete calculus and combinatorics on the side. These efforts illuminated how the underlying structures can be seen as different ways in which the partitions / combinations described by the types track their inhabitant computational processes.

I will illustrate what happens combinatorially by taking an example where P has 2 elements, Q has 1, and R has 3 elements. This will give us a concrete context to interpret what is happening with the structures. This part is treated as algebraic formulae in many textbooks. But, I think there is so much juicy visual ideas inherent in the combinatorics. I personally feel the algebraic (intensional) textual treatment needs to be complemented by their visual geometric (extensional) counterparts to enrich and illuminate the underlying ideas.

I am increasingly getting convinced that whenever algebra is used to demarcate a form and study its content, however abstract or complex the underlying subject, a geometry can be associated with the algebraic formulae to give it a visual shape and communicable structure. The ways in which we visualize them provide a concrete setting to see how things connect to / distinguish from each other. This makes way for perceiving the form/content of mathematical objects both in our mind’s eye through algebra and their possible geometric instantiations with our bodily eyes. This marriage of algebra and geometry allows us to both perceive and communicate mathematics symbolically as well as tangibly with our peers.

P -> (Q -> R) and (P * Q) -> R and their visualizations at the Type Level (External View) and Inhabitant Level (Internal View). P has 2, Q has 1, and R 3 elements.

At the Type Level for P -> (Q -> R), a tree visualization is shown with P on the left paired with the pair Q and R, all with the -> operator. At the Inhabitant level, nested boxes are shown with 3 orange triangles inside P, a yellow circle inside Q, and 2 purple squares inside R. Q and R are nested inside a box with an -> operator on top. It is nested inside another container with P on the left and another -> operator on top.

At the Type Level for (P * Q) -> R, a tree is shown with P and Q on the left paired with the * operator, which is paired with R using ->. At the inhabitant level, nested boxes with internal details of the types are exposed in the same manner as above, but with * on top of the container of P and Q. This is nested inside another box with R on the right with -> on the top.

This is how the mappings from the outside looks like. We will now evaluate the relationships inherent here and count the number of possible functions that hold between these. Using the intuition given in the types section, we can unfold out and see the number of possible combinations these can potentially spawn.

There are many ways to unfold the above relations. Each way we choose out of the possibilities yield a particular form of expansion. I have chosen one where I was able to maintain them as binary trees while revealing the exponentiation process as leading to a staged expansion where each level is progressively indexed by an inhabitant of the exponent.

The pattern of unfolding here can be seen as 8 nodes generated out of the multiplication of {a, b} three times, but each of these levels are indexed by one of the inhabitant of (P * Q) corresponding to a particular level. This may be easier to parse if we think of this in terms of arithmetic. P and Q have 3 and 1 element each, so this gives 3×1 = 3 nodes. This is then used as the exponent of R which has 2 elements, giving us 2^(3×1) = 8 nodes.

One thing we can readily notice is that at each level of the unfolding, we obtain a branching tree. If we just look at the {a, b} pair which designate 2, we can see that, it is bifurcating at each level yielding the formal series 2¹, 2², 2³. Thus, {a, b} being the inhabitants of R is being indexed at each level by an inhabitant of its exponent P * Q which has 3 elements, one for each stage of the unfolding. If we look at each level, we can see that it is a representation of (a, b) * (a, b) * (a, b) which is indexed at each level consuming one of the element from our (P * Q) pair with 3 elements.

When we expand out fully the algebraic relation among the inhabitants of the type, we get the following normalized results. We can think of these as the entities we obtain when we unfold the tree fully till it bottoms out.

An Unfolding Pattern of (P * Q) -> R

Shows the tree of unfolding the inhabitants of the types (P * Q) -> R which can be described using S-expressions as:

(+
(*
(-> (* 1 I) a)
(+
(* (-> (* 2 I) a)
(+ (-> (* 3 I) a) (-> (* 3 I) b))
)
(* (-> (* 2 I) b)
(+ (-> (* 3 I) a) (-> (* 3 I) b))
)
)
)
(*
(-> (* 1 I) a)
(+
(* (-> (* 2 I) a)
(+ (-> (* 3 I) a) (-> (* 3 I) b))
)
(* (-> (* 2 I) b)
(+ (-> (* 3 I) a) (-> (* 3 I) b))
)
)
)
)

At each level (* 1 I), (*2 I), and (* 3 I) are extracted out of the common pairs to highlight they are recurring and put around a common container of (+ a b). Thus there is only a single (* 1 I), two (* 2 I)s, and 4 (* 3 I)s at each unfolding level of the tree.

The normalized results of unfolding the uncurried function (P * Q) -> R with 2, 1, and 3 inhabitants respectively.

Shows a grid of the results:

((1 * I) -> a) * ((2 * I) -> a) * ((3 * I) -> a)
((1 * I) -> a) * ((2 * I) -> a) * ((3 * I) -> b)
((1 * I) -> a) * ((2 * I) -> b) * ((3 * I) -> a)
((1 * I) -> a) * ((2 * I) -> b) * ((3 * I) -> b)
((1 * I) -> b) * ((2 * I) -> a) * ((3 * I) -> a)
((1 * I) -> b) * ((2 * I) -> a) * ((3 * I) -> b)
((1 * I) -> b) * ((2 * I) -> b) * ((3 * I) -> a)
((1 * I) -> b) * ((2 * I) -> b) * ((3 * I) -> b)

Now let us look at the case of P -> (Q -> R). Here the unfolding pattern looks almost the same as the case of (P * Q) -> R. But on a closer scrutiny, we find that the labels that index and leaves unfolding at each level are different. What is now unfolding are the inhabitants of (Q -> R) where each level is indexed by the inhabitants of P at each level instead of inhabitants of R exponentiatied by P * Q in the previous case.

An Unfolding Pattern of P -> (Q -> R)

Shows a tree of unfolding the inhabitants of the types P -> (Q -> R) which can be described using S-expressions as:

(+
(* (-> 1 (-> I a))
(+
(* (-> 2 (-> I a))
(+
(-> 3 (-> I a))
(-> 3 (-> I b))
)
)
(* (-> 2 (-> I b))
(+
(-> 3 (-> I a))
(-> 3 (-> I b))
)
)
)
)
(* (-> 1 (-> I b))
(+
(* (-> 2 (-> I a))
(+
(-> 3 (-> I a))
(-> 3 (-> I b))
)
)
(* (-> 2 (-> I b))
(+
(-> 3 (-> I a))
(-> 3 (-> I b))
)
)
)
)
)

At each level, the indices 1, 2, and 3 are extracted out of the common pairs of (-> I a) and (-> I b) to highlight they are repetitive and put around a common container of (+ (-> I a) (-> I b)). Thus there is only a single 1, two 2s, and 4 3s at each unfolding level of the tree.

The study done here might give the impression that the unfolding patterns for both formulations are going to be the same no matter their inhabitants. But this happened here only because Q has a single inhabitant. In a later section, we will see an example where this uncurried / exponent of product version leads to a deeper tree than the shallower version given by the curried / chained exponentials version.

Here are the results of unfolding the tree out completely and obtaining their nodes. Notice how the inhabitants also share the same P -> (Q -> R) chained exponential pattern in the individual components they are made up of.

The normalized results of unfolding the curried function P -> (Q -> R) with 2, 1, and 3 inhabitants respectively.

Shows a grid of the results:

(1 -> (I -> a)) * (2 -> (I -> a)) * (3 -> (I -> a))
(1 -> (I -> a)) * (2 -> (I -> a)) * (3 -> (I -> b))
(1 -> (I -> a)) * (2 -> (I -> b)) * (3 -> (I -> a))
(1 -> (I -> a)) * (2 -> (I -> b)) * (3 -> (I -> b))
(1 -> (I -> b)) * (2 -> (I -> a)) * (3 -> (I -> a))
(1 -> (I -> b)) * (2 -> (I -> a)) * (3 -> (I -> b))
(1 -> (I -> b)) * (2 -> (I -> b)) * (3 -> (I -> a))
(1 -> (I -> b)) * (2 -> (I -> b)) * (3 -> (I -> b))

Now let us put the unfolding forms of the expressions and their normalized results together. On the left, we have the curried version with two chained exponentials and on the right, the uncurried version which chains product types with an exponential. Notice that we are reifying possible computations tracked by types that are processes, as data structures to do this counting.

The juxtaposition of Unfolding Pattern and Normalized Results of P -> (Q -> R) and (P * Q) -> R

Unfolding pattern of P -> (Q -> R) shows {I, a}, and {I, b} unfolding as two trees with either of them as the root node and then doubling and finally quadrupling resulting in 2 of them at the first stage, 4 of them in the second stage, and 8 of them in the third and last stage. Each of these stages are tracked by 1, 2, and 3 respectively.

The Normalized Results of this unfolding pattern is shown as described in: https://patternatlas.com/v0/curry-howard-de-bruijn/img/uncurried-version-normalized-results-intro.svg

Unfolding pattern of (P * Q) -> R shows {a, b} unfolding as two trees with either of them as the root node and then doubling and finally quadrupling resulting in 2, 4, and 8 of them per stage. Each of these stages are tracked by 1 * I, 2 * I, 3 *I respectively.

The Normalized Results of this unfolding pattern is shown as described in: https://patternatlas.com/v0/curry-howard-de-bruijn/img/curried-version-normalized-results-intro.svg

The key idea which I want to highlight here is that while both of the expressions have a distinct unfolding pattern as a structure, on normalizing them, that is evaluating them to terms which can’t be simplified further, we end up with results which are identical.

There is a small wrinkle here that we have a difference in operators here, but at the level of normalized elements, we have the identity that 1¹ being the same as 1 × 1. That is exponentiating something by 1 is identical to multiplying it 1 time. This can be leveraged to turn them from one form to another. I will talk about these algebraic transforms in a later section. At this point the broad strokes details are only what I intend to be communicated.

The global structures we use to organize these two are very different while their local inhabitants turn out to be semantically identical once we unfold them out fully. This isomorphism between the end results is tracked by the logical equivalence between the types. The equivalence established in the logical realm translates as equality of cardinality of the inhabitant data structures. It enables us to see that the way in which we organize programs logically influence not just the way we think about them but also the way in which a machine executes it over space and time. This is a very elementary example and the underlying theory is really much more rich and deep and has more twists, turns, braids, and knots to it.

If this is your first time getting exposed to the ideas, don’t worry. Only a basic gist of the two ways of organizing programs at the type level resulting in two different unfolding patterns is the only thing you need to grasp at this point in the essay. Notice the curious fact that even when the unfolding patterns are structurally the same, their contents: the indices and inhabitants at the leaves of the tree, at each level are very different. I feel this is a significant idea which can shine light on decisions we take towards architecting databases and design algorithms. The broader implications this has for programming in general and dealing with data in systems is something I want to explore in the future. But concentrating our focus on a definite problem, we will flesh out the implications it has for mundane programming tasks using a concrete example in the upcoming sections.

A Brief History of the Correspondence

This connection to computation and logical systems was recognized in a computational context by Nicolaas G. de Bruijn when he designed AUTOMATH with propositions as “categories” of lambda calculus terms and proofs as terms. It is interesting to see him note that he borrowed the idea of exponentials from Heyting’s work. This is a nice paper by him reflecting on his work on AUTOMATH.

Cover page of Reflections on Automath (1994)

Nicolaas G. de Bruijn

But the original identification of this correspondence, though there were no computer programs around at the time, was done by Haskell Curry when he was studying logical systems using his combinatory calculus. It was recorded in his paper called “Functionality in Combinatory Logic” from 1934. But this effort went un(der)developed for about 30 years and till William Howard took notice of this and extended it!

Cover of Functionality in Combinatory Logic

Haskell B. Curry

William Howard influenced by Haskell Curry’s work would independently state this correspondence later in his 1969 manuscript. This paper was widely circulated and published in 1980. By this time, this idea got osmosized into the research culture. It has been researched and developed ever since then!

Cover of The Formulae-as-Types Notion of Construction (1969)

William A. Howard

That was a very quick and brief runthrough of the four way correspondence which I think shines light on why this is such an interesting idea! But it must have been too fast paced unless you knew these ideas beforehand. Let us unpack these at a slower pace considering each side of the original equation one at a time.

❖ ❖ ❖

Stay tuned for the next section in this article coming up really really soon.