Dependable Types 2: Correctness by Construction27 Jan 2018
Part two already, is it? Well, I suppose it’s about time to write some code! Last time, we covered the constructions of the lambda calculus, as well as De Bruijn indices. Today, we’re going to cover basically the same thing, but encoding everything we learnt in Idris! If you’re not familiar with Idris or dependently-typed programming, things are going to start getting… weird.
As you may remember, the ultimate goal is to represent the simply-typed lambda calculus. With that in mind, we should probably define some simple types:
data ProgramType = PFunction ProgramType ProgramType | PInt
So, for now, we only have two types: functions (from one type to another), and integers. If you want to add more simple types, it’s as easy as you think!
Now that’s sorted, let’s have a go at building expressions with an abstract data type.
First try: ADTs
data Expression = Variable Int | Abstraction ProgramType Expression | Application Expression Expression
Just as the last article did, we’re saying there are essentially only three types of expressions:
Variables, which we can reference by De Bruijn index (if zero-indexed, this is essentially how many variables have been introduced since the one I care about?)
Abstractions, which introduce a new variable with a given type that can be referenced within the given body.
Applications, which take the value on the right, and apply it to the function on the left.
This is a perfectly sensible way to represent the calculus, and feel free to write it an interpreter - we’ll certainly be writing one later in the series! I’d imagine you’ll have a type signature like this:
interpret : Expression -> Maybe Expression
So, that’s… fine… but that
Maybe is kind of ugly, right? Why does it
need to be there? If you’ve had some experience with Haskell-like languages
before, you’ve probably already had a suspicion or two:
How do we know the index in every
Variableis valid? What if it’s higher than the outermost variable’s index? What if it’s negative?!
How do we know the left expression in
Applicationis a function? How do we know that its input type matches the right expression’s type?
The more you stare at it, the more holes you can see, and that
more and more inevitable. Today, friends, we’ll ask whether we could do…
Second try: Generalised ADTs.
Here’s where we officially break from Elm and PureScript. Still not from
Haskell, providing you have the
DataKinds extensions enabled.
Within Idris, we can have greater control of our data type using a clever
notion called GADTs. We don’t have enough time in this post to explain them
in depth, so let’s just say that they give us more control over the
construction of our type’s values. If that means nothing to you, don’t
worry! Here’s an example of the above ADT rewritten as a GADT:
data Expression : Type where Variable : Int -> Expression Abstraction : ProgramType -> Expression -> Expression Application : Expression -> Expression
Great. So far, we’ve just got a more long-winded way of doing the same thing. What’s the big deal? Well, let’s have a go at solving that type mismatch issue: Let’s index our type by a ProgramType.
data Expression : ProgramType -> Type where Abstraction : (paramType : ProgramType) -> Expression bodyType -> Expression (PFunction paramType bodyType) Application : Expression (PFunction input output) -> Expression input -> Expression output Variable : ... -- More on this later...
Here’s where things get truly magical. In the first line, we say that
Expression is a type constructor that takes a
ProgramType (just like
Array is a type constructor that takes a
String, and so on). This means, whenever we mention an
Expression, we also
mention its type - now, we can be sure that
Application’s components have
the right types!
This is going to look really weird to Haskell-like language users.
ProgramType is the data type we defined earlier, and now we’re using it
inside a type! This is what makes Idris, and dependent types, special: we use the same
language to talk about types and values. We’ll see later that we can
even pull things out of a type to use as values… and vice versa.
We can pat ourselves on the back: we have solved the type mismatch issue.
However, you’ll notice I’ve tiptoed around
Variable. Our issue was that we
would need to bounds check the
Int; how are we going to fix that?
A small digression:
Idris has a lovely little type called
Elem that is written like this:
data Elem : a -> List a -> Type where Here : Elem x (x :: xs) There : Elem x xs -> Elem x (y :: xs)
Let’s break it down.
Elem is a type indexed by two things: a value of
a, and a list of values of type
Here says, the first
thing in the list is the value (
:: is how we write
cons in Idris:
xs is a list where
x is the item at the front, and
xs is the rest of the
There says, it isn’t the first thing, but it is in the list.
Here are a few examples:
valid : Elem 1 [1, 2, 3] valid = Here alsoValid : Elem "az" ["oo", "ar", "az"] alsoValid = There (There Here) invalid : Elem 0 [1, 2, 3] -- Type error! alsoInvalid : Elem 0 [1, 0] alsoInvalid = Here -- Should be There Here!
What we have is a type-level way of proving that something exists within a list. All well and good, but how does this help us?
Third time lucky: doubly-indexed GADTs
Here’s where we solve all our problems. Firstly, let’s talk about
variables. Every time we use
Abstraction, we introduce a new variable
with a given type into the “context” of our expression. How do you suppose we
can keep track of the list of variables and their types, though?
Context : Type Context = List ProgramType
… Well, that was straightforward. Really sorry if that was a little underwhelming. Because we’re just describing expressions, we don’t need to store values at all - just knowing the types of the variables that are in context (think of this as “scope”) is enough to know whether our expression is valid!
As a little aside, this is one of the things I found a bit weird about Idris: because types and values use the same language, we can declare type aliases just as we declare values. We even give them type signatures!
Here comes the big reveal. This is the bit that blew my mind (which, I can tell you, was very distressing and inconvenient for poor Liam, who had just written it). As well as indexing our expressions by their type, we can also index them by the context they exist in:
data Expression : (ctx : Context) -> (ptype : ProgramType) -> Type where Abstraction : (param : ProgramType) -> (body : Expression (param :: ctx) ptype) -> Expression ctx (PFunction param ptype) Application : (func : Expression ctx (PFunction i o)) -> (arg : Expression ctx i) -> Expression ctx o Variable -- One more minute...
For the benefit of mobile users, these names have been abbreviated to stop line wrapping. If you would prefer a more verbose format, the code for this post uses long-hand names.
Wow! This is where we start to see the power of Idris. We now start by
Expression is a type indexed by a
Context and a
but we also give that
Context a name -
ctx - so we can use it later!
Abstraction is made up of some
ProgramType to represent the type of the
input, and an
Expression of some type to represent the output. What’s new
here, however, is that
body’s context must be the same as the output, but
with a new variable introduced which matches the given parameter type!
We are saying, “within an
Abstraction body, there is one more variable
available: the parameter”. Because the output type doesn’t contain that
extra variable, we are completely encapsulating it within the body expression.
We are saying that in the type. I can’t tell you how excited this makes
Application is actually… pretty much the same as it was before. All our
Context is saying here is that they must exist within the same context,
which is… well, what you’d expect. The point here is that we’re still
carrying the context in the type so we can use it for
Variable! Speaking of
Variable… where is it?
Variable : (ref : Elem ptype ctx) -> Expression ctx ptype
This, for me, is the most beautiful part of all. Now, instead of taking a
number to represent a De Bruijn index, we take an
Elem. We can see that
this maps quite happily:
… Well, you get the picture. The point is that, in order to construct a
Variable expression, we have to prove that the variable is in the
good : Expression  (PFunction PInt PInt) good = Abstraction PInt (Variable Here)
This is the identity function for
PInt: create a function abstraction
PInt-type parameter, and then return the last-introduced variable!
What if we tried to write this:
bad : Expression  (PFunction PInt PInt) bad = Abstraction PInt (Variable (There Here))
Now, we’re saying, “Get me the last-but-one-introduced variable”, but our context is empty - this variable doesn’t exist! Just when we think all hope is lost, up pops a compiler error:
When checking right hand side of second with expected type Expression  (PFunction PInt PInt) When checking argument later to constructor Data.List.There: Type mismatch between Elem x (x :: xs) (Type of Here) and Elem PInt  (Expected type) Specifically: Type mismatch between PInt :: xs and 
This is an actual compiler error telling us exactly that: the context was
empty, but we’ve tried to use a proof that
PInt is in there! Because Idris
wants to see
Elem PInt (x :: xs), but actually sees
Elem PInt , it
knows that there has been a problem. We are now in a situation where we
cannot write invalid expressions if we want our code to compile.
Because we’ve indexed our type by both the program type and context of
the expression, and used a GADT to constrain them, we know that they will
always be what we expect! We can now write what we’ve always wanted to write:
eval : Expression -> Expression
… but we’ll save that for next time!
This has been… a lot of information. A lot of information very
quickly, but hopefully not too quickly. In any case, I’m sorry if this one
takes a couple readthroughs. Idris is a very different beast to Elm,
PureScript, and even Haskell: the ways that you can interact between types and
values are strange and unfamiliar, but we’re starting to see why this
newfound power is helpful. With the aid of indexed types and GADTs, we can
produce types that simply don’t allow for invalid values. We have, with
these features, made illegal states unrepresentable. This is also what we
call correctness by construction: if I can build a value of type
Expression, it must be valid.
If I’ve rushed through anything, or something seems suspicious, don’t hesitate
to send me a tweet or leave an issue on
repository! As always,
there is code for this article, all of
which can be found within
ProgramType.idr, and I encourage you
to play around with it!
Otherwise, have fun, and I’ll see you next time!
Take care ♥