Monday, April 4, 2011

Dependent typing: reversing non-empty lists

This is the first post in a series on dependent typing. For those not familiar with dependent typing, it is the ability to describe data using types which say something about the particular value of the data itself – the type of the data depends on the value of the data. For example, the phrase “even integers” could describe a dependent type which is a subtype of integers. Whether a number belongs to that type depends on whether its value is evenly divisible by 2. (Note that any subtype can be viewed as a simple kind of dependent type.)

The obvious use of dependent typing is to describe invariants of your program, for example, “a non-NULL pointer”, “an array of length 5”, or “a non-empty list”. While full-blown dependent typing requires a full-blown theorem prover (such as Coq or ATS/LF), many mainstream languages support basic dependent types. Java can infer whether a value is not null, OCaml can talk about types which are made up of only certain constructors, Mozilla’s new language Rust can attach arbitrary predicates to types, and I would be remiss not to mention Mercury.

Mercury has limited but useful support for dependent typing via its mode system. The mode system appears complicated, so let’s start with a brief overview.

The mode system

Each predicate (or function) in Mercury does something to its arguments. The most common possibilities are to look at an argument and leave it be (this would be an input argument), or to attempt to bind it to a value (this would be an output argument). Another possibility is that it destroys its argument after looking at it (this happens with the IO state). These possible actions are called modes.

Modes are described simply by the expected state of the argument before the call, and the guaranteed state of the argument after the call using the syntax Before >> After. These states (called instantiatednesses or insts for short) may be one of free (the argument is unbound), ground or bound (the argument is bound to a value), clobbered (the argument is destroyed or overwritten), and unique (the argument is bound to a value which is not referenced anywhere else). The latter two are used mainly for IO state and arrays, so we will focus on free and bound.

The modes we are all familiar with, in and out, are defined in the module builtin as ground >> ground and free >> ground, respectively. In other words, an in argument is expected to be bound to a value before the call and is left that way after, and an out argument may be unbound before the call, but will be bound after.

The magic is that we may tell Mercury that we expect or guarantee an argument to be bound only to certain constructors! To do this, we must first define the instantiatedness in which we are interested, say non-empty lists:
:- inst non_empty_list == bound([ground | ground]).
This defines the non_empty_list inst as a variable bound to a list cons cell, the head and tail of which are both bound. We could similarly define the inst of an empty list as:
:- inst empty_list == bound([]).
In other words, an empty_list is a variable which is bound to the empty list constructor.

Note that there is also a shorthand which allows the above two insts to be written similarly to type definitions:
:- inst non_empty_list ---> [ground | ground].
:- inst empty_list ---> [].
We could then define the mode of an input argument which is expected to be a non-empty list as:
:- mode in_non_empty_list == non_empty_list >> non_empty_list.
In other words, an argument with this mode would be expected to be bound to a non-empty list, and would remain bound that way afterward. Similarly:
:- mode out_non_empty_list == free >> non_empty_list.
An argument with this mode may be unbound before the call, but is guaranteed to be bound to a non-empty list after. Mercury actually has the mode-functions in() and out() already defined, so we may write in(non_empty_list) or out(non_empty_list) to describe these same modes.


Note: due to bug 191, the following example won’t trigger the proper error messages from the compiler as of this post. Download and apply this patch against the compiler to fix the issue, or work around the bug by replacing the two out modes with out(list(ground)), which won’t change the meaning but will avoid triggering the bug. UPDATE: parts of the standard library won’t compile without the bug, so don’t go applying that patch just yet.

On a recent thread at Hacker News, a question was brought up about proving that [X | Xs] = reverse([Y | Ys]) was total (or in Mercury terms, det). Let’s see if we can do this in Mercury using our new tools. First let’s define reverse2 (named so not to conflict with built-in list.reverse) using a recursive auxiliary function with an accumulator:
:- import_module list.

:- func reverse2(list(A)) = list(A).
:- mode reverse2(in) = out.
reverse2(A) = reverse2_aux(A, []).

:- func reverse2_aux(list(A), list(A)) = list(A).
:- mode reverse2_aux(in, in) = out.
reverse2_aux([], Acc) = Acc.
reverse2_aux([A | As], Acc) = reverse2_aux(As, [A | Acc]).
(I have included the mode declarations as we will be adding to them later.) Testing this function in our main predicate is simple:
main(!IO) :-
    Y = 1, Ys = [2,3,4,5],  
    [X | Xs] = reverse2([Y | Ys]),
    print({X, Xs}, !IO), nl(!IO).
A sharp eye will spot the problem before we even compile: the deconstruction [X | Xs] = can fail, which would cause main to be semidet instead of det (that is, a partial function). The compiler spots this as well:
In `main'(di, uo):
  error: determinism declaration not satisfied.
  Declared `det', inferred `semidet'.
  Unification with `list.[X | Xs]' can fail.
Of course this isn’t true. What the compiler would love to see is a mode declaration stating that if reverse2 is passed a non-empty list, then it will return a non-empty list. Let’s write that:
:- mode reverse2(in(non_empty_list)) = out(non_empty_list).
Now we get a different error:
In clause for `reverse2(in((list.non_empty_list))) =
  mode error: argument 2 did not get sufficiently instantiated.
  Final instantiatedness of `HeadVar__2' was `ground',
  expected final instantiatedness was `bound(list.'[|]'(ground,
(Note: if you didn’t/can’t apply the above compiler patch, as a workaround, replace out in the original mode declarations with out(list(ground)).)

Here, HeadVar__2 refers to the function result (what would be the second variable in the head were reverse2 a predicate). The compiler expected the function result to be non-empty (bound to list.[|]) but it was bound to an unknown value of type list. Of course this is because reverse2_aux doesn't guarantee a non-empty result on non-empty input, so let’s add a declaration stating so as well:
:- mode reverse2_aux(in(non_empty_list), in) = out(non_empty_list).
(Note that we don’t say anything about the accumulator in this declaration.) Unfortunately recompiling gives us a similar error, this time complaining about the second (the recursive) clause of reverse2_aux. Of course – if our initial list had only one item, the recursive call would pass reverse2_aux an empty list, triggering the first clause, and we haven’t made (and can’t make) any special guarantees about that case. Let's take a closer look at that clause:
reverse2_aux([], Acc) = Acc.
The only way that this clause will return a non-empty list is if the accumulator, Acc, is non-empty. Fortunately, it is called recursively with a non-empty accumulator (since elements are prepended on every recurrence), so this should be enough to convince the compiler. Let’s add a mode declaration stating this fact:
:- mode reverse2_aux(in, in(non_empty_list)) = out(non_empty_list).
Success! Compilation is successful, meaning main has been proved to be det (a total function), and running the program gives the expected output:
{5, [4, 3, 2, 1]}
Using Mercury’s mode system, we were able to prove that reversing a non-empty list results in a non-empty list, by adding descriptive mode declarations in strategic locations.

As an exercise, add an inst describing a list of at least two elements and add modes to reverse2 and reverse2_aux which make guarantees about lists of at least two elements.

Download reverse.m


  1. Very interesting post, thank you.

    Strange, but I can compile code below (note commented lines), but according to the post I expected I'll get compilation error. I'm a bit confused.

    :- module reverse.

    :- interface.
    :- import_module io.
    :- pred main(io::di, io::uo) is det.

    :- implementation.

    :- import_module list.

    :- func reverse2(list(int)) = list(int).
    %:- mode reverse2(in) = out.
    :- mode reverse2(in(non_empty_list)) = out(non_empty_list).
    reverse2(A) = reverse2_aux(A, []).

    :- func reverse2_aux(list(int), list(int)) = list(int).
    :- mode reverse2_aux(in, in) = out.
    %:- mode reverse2_aux(in(non_empty_list), in) = out(non_empty_list).
    %:- mode reverse2_aux(in, in(non_empty_list)) = out(non_empty_list).
    reverse2_aux([], Acc) = Acc.
    reverse2_aux([A | As], Acc) = reverse2_aux(As, [A | Acc]).

    main(!IO) :-
    Y = 1, Ys = [2,3,4,5],
    [X | Xs] = reverse2([Y | Ys]),
    print({X, Xs}, !IO), nl(!IO).

    I'm using 10.04.2 version of mercury (from What mercury version are you using?

    D:\stuff\test\mercury\deptypes>mmc.bat --make -s hlc.gc reverse
    Making Mercury\int3s\reverse.int3
    Making Mercury\ints\
    Making Mercury\cs\reverse.c
    Making Mercury\os\reverse.o
    Making reverse.exe

    {5, [4, 3, 2, 1]}

    Mercury Compiler, version 10.04.2, configured for i686-pc-mingw32
    Copyright (C) 1993-2010 The University of Melbourne
    Usage: mmc []
    Use `mmc --help' for more information.

  2. Hi xonix, glad you found the post interesting.

    You must have read it before I added the note on the issue you mentioned; it’s a known bug for which there’s a patch available, but unfortunately this won’t help you on winmercury until they release a new version.

    However the bug only affects inst-checking a predicate call when the output is ground but should be something more specific (e.g. non_empty_list). A workaround is to replace out with out(list(ground)) (shorthand for out(bound([]; [ground|ground]))), which means the same thing in the above example but avoids the bug.

  3. None of these are actually dependent types, except Rust's typestate predicates. Your definition of dependent type is incorrect: to have a dependent type, you have to have a type whose parameter is a value. Types that are statically constrained to be non-null, or a cons rather than a nil, or whatever, may be very useful, but they are not dependent types.

  4. Hi Robin, thanks for the feedback. I'm not sure why you don't consider such types not to be dependent however. To specify such a type in e.g. Coq (a dependently-typed purely functional language) you need to use dependent types:

    Inductive list A :=
    | nil: list A
    | cons: A -> list A -> list A.

    Definition non_empty_list A :=
    { l: list A | l <> nil A }.

    {|} is a type constructor of two parameters, a type and a proposition over elements of that type. The proposition in this case is built from an element of the list type; hence this is a dependent type (albeit an indirect one).

    I would argue by this logic that subtypes in general are necessarily dependent types. Am I correct in assuming that subtypes do not fall under your definition of dependent types?