Monday, March 28, 2011

Optimization Levels

The -O optimization levels run from -1 to 6. Which options each level enables does not seem to be documented in the User’s Guide or the mmc(1) man page; it is, however, documented in the compiler source code. -O2 is the default.

Note: order matters! For example, --introduce-accumulators followed by -O6 will actually disable --introduce-accumulators, because that option is disabled in -O6. For this reason you should put -O options first on the command line: mmc -O6 --introduce-accumulators --make prog, rather than mmc --introduce-accumulators -O6 --make prog.


-O-1

Disables all optimizations which can be disabled.
  • --no-common-data
  • --no-llds-optimize
  • --no-optimize-jumps
  • --no-optimize-labels
  • --no-optimize-peep
  • --no-smart-indexing
  • --no-static-ground-cells


-O0

Minimizes compilation time.
  • --excess-assign
  • --optimize-dead-procs
  • --optimize-repeat 1
  • --no-c-optimize
  • --no-emit-c-loops
  • --no-middle-rec
  • --no-optimize-delay-slot
  • --no-optimize-frames
  • --no-optimize-tailcalls
  • --no-use-local-vars


-O1

Cheap optimizations with good payoff.
  • --no-common-struct
  • --no-follow-code
  • --no-inline-simple
  • --no-inline-single-use
  • --no-optimize-fulljumps
  • --no-optimize-initializations
  • --no-simple-neg


-O2

Optimizations with good payoff.
  • --inline-compound-threshold 10
  • --optimize-dups
  • --optimize-repeat 1
  • --user-guided-type-specialization


-O3

Slow optimizations with good payoff.
  • --constraint-propagation
  • --deforestation
  • --local-constraint-propagation
  • --optimize-higher-order
  • --optimize-reassign
  • --optimize-repeat 4
  • --optimize-saved-vars
  • --optimize-unused-args


-O4

Slow optimizations.
  • --higher-order-size-limit 30
  • --inline-compound-threshold 20
  • --inline-simple-threshold 8


-O5

Very slow optimizations.
  • --delay-constructs
  • --eliminate-local-vars
  • --higher-order-size-limit 40
  • --inline-compound-threshold 100
  • --loop-invariants
  • --optimize-repeat 5


-O6

Unreasonably slow optimizations.
  • --everything-in-one-c-function
  • --inline-alloc
  • --use-macro-for-redo-fail


--optimize-space

Optimize for code size rather than speed. This can be combined with one of the numbered levels, keeping in mind the ordering rules mentioned above.
  • --optimize-dead-procs
  • --optimize-dups
  • --optimize-proc-dups
  • --optimize-reassign
  • --unneeded-code-copy-limit 1
  • --no-optimize-fulljumps

Additionally, --optimize-space turns off the following options if set:
  • --inline-alloc
  • --loop-invariants
  • --use-macro-for-redo-fail
  • --no-optimize-labels

Optimizations never enabled

These can make code run slower.
  • --checked-nondet-tailcalls
  • --constraint-propagation
  • --introduce-accumulators (buggy with trailed updates)
  • --optimize-constructor-last-call
  • --optimize-duplicate-calls (buggy with array.init)
  • --type-specialization
  • --unneeded-code

Additionally, intermodule optimizations are never enabled by -O because they require special handling by mmc --make.

Friday, March 25, 2011

Project Euler problem 2

Project Euler problem 2 asks:

By considering the terms in the Fibonacci sequence whose values do not exceed four million, find the sum of the even-valued terms.

Again we start with main module boilerplate:

:- module p2.
:- interface.
:- import_module io.
:- pred main(io::di, io::uo) is det.

In our implementation, we will again use the int module. Because the number of terms to sum is not known in advance (we do not know how many Fibonacci numbers are less than four million without prior analysis), the solutions module will not be useful to us. (Note that even the solutions.do_while predicate cannot be relied upon here, as the order in which terms are generated is not guaranteed, so we have no efficient way of knowing when we have found all the terms of interest.)

:- implementation.
:- import_module int.

First let’s define the recursive function fibonacci of one argument to compute the Ith Fibonacci number.

:- func fibonacci(int) = int.
:- pragma memo(fibonacci/1).
fibonacci(I) =
    ( if I = 0 then 1
    else if I = 1 then 2
    else fibonacci(I - 1) + fibonacci(I - 2) ).

Normally a function so defined would take time exponential in I. However, by instructing the Mercury compiler to memoize fibonacci via the memo pragma, we achieve linear performance, as if we had used a dynamic programming algorithm. (Note that memoization is not yet available in all compilation grades; for example, it is unavailable when targeting Java.)

To perform the summation, we can define a recursive function which recurs on successive values of I until fibonacci(I) is not less than four million:

sum_fibonaccis(I) =
    ( if fibonacci(I) @ Fibo < 4000000 then
        sum_fibonaccis(I + 1) +
            ( if even(Fibo) then Fibo else 0 )
    else 0 ).

The @ function is used to bind fibonacci(I) to Fibo in an expression context (akin to C’s = operator). The built-in predicate int.even is used to test for evenness.

Those readers familiar with functional programming will likely notice that this function is not tail-recursive; i.e., the result of the recursive call is not returned directly. This will cause unnecessary (though, in this case, benign) stack growth. The typical solution is to rewrite fibonacci with an accumulator argument:

sum_fibonaccis_acc(I, Sum) =
    ( if fibonacci(I) @ Fibo < 4000000 then
        sum_fibonaccis_acc(I + 1, Sum +
            ( if even(Fibo) then Fibo else 0 ))
    else Sum ).
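If we adopted this manual version, the entry point would seed the accumulator with zero:

```mercury
p2 = sum_fibonaccis_acc(0, 0).
```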

Mercury, however, is able to perform this transformation automatically, provided we inform it that (Fibo0 + Fibo1 + ...) + FiboN is indeed the same as Fibo0 + (Fibo1 + ... + FiboN) – that is, that addition is associative:

:- promise all [A, B, C, ABC] (A + B) + C = ABC <=> A + (B + C) = ABC.

This optimization is enabled with the --introduce-accumulators option to mmc. Unfortunately, the compiler can recognize only a narrow set of functions and predicates as candidates for accumulator introduction. In particular, altering our definition of sum_fibonaccis to bind Fibo = fibonacci(I) in a separate goal instead of using @, or moving the innermost if outside the recursive call, causes the compiler not to recognize this function as a candidate.
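To illustrate the first of those pitfalls, here is a sketch of a semantically equivalent variant that binds Fibo in a separate goal rather than with @ (the name sum_fibonaccis2 is hypothetical, used only for this illustration):

```mercury
:- func sum_fibonaccis2(int) = int.
sum_fibonaccis2(I) = Sum :-
    Fibo = fibonacci(I),
    ( if Fibo < 4000000 then
        % Computes the same result, but this form is not recognised
        % as a candidate for accumulator introduction.
        Sum = sum_fibonaccis2(I + 1) +
            ( if even(Fibo) then Fibo else 0 )
    else
        Sum = 0
    ).
```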

The remainder of our implementation is unremarkable:

p2 = sum_fibonaccis(0).
main --> print(p2), nl.

Compiling the program with mmc --infer-all --introduce-accumulators --make p2 produces a large warning about possible performance problems, because accumulator introduction alters the order of summation. This is obviously not a problem for addition and can be safely ignored; the warning is meant for operations such as linked-list concatenation, whose performance depends on the value of one operand far more than that of the other. If desired, this warning can be disabled with the --inhibit-accumulator-warnings option.

The resulting program runs in about 100 ms on my dual-core 1.6 GHz machine.

Download p2.m

Thursday, March 24, 2011

Project Euler problem 1

Project Euler problem 1 asks:

Add all the natural numbers below one thousand that are multiples of 3 or 5.

We start by declaring the module and defining its interface, which is standard boilerplate for the main module of a program:

:- module p1.
:- interface.
:- import_module io.
:- pred main(io::di, io::uo) is det.

In our implementation, we will utilize the int and solutions modules from the standard library:

:- implementation.
:- import_module int, solutions.

The integers provided by int are guaranteed to be at least 32-bit and so will be large enough for this problem. solutions provides the aggregate function which we will use to sum the results.

First let’s define the numbers we’re interested in by defining a predicate, aptly named multiple_of_3_or_5_below_1000. It is a predicate of one argument, which, when used as an output (::out), produces zero or more results (is nondet).

:- pred multiple_of_3_or_5_below_1000(int::out) is nondet.

Implementing this predicate is easy using the int module’s nondet_int_in_range predicate to generate integers in a range, and rem to check that the remainder is zero when dividing by 3 or 5:

multiple_of_3_or_5_below_1000(I) :-
    nondet_int_in_range(1, 999, I),
    ( I rem 3 = 0 ; I rem 5 = 0 ).

We can then define the problem solution p1 as the sum of the multiples, using solutions.aggregate and int.plus:

p1 = aggregate(multiple_of_3_or_5_below_1000, plus, 0).

The type of p1 can be inferred by the compiler as a function of zero arguments returning an int.
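If we preferred not to rely on --infer-all, that inferred signature could also be declared explicitly:

```mercury
:- func p1 = int.
```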

Because plus is associative and commutative, the order in which solutions are summed does not matter, so we could use the unsorted_aggregate predicate instead to avoid the overhead of sorting the solutions first. However, that overhead turns out to be negligible in this program, so we stick with the aggregate function for the sake of clarity.
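A sketch of what the unsorted_aggregate alternative might look like. Note the assumptions here: unsorted_aggregate is a predicate rather than a function, its accumulator argument is a three-argument predicate (hence the lambda instead of plus), and because solution order is unspecified its determinism is committed-choice, so the hypothetical p1_unsorted below is declared cc_multi:

```mercury
:- pred p1_unsorted(int::out) is cc_multi.
p1_unsorted(Sum) :-
    unsorted_aggregate(multiple_of_3_or_5_below_1000,
        (pred(I::in, A0::in, A::out) is det :- A = A0 + I),
        0, Sum).
```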

Lastly, print the result. The IO state is threaded using DCG notation.

main --> print(p1), nl.

We can then build the program using mmc --infer-all --make p1. The resulting program runs in about 100 ms on my dual-core 1.6 GHz machine.

Download p1.m


Hello there! This blog will be about exploring the Mercury programming language. Mercury is a declarative logic language like Prolog, but it is pure and strongly typed like Haskell.

I do not intend this blog to be an introduction to the language in itself; the Mercury documentation covers that fairly well. Rather, I intend it to be a companion to those introductory materials, or a “bird’s eye view” of the capabilities of Mercury for those interested in learning it further.

Of course, I will be happy to answer any questions about the language posed in the comments; I have been using Mercury on and off for a couple of years and can probably field most basic questions.

My first series of posts will cover Mercury solutions to the great Project Euler problems, a collection of numeric problems with often non-trivial solutions.

With that, let’s begin!