
Monday, October 17, 2011

Parsing Expression Grammars part 3 (final)

So, last week, we built a C99 parser. A slow C99 parser. At the end we added one line of code that sped things up a bunch. Let’s look at that.

Why recursive descent parsing is slow

Right now our PEG parser is what’s known as a recursive descent parser – it recursively descends down the parse tree looking for matches. Unfortunately, recursive descent is very slow for two reasons:
  1. One grammar rule may try to parse a given span of input the same way multiple times.
  2. Multiple grammar rules may try to parse a given span of input the same way multiple times.
An example of the former is in the selection_statement production (Typedefs omitted for clarity):
selection_statement -->
    (kw("if"), p("("), expression, p(")"), statement, kw("else"), statement) -> [];
    (kw("if"), p("("), expression, p(")"), statement) -> [];
    (kw("switch"), p("("), expression, p(")"), statement).
The string “if (x > 0) x--; return;” will be parsed once by the first branch, which fails when it finds “return” instead of “else”, and will then be reparsed by the second branch, which will succeed. This sort of prefix-repetition can be eliminated by factoring out the repeated portion:
selection_statement -->
    (kw("if"), p("("), expression, p(")"), statement,
     ((kw("else"), statement) -> []; [])) -> [];
    (kw("switch"), p("("), expression, p(")"), statement).
Of course, this means that our code is no longer structurally identical to the grammar we are parsing, but this may be a welcome tradeoff.

An example of the latter, which is more pernicious, is given by, well, almost every production. Why? Consider statement:
statement -->
    labeled_statement -> [];
    compound_statement -> [];
    expression_statement -> [];
    selection_statement -> []; 
    iteration_statement -> []; 
    jump_statement.
Pretty innocuous, right? Take a look at each of the productions it invokes. Almost all of them start with a call to kw, which in turn calls token. In fact, all of them start with a call, directly or indirectly, to token. In fact, this is true of every production in the parser. token is not the only culprit but it is the most pervasive.

This problem can’t be solved through a simple restructuring of the grammar. For this we need a more powerful tool: memoization.

Memoization

Memoization is one of my favorite features. Simply put, memoization of a procedure means maintaining a table of its past input and output values, and looking up output values in this table when the procedure is called again with the same input values. Of course, this only works if the procedure has no observable side effects; fortunately this is true of most Mercury code. Mercury provides built-in support for memoization via the memo pragma. Basic usage is as follows:
:- pragma memo(Name/Arity).
By default, Mercury memoizes inputs by value. This will catch all repeated calls to a procedure, but can be slow for large structures. We can specify that we would like Mercury to memoize inputs by address instead:
:- pragma memo(Name/Arity, [fast_loose]).
Memoizing by address is much faster but may miss calls with identical values at different addresses. However, because our primary data structure is a list of characters which is very large (meaning memoize-by-value will be slow) and which we are only destructing (meaning identical sublists of this list must be physically the same), we will memoize by address.
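For illustration only, here is a hypothetical function over our input list, memoized each way (count_chars is invented for this example and is not part of the parser; it assumes list and char are imported, and only one of the two pragmas would actually be used):
:- func count_chars(list(char)) = int.
%:- pragma memo(count_chars/1).               % memoize by value (the default)
:- pragma memo(count_chars/1, [fast_loose]).  % memoize by address
count_chars(Cs) = list.length(Cs).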

Profiling

Now, although we could memoize every procedure in our lexer and parser, memoization does have overhead, and so we should not use it on procedures which are not called often or with repeated inputs. We should limit our scope to grammar rules which are invoked by many different rules, or multiple times by one rule, since these are likely to be invoking the rule repeatedly on the same input.

To find such candidates for memoization, we can use runtime code profiling. To generate profiling data, we can simply compile with the -p switch and then run our program with our test from last week (minus the memoization of token we added at the end):
$ mmc -p --infer-all --make c_parser_direct
$ echo 'int main(void) {}' | ./c_parser_direct
Let that churn for a few seconds, and then kill it with Ctrl+C. Now run mprof to find out what’s been taking so long:
   %  cumulative    self              self    total                                                            
 time    seconds  seconds    calls  ms/call  ms/call name                                                      
 68.5       8.70     8.70 17602295     0.00     0.00 <function `string.to_char_list/1' mode 0> [1]
  7.1       9.60     0.90 17602483     0.00     0.00 <predicate `builtin.unify/2' mode 0> [2]     
  6.3      10.40     0.80   202300     0.00     0.00 <predicate `c_parser_aim.lexer.punctuator_rec/4' mode 0> [3]
  5.9      11.15     0.75   202388     0.00     0.00 <predicate `c_parser_aim.lexer.keyword_rec/4' mode 0> [4]   
  4.7      11.75     0.60 10114349     0.00     0.00 <predicate `c_parser_aim.lexer.punctuator_aux/3' mode 0> [5]
  4.3      12.30     0.55  7487945     0.00     0.00 <predicate `c_parser_aim.lexer.keyword_aux/3' mode 0> [6]
Over 95% of that run was spent just checking for keywords and punctuators! In particular, the biggest culprit, string.to_char_list, is highly suspect. To find out who’s calling it, we can run mprof -c [note: sometimes I have had to run mprof -cd; I can’t reproduce this now so I’m not sure why] and search for the callers of string.to_char_list:
                3.70        3.70 7487945/17602295    <predicate `c_parser_aim.lexer.keyword_rec/4' mode 0> [24]
                5.00        5.00 10114350/17602295    <predicate `c_parser_aim.lexer.punctuator_rec/4' mode 0> [22]
[21]    68.5    8.70        8.70 17602295         <function `string.to_char_list/1' mode 0> [21]
The [21] in the left-most column indicates that this section is dedicated to procedure #21, string.to_char_list, which was called 17,602,295 times. The two lines above it indicate that both lexer.keyword_rec and lexer.punctuator_rec call string.to_char_list, 7,487,945 and 10,114,350 times respectively. Of course: we use string.to_char_list to convert our keywords and punctuators to character lists for these two procedures to match against the input data. But clearly we don't have 7,487,945 keywords or 10,114,350 punctuators, so string.to_char_list must be getting called with the same arguments repeatedly. Let’s fix that:
:- func to_char_list_memo(string) = list(char).
:- pragma memo(to_char_list_memo/1, [fast_loose]).
to_char_list_memo(S) = to_char_list(S).
Unfortunately we can’t instruct Mercury to memoize procedures defined in other modules, so we must define to_char_list_memo as above. We then replace the two calls to to_char_list in lexer.keyword_rec and lexer.punctuator_rec with the memoized version. Compiling and reprofiling tells us:
   %  cumulative    self              self    total                                                             
 time    seconds  seconds    calls  ms/call  ms/call name                                                       
 45.2       1.65     1.65 10400248     0.00     0.00 <function `c_parser_aim.to_char_list_memo/1' mode 0> [22]
 12.3       2.10     0.45 10400437     0.00     0.00 <predicate `builtin.unify/2' mode 0> [29]                
 12.3       2.55     0.45  5975244     0.00     0.00 <predicate `c_parser_aim.lexer.punctuator_aux/3' mode 0> [27]
 12.3       3.00     0.45   119606     0.00     0.01 <predicate `c_parser_aim.lexer.keyword_rec/4' mode 0> [24]   
  8.2       3.30     0.30  4425004     0.00     0.00 <predicate `c_parser_aim.lexer.keyword_aux/3' mode 0> [28]
  4.1       3.45     0.15   119517     0.00     0.02 <predicate `c_parser_aim.lexer.punctuator_rec/4' mode 0> [21]

Tracking down token

We’ve reduced the percentage of total runtime consumed by to_char_list, but it’s still getting called an awful lot. We know keyword_rec and punctuator_rec don’t call it any more often than they need to, so we need to find out who’s calling them. Let’s follow keyword_rec:
                0.45        1.64  119606/119606      <predicate `c_parser_aim.lexer.keyword/3' mode 0> [23]
[24]    45.0    0.45        1.64  119606         <predicate `c_parser_aim.lexer.keyword_rec/4' mode 0> [24]
                0.70        0.70 4425004/10400248    <function `c_parser_aim.to_char_list_memo/1' mode 0> [22]
                0.30        0.49 4425004/4425004     <predicate `c_parser_aim.lexer.keyword_aux/3' mode 0> [28]
keyword is the only caller, no surprise there. Let’s go up one more level:
                0.00        1.64  119606/119606      <predicate `c_parser_aim.lexer.token/3' mode 0> [1]       
[23]    45.0    0.00        1.64  119606         <predicate `c_parser_aim.lexer.keyword/3' mode 0> [23]
                0.00        0.00      57/239246      <predicate `c_parser_aim.lexer.identifier_nondigit/3' mode 0> [67]
                0.45        1.64  119606/119606      <predicate `c_parser_aim.lexer.keyword_rec/4' mode 0> [24]        
                0.00        0.00      57/836707      <predicate `char.is_digit/1' mode 0> [59]
Again, no surprise – keyword is called only by token.
                0.00        0.21    7009/119606      <predicate `c_parser_aim.cast_expression/3' mode 0> [8]
                0.00        0.00       2/119606      <predicate `c_parser_aim.compound_statement/3' mode 0>  <cycle 3> [55]
                0.00        0.00     117/119606      <predicate `c_parser_aim.declaration_specifiers/4' mode 0>  <cycle 4> [35]
                0.00        0.00       4/119606      <predicate `c_parser_aim.direct_abstract_declarator/3' mode 0> [52]             
                0.00        0.00       9/119606      <predicate `c_parser_aim.direct_declarator/4' mode 0>  <cycle 2> [50]
                0.00        0.00       4/119606      <predicate `c_parser_aim.direct_declarator_rec/3' mode 0>  <cycle 2> [47]
                0.00        0.00      36/119606      <predicate `c_parser_aim.enum_specifier/3' mode 0> [42]                        
                0.01        0.43   14042/119606      <predicate `c_parser_aim.kw/3' mode 0> [30]            
                0.00        0.00       3/119606      <predicate `c_parser_aim.labeled_statement/3' mode 0> [53]
                0.00        0.00       8/119606      <predicate `c_parser_aim.p/3' mode 0> [51]                
                0.00        0.00       2/119606      <predicate `c_parser_aim.parameter_list/3' mode 0>  <cycle 2> [54]
                0.00        0.00       1/119606      <predicate `c_parser_aim.parameter_type_list/3' mode 0>  <cycle 2> [56]
                0.00        0.00      26/119606      <predicate `c_parser_aim.pointer/2' mode 0> [45]                             
                0.01        0.43   14019/119606      <predicate `c_parser_aim.postfix_expression/3' mode 0> [25]
                0.01        0.86   28036/119606      <predicate `c_parser_aim.primary_expression/3' mode 0> [26]
                0.00        0.00      48/119606      <predicate `c_parser_aim.struct_or_union/2' mode 0> [41]   
                0.00        0.00      36/119606      <predicate `c_parser_aim.type_qualifier/2' mode 0> [43] 
                0.00        0.00     120/119606      <predicate `c_parser_aim.type_specifier/3' mode 0> [37]
                0.00        0.00      12/119606      <predicate `c_parser_aim.typedef_name/3' mode 0> [48]  
                0.02        1.71   56072/119606      <predicate `c_parser_aim.unary_expression/3' mode 0> [20]
[1]    100.0    0.05        3.65  119606         <predicate `c_parser_aim.lexer.token/3' mode 0> [1]
                0.00        0.00  119517/119517      <function `c_parser_aim.lexer.punctuator_list/0' mode 0> [66]
                0.00        0.00      31/31          <function `string.from_char_list/1' mode 0> [60]             
                0.00        0.10  119517/119517      <predicate `c_parser_aim.lexer.constant/2' mode 0> [32]
                0.00        0.00  119548/239246      <predicate `c_parser_aim.lexer.identifier_nondigit/3' mode 0> [67]
                0.00        0.00      31/31          <predicate `c_parser_aim.lexer.identifier_rec/3' mode 0> [71]     
                0.00        1.64  119606/119606      <predicate `c_parser_aim.lexer.keyword/3' mode 0> [23]       
                0.15        1.81  119517/119517      <predicate `c_parser_aim.lexer.punctuator_rec/4' mode 0> [21]
                0.00        0.05  119606/119606      <predicate `c_parser_aim.lexer.whitespace/2' mode 0> [34]
And there’s the culprit. token is called by many other procedures, and as discussed above, almost certainly with repeated inputs. Let’s memoize token.
:- pragma memo(token/3, [fast_loose]).
Now (as we found out last week), parsing our simple test completes in a reasonable amount of time. Let’s give our parser something bigger to chew on.

Factoring out multiple calls from multiplicative_expression

For stress testing, I used a 6 KB program from Glen McCluskey’s C99 test suite, tdes_047.c. (It is available in the set of free samples if you wish to try it yourself.) We can use this (or any C99 code) as follows:
$ cpp -P c9_samp/tdes_047.c | ./c_parser_direct
And of course, we are met with no end to the parsing. mprof still tells us that token is responsible for the most CPU usage, but since we’ve already memoized it, let’s work our way up the chain some more. Who is responsible for most of the calls to token?
                0.23        0.23 1358257/95678933    <predicate `c_parser_aim.additive_expression/3' mode 0>  <cycle 2> [19]
                0.00        0.00    7546/95678933    <predicate `c_parser_aim.and_expression/3' mode 0>  <cycle 2> [24]     
                0.00        0.00    2370/95678933    <predicate `c_parser_aim.assignment_operator/2' mode 0> [29]            
                1.36        1.36 8218754/95678933    <predicate `c_parser_aim.cast_expression/3' mode 0>  <cycle 2> [12]
                0.00        0.00       7/95678933    <predicate `c_parser_aim.compound_statement/3' mode 0> [14]              
                0.00        0.00     236/95678933    <predicate `c_parser_aim.conditional_expression/3' mode 0>  <cycle 2> [50]
                0.00        0.00      14/95678933    <predicate `c_parser_aim.declaration/4' mode 0> [5]                             
                0.00        0.00     457/95678933    <predicate `c_parser_aim.declaration_specifiers/4' mode 0>  <cycle 2> [43]
                0.00        0.00     392/95678933    <predicate `c_parser_aim.designator/3' mode 0> [48]                             
                0.00        0.00      95/95678933    <predicate `c_parser_aim.direct_abstract_declarator/3' mode 0> [53]
                0.00        0.00     368/95678933    <predicate `c_parser_aim.direct_declarator/4' mode 0>  <cycle 2> [49]
                0.00        0.00     890/95678933    <predicate `c_parser_aim.direct_declarator_rec/3' mode 0>  <cycle 2> [27]
                0.00        0.00     999/95678933    <predicate `c_parser_aim.enum_specifier/3' mode 0> [38]                        
                0.01        0.01   30183/95678933    <predicate `c_parser_aim.equality_expression/3' mode 0>  <cycle 2> [23]
                0.00        0.00    3773/95678933    <predicate `c_parser_aim.exclusive_or_expression/3' mode 0>  <cycle 2> [26]
                0.00        0.00       2/95678933    <predicate `c_parser_aim.identifier_list/2' mode 0> [57]                         
                0.00        0.00    1886/95678933    <predicate `c_parser_aim.inclusive_or_expression/3' mode 0>  <cycle 2> [31]
                0.00        0.00      17/95678933    <predicate `c_parser_aim.init_declarator/4' mode 0> [8]                          
                0.00        0.00       8/95678933    <predicate `c_parser_aim.init_declarator_list/4' mode 0> [7]
                0.00        0.00       2/95678933    <predicate `c_parser_aim.initializer/3' mode 0>  <cycle 2> [56]
                0.00        0.00      48/95678933    <predicate `c_parser_aim.initializer_list/3' mode 0>  <cycle 2> [45]
                0.02        0.02  139104/95678933    <predicate `c_parser_aim.kw/3' mode 0> [22]                               
                0.00        0.00     943/95678933    <predicate `c_parser_aim.logical_and_expression/3' mode 0>  <cycle 2> [39]
                0.00        0.00     472/95678933    <predicate `c_parser_aim.logical_or_expression/3' mode 0>  <cycle 2> [44] 
                1.01        1.01 6112154/95678933    <predicate `c_parser_aim.multiplicative_expression/3' mode 0>  <cycle 2> [17]
                0.00        0.00    1474/95678933    <predicate `c_parser_aim.p/3' mode 0> [34]                                         
                0.00        0.00    1172/95678933    <predicate `c_parser_aim.pointer/2' mode 0> [32]
                2.72        2.72 16437987/95678933    <predicate `c_parser_aim.postfix_expression/3' mode 0>  <cycle 2> [4]
                8.10        8.10 48898650/95678933    <predicate `c_parser_aim.postfix_expression_rec/3' mode 0> [9]             
                2.20        2.20 13264944/95678933    <predicate `c_parser_aim.primary_expression/3' mode 0> [13]   
                0.03        0.03  181101/95678933    <predicate `c_parser_aim.relational_expression/3' mode 0>  <cycle 2> [21]
                0.08        0.08  452752/95678933    <predicate `c_parser_aim.shift_expression/3' mode 0>  <cycle 2> [20]     
                0.00        0.00      68/95678933    <predicate `c_parser_aim.struct_declaration/3' mode 0>  <cycle 2> [54]
                0.00        0.00     136/95678933    <predicate `c_parser_aim.struct_declarator/3' mode 0>  <cycle 2> [52] 
                0.00        0.00      68/95678933    <predicate `c_parser_aim.struct_declarator_list/3' mode 0>  <cycle 2> [55]
                0.00        0.00    1404/95678933    <predicate `c_parser_aim.struct_or_union/2' mode 0> [36]                        
                0.00        0.00     102/95678933    <predicate `c_parser_aim.struct_or_union_specifier/3' mode 0>  <cycle 2> [33]
                0.00        0.00    2334/95678933    <predicate `c_parser_aim.type_qualifier/2' mode 0> [30]                            
                0.00        0.00    4079/95678933    <predicate `c_parser_aim.type_specifier/3' mode 0>  <cycle 2> [25]
                0.00        0.00     333/95678933    <predicate `c_parser_aim.typedef_name/3' mode 0> [51]                   
                0.09        0.09  553352/95678933    <predicate `c_parser_aim.unary_expression/3' mode 0>  <cycle 2> [18]
[6]     52.1   15.85       15.85 95678933         <predicate `c_parser_aim.lexer.token/3' mode 0> [6]                          
                0.00        0.00      96/96          <function `c_parser_aim.lexer.punctuator_list/0' mode 0> [81]
                0.00        0.00      58/58          <function `string.from_char_list/1' mode 0> [64]             
                0.00        0.00     108/108         <predicate `c_parser_aim.lexer.constant/2' mode 0> [88]
                0.00        0.00     166/412         <predicate `c_parser_aim.lexer.identifier_nondigit/3' mode 0> [86]
                0.00        0.00      58/58          <predicate `c_parser_aim.lexer.identifier_rec/3' mode 0> [97]     
                0.00        0.00     225/225         <predicate `c_parser_aim.lexer.keyword/3' mode 0> [83]       
                0.00        0.00      96/96          <predicate `c_parser_aim.lexer.punctuator_rec/4' mode 0> [95]
                0.00        0.00     225/225         <predicate `c_parser_aim.lexer.whitespace/2' mode 0> [82]
It seems like postfix_expression_rec is the winner; it is responsible for over half the calls to token. Tracing it back similarly to how we traced keyword_rec, we find that it is called solely by postfix_expression 8,149,775 times, which is called solely by unary_expression a similar number of times, which is called almost exclusively by cast_expression a similar number of times, which is called almost exclusively by multiplicative_expression, which is called… only 2,054,666 times:
                                 2054666             <predicate `c_parser_aim.additive_expression/3' mode 0>  <cycle 2> [19]
[17]     7.2    0.70        2.18 2054666         <predicate `c_parser_aim.multiplicative_expression/3' mode 0>  <cycle 2> [17]
                1.01        1.01 6112154/95678933    <predicate `c_parser_aim.lexer.token/3' mode 0> [6]                            
                0.47        0.47 6112154/87459316    <unification predicate for type `c_parser_aim.lexer.token/0' mode 0> [11]
                                 8218660             <predicate `c_parser_aim.cast_expression/3' mode 0>  <cycle 2> [12]
In other words, multiplicative_expression is responsible for multiplying the number of calls passing through it to token fourfold. Let’s look at multiplicative_expression (again, with Typedefs elided):
multiplicative_expression -->
    (cast_expression, p("*"), multiplicative_expression) -> [];
    (cast_expression, p("/"), multiplicative_expression) -> [];
    (cast_expression, p("%"), multiplicative_expression) -> [];
    cast_expression.
A-ha, a repeated prefix! We know how to fix this:
multiplicative_expression -->
    cast_expression,
    ((p("*"), multiplicative_expression) -> [];
     (p("/"), multiplicative_expression) -> [];
     (p("%"), multiplicative_expression) -> [];
     []).
Or, if we’d prefer to leave the structure of the grammar intact at the cost of memoization overhead, we can memoize cast_expression:
:- pred cast_expression(list(char)::in, list(char)::out) is semidet.
:- pragma memo(cast_expression/2, [fast_loose]).
(Note the :- pred declaration is necessary as you cannot memoize undeclared modes.)

Notice that most of the other _expression productions follow the same pattern; all of them will in fact benefit from prefix-factoring or memoization, as sketched below.
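For instance, additive_expression, prefix-factored in the same style (a sketch that assumes it is written with the same right-recursive structure as multiplicative_expression; Typedefs again omitted):
additive_expression -->
    multiplicative_expression,
    ((p("+"), additive_expression) -> [];
     (p("-"), additive_expression) -> [];
     []).
Let’s do the same for the rest and move on.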

Memoizing multiple calls to unary_expression

Despite all our optimization so far, token (and postfix_expression_rec in turn) still reigns supreme. However, tracing the chain of calls back this time now reveals this:
                                  170474             <predicate `c_parser_aim.assignment_expression/3' mode 0>  <cycle 2> [15]
                                  198268             <predicate `c_parser_aim.cast_expression/3' mode 0>  <cycle 2> [32]      
[18]     7.2    0.10        0.32  368742         <predicate `c_parser_aim.unary_expression/3' mode 0>  <cycle 2> [18]
                0.03        0.06  171588/338952      <predicate `c_parser_aim.kw/3' mode 0> [30]                           
                0.13        0.13  686352/12057646    <predicate `c_parser_aim.lexer.token/3' mode 0> [7]
                0.03        0.03  686352/11652569    <unification predicate for type `c_parser_aim.lexer.token/0' mode 0> [17]
                                  368742             <predicate `c_parser_aim.postfix_expression/3' mode 0>  <cycle 2> [9]
unary_expression is called approximately an equal number of times by both assignment_expression and cast_expression. This makes it a perfect candidate for memoization:
:- pred unary_expression(typedefs::in, list(char)::in, list(char)::out) is semidet.
:- pragma memo(unary_expression/3, [fast_loose]).

Ad nauseam

And that’s about all there is to it: following this method I found I needed to memoize two more procedures and factor one more, and the parser then parsed the test file faster than time could reliably measure. Of course, it would behoove us to test on different inputs in order to exercise different grammar productions, but the method is the same.

Summary

To recap:
  1. Use mprof -c to determine which procedure is the largest time hog.
  2. If the procedure is called by primarily one other procedure on a 1:1 basis, or if you have already optimized it, look instead at its most prominent caller (etc.).
  3. If the procedure is called by primarily one other procedure on a greater than 1:1 basis and it is a common prefix of this caller, factor it out.
  4. Otherwise, memoize it.
  5. Recompile and repeat.
In my experience, this simple method improves the asymptotic performance of almost any naïve PEG parser, taking it from unusable to quite fast, while avoiding the overhead associated with unnecessarily memoizing every grammar production in the parser.

Download c_parser_memo.m

Further reading

The direct encoding of PEGs in Mercury, and the performance implications thereof, were first explored in depth by Ralph Becket and Zoltan Somogyi in the paper “DCGs + Memoing = Packrat Parsing; But is it worth it?”; I highly recommend it as further reading for those interested in the subject.

Friday, March 25, 2011

Project Euler problem 2

Project Euler problem 2 asks:

By considering the terms in the Fibonacci sequence whose values do not exceed four million, find the sum of the even-valued terms.

Again we start with main module boilerplate:

:- module p2.
:- interface.
:- import_module io.
:- pred main(io::di, io::uo) is det.

In our implementation, we will again use the int module. Because the range of terms to sum is not definite (we do not know how many terms are less than four million without prior analysis), solutions will not be useful to us. (Note that even the solutions.do_while predicate cannot be relied upon, as the order in which terms are generated is not guaranteed, and we have no efficient way of knowing if we’ve found all the terms of interest.)

:- implementation.
:- import_module int.

First let’s define the recursive function fibonacci of one argument to compute the Ith Fibonacci number.

:- func fibonacci(int) = int.
:- pragma memo(fibonacci/1).
fibonacci(I) =
    (if I = 0 then 1
     else if I = 1 then 2
     else fibonacci(I - 1) + fibonacci(I - 2)).

Normally a function so defined would take time exponential in I. However, by instructing the Mercury compiler to memoize fibonacci via the memo pragma, we achieve linear performance, as if we had used a dynamic programming algorithm. (Note that memoization is not yet available in all compilation grades; in particular, the Java grade does not support it.)

To perform the summation, we can define a recursive function which recurs on successive values of I until fibonacci(I) is not less than four million:

sum_fibonaccis(I) =
    (if fibonacci(I) @ Fibo < 4000000 then
        sum_fibonaccis(I + 1) +
        (if even(Fibo) then Fibo else 0)
     else 0).

The @ function is used to bind fibonacci(I) to Fibo in an expression context (akin to C’s = operator). The built-in predicate int.even is used to test for evenness.

Those readers familiar with functional programming will likely notice that this function is not tail-recursive; i.e., the result of the recursive call is not returned directly. This will cause unnecessary (though, in this case, benign) stack growth. The typical solution is to rewrite sum_fibonaccis with an accumulator argument:

sum_fibonaccis_acc(I, Sum) =
    (if fibonacci(I) @ Fibo < 4000000 then
        sum_fibonaccis_acc(I + 1, Sum +
            (if even(Fibo) then Fibo else 0))
     else Sum).

Mercury, however, is able to perform this transformation automatically, provided we inform it that (Fibo0 + Fibo1 + ...) + FiboN is indeed the same as Fibo0 + (Fibo1 + ... + FiboN); that is, that addition is associative:

:- promise all [A, B, C, ABC] (A + B) + C = ABC <=> A + (B + C) = ABC.

This optimization is enabled with the --introduce-accumulators option to mmc. Unfortunately, the compiler can recognize only a narrow set of functions and predicates as candidates for accumulator introduction. In particular, altering our definition of sum_fibonaccis to bind Fibo = fibonacci(I) in a separate goal instead of using @, or moving the innermost if outside the recursive call, causes the compiler not to recognize this function as a candidate.
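For instance, this superficially equivalent variant, which binds Fibo in a separate goal rather than with @, is not recognized (the name sum_fibonaccis_2 is mine, purely for illustration):

sum_fibonaccis_2(I) =
    (if Fibo = fibonacci(I), Fibo < 4000000 then
        sum_fibonaccis_2(I + 1) +
        (if even(Fibo) then Fibo else 0)
     else 0).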

The remainder of our implementation is unremarkable:

p2 = sum_fibonaccis(0).
main --> print(p2), nl.

Compiling the program with mmc --infer-all --introduce-accumulators --make p2 produces a large warning concerning possible performance problems due to the accumulator introduction altering the order of summation. This is obviously not a problem for addition and can be safely ignored; the warning is meant for operations such as linked list concatenation, whose performance depends on the value of one operand more than that of the other. If desired, this warning can be disabled with the --inhibit-accumulator-warnings option.
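Putting it all together, the full build command (with the optional warning suppression) looks like this:
$ mmc --infer-all --introduce-accumulators --inhibit-accumulator-warnings --make p2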

The resulting program runs in about 100 ms on my dual-core 1.6 GHz machine.

Download p2.m