Adventures in Mercury, by Chris Pacejo<br />
<br />
<h2>Checkers AI, part 2</h2>2011-12-05<br />
<br />
Today I’d like to wrap up the game logic of our checkers-playing AI. <a href="http://adventuresinmercury.blogspot.com/2011/11/checkers-ai-part-1.html">Last time</a> we defined the players, pieces, and the board, and operations on them. Today let’s write the game-playing code.<br />
<br />
<h3><code>moves</code> module</h3>The <code>moves</code> module simply links the <code>position</code> module to the <code>board</code> module, moving pieces around a <code>board</code> in accordance with the patterns defined in <code>position</code>.<br />
<br />
Just as <code>position</code> defines two movement patterns (<code>move</code> and <code>jump</code>), we will define two predicates to update a <code>board</code> based on these patterns: <code>do_move</code> and <code>do_jump</code>.<br />
<pre><code>:- pred do_move(color, position, position, board, board).
:- mode do_move(in, in, in, in, out) is semidet.
:- mode do_move(in, in, out, in, out) is nondet.
:- mode do_move(in, out, out, in, out) is nondet.
:- pred do_jump(color, position, position, board, board).
:- mode do_jump(in, in, in, in, out) is semidet.
:- mode do_jump(in, in, out, in, out) is nondet.
:- mode do_jump(in, out, out, in, out) is nondet.</code></pre>The three modes defined for each predicate allow us to check if a given move or jump is valid for the given player, to enumerate the moves or jumps available to a piece at a given position, and to enumerate all moves and jumps available on the board, respectively. Implementation is straightforward:<br />
<pre><code>:- import_module piece.
do_move(Color, From, To, !Board) :-
move(Piece, From, To),
move_piece(From, To, Piece, !Board),
color(Piece) = Color.
do_jump(Color, From, To, !Board) :-
jump(Piece, From, Jumped, To),
move_piece(From, To, Piece, !Board),
color(Piece) = Color,
remove_piece(Jumped, Other, !Board),
opponent(color(Piece), color(Other)).</code></pre>Since in checkers a piece that can jump must jump, we will later find it useful to check whether a piece can jump from a given position without the overhead of actually updating the board. Checking whether a piece can move will also be useful for detecting the end of the game, so let’s define both <code>can_move</code> and <code>can_jump</code>:<br />
<pre><code>:- pred can_move(color, position, board).
:- mode can_move(in, in, in) is semidet.
:- mode can_move(in, out, in) is nondet.
can_move(Color, From, Board) :-
move(Piece, From, To),
piece_at(Board, From, Piece),
no_piece_at(Board, To),
color(Piece) = Color.
:- pred can_jump(color, position, board).
:- mode can_jump(in, in, in) is semidet.
:- mode can_jump(in, out, in) is nondet.
can_jump(Color, From, Board) :-
jump(Piece, From, Jumped, To),
piece_at(Board, From, Piece),
no_piece_at(Board, To),
color(Piece) = Color,
piece_at(Board, Jumped, Other),
opponent(color(Piece), color(Other)).</code></pre>Note that we could simply have defined <code>can_move(Color, From, Board) :- do_move(Color, From, _, Board, _).</code> (and similarly for <code>can_jump</code>), but, since Mercury neither is lazy nor tracks unused arguments, this would entail the unnecessary computational overhead of updating the board.<br />
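For illustration (a hypothetical snippet, not part of the game’s modules; it assumes some <code>board</code> value <code><var>Board</var></code>), the nondet mode of <code>do_jump</code> combines nicely with the standard <code>solutions</code> module to enumerate every jump available to a player:<br />
<pre><code>:- import_module list, solutions.
    % Collect every (From, To) pair describing a jump red can make.
:- pred red_jumps(board::in, list({position, position})::out) is det.
red_jumps(Board, Jumps) :-
    solutions(
        (pred({From, To}::out) is nondet :-
            do_jump(red, From, To, Board, _)),
        Jumps).</code></pre>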
<br />
<h3><code>game</code> module</h3>The <code>game</code> module will house the state of the game, and logic for generating possible successor states based on the rules of the game. The state of the game is given by whose turn it is, whether they are in the middle of a jump or not (in checkers, after jumping a piece, you may immediately jump another with the same piece), the history of moves (to check for draws) and the state of the board:<br />
<pre><code>:- import_module board, color, position.
:- type game --->
game(turn::color,
jumping_state::jumping,
history::history,
board::board).
:- type jumping ---> not_jumping; jumping(position).</code></pre>In <code>history</code>, we need only remember past board states for each player. (This is because, as we’ll see later, we don’t need to track history of jumps.)<br />
<pre><code>:- import_module set.
:- type history == set({color, board}).</code></pre><br />
<h4>Initialization of the game</h4>Initializing the game state is easy. Red goes first and is not in the middle of a jump:<br />
<pre><code>:- func init(int, int) = game.
init(Width, Height) = game(red, not_jumping, init, init(Width, Height)).</code></pre>Checkers is normally played on an 8×8 board, so let’s provide initializers for this common case as well:<br />
<pre><code>:- func init = game.
init = init(8, 8).
:- pred init(game::out) is det.
init(init).</code></pre>(The <code>init</code> predicate is redundant, but some programmers prefer that style over zero-argument functions, so we’ll provide it.)<br />
<br />
<h4>Rules of the game</h4>The change from one game state to the next is uniquely identified by the starting and ending locations of some piece on the board. We can declare a predicate which updates the game state based on this information, or generates possible new game states, as follows:<br />
<pre><code>:- pred step(position, position, game, game).
:- mode step(in, in, in, out) is semidet.
:- mode step(in, out, in, out) is nondet.
:- mode step(out, out, in, out) is nondet.</code></pre>The definition of <code>step</code> itself is slightly hairy, as it must follow the rules of checkers. I’ve broken it into two cases, corresponding to when a player begins their move (<code>jumping_state</code> is <code>not_jumping</code>), and when a player is forced to continue a move because a further jump is available (<code>jumping_state</code> is <code>jumping</code>):<br />
<pre><code>step(From, To, game(Color0, not_jumping, !.History, !.Board) @ Game0,
game(Color, Jumping, !:History, !:Board)) :-
% The game can only proceed if it is not a draw.
not is_draw(Game0),
% We must jump if it is possible.
(can_jump(Color0, _, !.Board) ->
% Since we are removing a piece, history doesn't matter any more.
init(!:History),
% Update the board.
do_jump(Color0, From, To, !Board),
% If more jumps are possible, remember this fact.
(can_jump(Color0, To, !.Board) ->
Color = Color0,
Jumping = jumping(To)
;
opponent(Color, Color0),
Jumping = not_jumping
)
;
% Add the last position to our history.
insert({Color0, !.Board}, !History),
% Update the board.
do_move(Color0, From, To, !Board),
% It is now our opponent’s turn.
opponent(Color0, Color),
Jumping = not_jumping
),
% If our move or jump ended in the king’s row, king it.
(in_kings_row(!.Board, Color0, To), king_piece(To, !Board) -> true; true).
step(From, To, game(Color0, jumping(Pos), _, !.Board),
game(Color, Jumping, init, !:Board)) :-
% History is ignored while we are jumping.
% We can only continue a jump from where we left off.
From = Pos,
% Update the board.
do_jump(Color0, From, To, !Board),
% If more jumps are possible, remember this fact.
(can_jump(Color0, To, !.Board) ->
Color = Color0,
Jumping = jumping(To)
;
opponent(Color, Color0),
Jumping = not_jumping
),
% If our jump ended in the king’s row, king it.
(in_kings_row(!.Board, Color0, To), king_piece(To, !Board) -> true; true).</code></pre>(<code>is_draw</code> is defined below.) Phew! That, fortunately, is the single most complex chunk of code in the entire implementation.<br />
<br />
<h4>Checking the state of the game</h4>It is important to provide predicates to determine whether a game has finished, and if so, who (if anyone) has won. First, we define <code>can_step</code> as a proxy for <code>step</code> which does not actually create a new game state:<br />
<pre><code>:- pred can_step(game::in) is semidet.
can_step(game(Color, not_jumping, _, Board) @ Game) :-
not is_draw(Game),
(can_jump(Color, _, Board); can_move(Color, _, Board)).
can_step(game(_, jumping(_), _, _)).</code></pre>(Again, as with <code>can_move</code> and <code>can_jump</code>, we could have defined this in terms of <code>step</code>, but we would perform unnecessary computation.)<br />
<br />
We can then define <code>is_draw</code> to determine whether the current game state is a draw or not (i.e. whether the current state exists in previous history):<br />
<pre><code>:- pred is_draw(game::in) is semidet.
is_draw(game(Color, not_jumping, History, Board)) :-
member({Color, Board}, History).</code></pre>Finally, we can determine whether a game is over, and if so, who the winner is (if any) as follows:<br />
<pre><code>:- pred game_over(game::in, maybe(color)::out) is semidet.
game_over(Game, Winner) :-
    ( if is_draw(Game) then
        Winner = no
    else
        not can_step(Game),
        Winner = yes(opponent(Game^turn))
    ).</code></pre><br />
<h3><code>game_io</code> module</h3>We’re almost ready to test our game logic with human players. To do this, we’ll first need to declare predicates to print the state of the game, and to allow the human player(s) to choose a move:<br />
<pre><code>:- import_module board, color, game, piece.
:- import_module io, list, maybe, string.
:- pred print_color(color::in, io::di, io::uo) is det.
:- pred print_piece(piece::in, io::di, io::uo) is det.
:- pred print_board(board::in, io::di, io::uo) is det.
:- pred print_game(game::in, io::di, io::uo) is det.
:- pred print_winner(maybe(color)::in, io::di, io::uo) is det.
:- pred choose_move(list(game)::in(non_empty_list), result(game)::out,
io::di, io::uo) is det.</code></pre>These predicates are all fairly straightforward, so I’ll elide most of them here to save space. Of note is <code>choose_move</code>, which presents each game state in the list in turn to the player, asking if the player wishes to choose that move. If the player chooses none, the sequence is repeated. Note that the mode of the list argument is given as <code>in(non_empty_list)</code> to ensure that an empty list of moves is not passed, which would result in an infinite loop. Here is the code, which is mostly implemented by <code>choose_move_aux</code>:<br />
<pre><code>choose_move(Games, Chosen, !IO) :- choose_move_aux(Games, Games, Chosen, !IO).
:- pred choose_move_aux(list(game)::in(non_empty_list), list(game)::in,
result(game)::out, io::di, io::uo) is det.
choose_move_aux(AllGames, [], Chosen, !IO) :-
write_string("Repeating...\n\n", !IO),
choose_move_aux(AllGames, AllGames, Chosen, !IO).
choose_move_aux(AllGames, [Game | Games], Chosen, !IO) :-
print_board(Game^board, !IO), nl(!IO),
write_string("Ok? ", !IO),
read_line_as_string(Res, !IO), nl(!IO),
(
Res = ok(String),
(prefix(String, "y") -> Chosen = ok(Game);
choose_move_aux(AllGames, Games, Chosen, !IO))
;
Res = eof, Chosen = eof
;
Res = error(Err), Chosen = error(Err)
).</code></pre><br />
<h3><code>checkers</code> module</h3>Finally we can wrap everything together with our main module, <code>checkers</code>.<br />
<br />
<h4>Players</h4>To make our code modular, let’s define a typeclass <code>player</code>:<br />
<pre><code>:- import_module game, io.
:- typeclass player(Player) where [
pred make_move(Player::in, game::in, game::out, io::di, io::uo)
is cc_multi
].</code></pre>This means that a <code>player</code> is any type <code><var>Player</var></code> which has defined for it a <code>make_move</code> predicate with the given signature. This allows us to implement various “players”, each with a different <code>make_move</code> predicate.<br />
<br />
By including the IO state and declaring the determinism of <code>make_move</code> as <code>cc_multi</code>, we allow both for players which must interact with IO (e.g., humans), and for players which may choose a move arbitrarily (e.g., search algorithms which find several optimal choices).<br />
<br />
Now we can implement various human and AI players which conform to this interface. Since we’re just testing for now, we’ll define a human player like so:<br />
<pre><code>:- import_module game_io, list, require, solutions.
:- type human_player ---> human_player.
:- instance player(human_player) where [
    (make_move(human_player, Game0, Game1, !IO) :-
        unsorted_solutions(
            (pred(Game::out) is nondet :- step(_, _, Game0, Game)),
            Moves),
        (
            Moves = [_ | _],
            choose_move(Moves, Res, !IO),
            (
                Res = ok(Game1)
            ;
                Res = eof, error("End of file")
            ;
                Res = error(Err), error(error_message(Err))
            )
        ;
            Moves = [],
            unexpected($module, $pred, "Game is not over but no more moves!")
        )
    )
].</code></pre>(The dummy type <code>human_player</code> is needed so we can declare it as an instance of the typeclass <code>player</code>.) <code>unsorted_solutions</code> returns an arbitrarily ordered list of all the possible moves from a given position. We assert that this list should not be empty (as the player should not have been asked to make a move when there are no moves available, as this means that the game is over!), and forward the rest of the work to our previously-defined <code>game_io.choose_move</code>.<br />
<br />
We can also define a dummy AI that chooses a move arbitrarily like so:<br />
<pre><code>:- type dummy_player ---> dummy_player.
:- instance player(dummy_player) where [
(make_move(dummy_player, Game0, Game1, !IO) :-
        unsorted_solutions(
            (pred(Game::out) is nondet :- step(_, _, Game0, Game)),
            Moves),
Game1 = det_head(Moves))
].</code></pre><br />
<h5>Why typeclasses?</h5>You may rightly ask, “why not simply declare a type <code>player ---> human_player; dummy_player</code> and forget the whole typeclass business?” By using typeclasses, we can provide new player implementations in other modules, rather than stuffing them all into one possibly large module.<br />
<br />
Of course, another way of doing this would be using higher-order predicates: we could declare players as a predicate type. The problem with this approach is that we may at some point determine that a player should have more predicates associated with it (such as one defining their name or difficulty), at which point we will need to pass around multiple predicates having messy types and instantiations. Using typeclasses allows us to redefine what a <code>player</code> is more easily.<br />
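For contrast, a higher-order sketch (hypothetical; not code from this project) might look like the following, and already shows the awkwardness once a second attribute is needed:<br />
<pre><code>    % A player as a bare higher-order predicate:
:- type player_pred == pred(game, game, io, io).
:- inst player_pred == (pred(in, out, di, uo) is cc_multi).

    % Wanting a name later forces a wrapper type, and every caller
    % must now carry the player_pred inst in its mode declarations:
:- type named_player ---> named_player(string, player_pred).</code></pre>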
<br />
<h4>The main loop</h4>Last but not least, we will define a loop <code>play_game</code>, which takes a game state and two values whose types are instances of the <code>player</code> typeclass to represent the two players. The implementation is straightforward:<br />
<pre><code>:- pred play_game(game::in, Red::in, White::in, io::di, io::uo) is cc_multi
<= (player(Red), player(White)).
play_game(Game0, Red, White, !IO) :-
print_game(Game0, !IO), nl(!IO),
(game_over(Game0, _) -> true;
(
Game0^turn = red,
make_move(Red, Game0, Game1, !IO)
;
Game0^turn = white,
make_move(White, Game0, Game1, !IO)
),
play_game(Game1, Red, White, !IO)
).</code></pre>In this predicate, <code>make_move</code> will call the <code>make_move</code> predicate associated with the type of <code><var>Red</var></code> or <code><var>White</var></code> as appropriate.<br />
<br />
Finally, the <code>main</code> predicate will start a game between two players (in this case, a <code>human_player</code> and a <code>dummy_player</code>):<br />
<pre><code>:- pred main(io::di, io::uo) is cc_multi.
main(!IO) :- play_game(init, human_player, dummy_player, !IO).</code></pre><br />
<h3>Trying it out</h3>Thanks to Mercury’s built-in build system, we can compile the whole shebang with simply <code>mmc --make checkers</code>. Running it lets us play a game versus the dummy AI:<br />
<pre><samp> o o o o
o o o o
o o o o
· · · ·
· · · ·
x x x x
x x x x
x x x x
X's move.</samp></pre>It’s our move first. Let’s cycle through the options:<br />
<pre><samp> o o o o
o o o o
o o o o
· · · ·
· · · x
x x x ·
x x x x
x x x x
Ok? <kbd>n</kbd>
o o o o
o o o o
o o o o
· · · ·
· · x ·
x x x ·
x x x x
x x x x
Ok? <kbd>n</kbd>
o o o o
o o o o
o o o o
· · · ·
· · x ·
x x · x
x x x x
x x x x
Ok? <kbd>n</kbd>
o o o o
o o o o
o o o o
· · · ·
· x · ·
x x · x
x x x x
x x x x
Ok? <kbd>n</kbd>
o o o o
o o o o
o o o o
· · · ·
· x · ·
x · x x
x x x x
x x x x
Ok? <kbd>n</kbd>
o o o o
o o o o
o o o o
· · · ·
x · · ·
x · x x
x x x x
x x x x
Ok? <kbd>n</kbd>
o o o o
o o o o
o o o o
· · · ·
x · · ·
· x x x
x x x x
x x x x
Ok? <kbd>n</kbd>
Repeating...</samp></pre>OK, we’ve seen all of them. Let’s choose the first:<br />
<pre><samp> o o o o
o o o o
o o o o
· · · ·
· · · x
x x x ·
x x x x
x x x x
Ok? <kbd>y</kbd>
o o o o
o o o o
o o o o
· · · ·
· · · x
x x x ·
x x x x
x x x x </samp></pre>Let’s make sure the computer picks a valid move.<br />
<pre><samp>O's move.
o o o o
o o o o
o o o ·
· · · o
· · · x
x x x ·
x x x x
x x x x </samp></pre>Hooray! Of course, now it’s our move again. This could go on for a while, but the test shows that everything works!<br />
<br />
<h3>Conclusion</h3>Today we covered building the game logic in the <code>moves</code> and <code>game</code> modules, and I/O and the main loop in the <code>game_io</code> and <code>checkers</code> modules. Next week we’ll finish up by adding an AI based on the minimax algorithm.<br />
<br />
<h2>Checkers AI, part 1</h2>2011-11-29<br />
<br />
Logic languages such as Mercury love search problems. Game artificial intelligence is a great application of search. AIs for many zero-sum two-player board games, such as checkers and chess, can be implemented as a <a rel="external" href="http://en.wikipedia.org/wiki/Minimax">minimax</a> search; a search which assumes that each player makes the move which maximally benefits himself. In today’s post, I’d like to walk through modeling <a rel="external" href="http://en.wikipedia.org/wiki/English_draughts">checkers</a> in Mercury.<br />
<br />
<h3>Architecture</h3>The overall architecture of our checkers game will consist of the following modules:<br />
<dl><dt><code>color</code></dt>
<dd>The two colors (players) in the game and their relation (opponents).</dd>
<dt><code>piece</code></dt>
<dd>The four kinds of pieces in the game and functions over them (namely, kinging).</dd>
<dt><code>position</code></dt>
<dd>A representation of the position of a piece, and predicates defining useful relationships between positions (namely, moving and jumping).</dd>
<dt><code>board</code></dt>
<dd>A representation of the checkers board, predicates about positions on it, predicates about pieces on it, and predicates to modify a board in useful ways (namely, moving, removing, and kinging pieces).</dd>
<dt><code>moves</code></dt>
<dd>Predicates to update a board via the patterns described in <code>position</code>.</dd>
<dt><code>game</code></dt>
<dd>An encoding of the state of a game, a predicate to modify it (by making a move according to the rules of checkers), and predicates to query the state of a game (namely, whether the game has ended, and the result if so).</dd>
<dt><code>util</code></dt>
<dd>Some utility functions for dealing with integers and lists; used primarily by <code>minimax</code>.</dd>
<dt><code>minimax</code></dt>
<dd>The minimax algorithm and related typeclasses.</dd>
<dt><code>heuristic</code></dt>
<dd>A declaration of the game as a minimax problem, with associated heuristics.</dd>
<dt><code>game_io</code></dt>
<dd>Predicates for printing various game structures in a human-readable format, and obtaining user input.</dd>
<dt><code>checkers</code></dt>
<dd>The main module, allowing configuration of the game and defining the overall UI.</dd> </dl>This architecture ensures that modules are loosely coupled and small in size (&lt; 120 LOC each, ~ 60 LOC average), making modifications and debugging easy. A similar overall architecture should prove useful for other game AI implementations.<br />
Let’s go into each module in detail. In today’s post, we’ll focus on <code>color</code>, <code>piece</code>, <code>position</code>, and <code>board</code>.<br />
<br />
<h3><code>color</code> module</h3>There are two colors in checkers, corresponding to the two players:<br />
<pre><code>:- type color ---> red; white.</code></pre>They are related as opponents:<br />
<pre><code>:- pred opponent(color, color).
:- mode opponent(in, out) is det.
:- mode opponent(out, in) is det.
opponent(red, white).
opponent(white, red).</code></pre>We can also define an <code>opponent</code> function for convenience:<br />
<pre><code>:- func opponent(color) = color.
opponent(Color) = Opponent :- opponent(Color, Opponent).</code></pre>That’s it! Short and to-the-point modules are good. We’ll see that most modules in this game are similarly sized.<br />
<br />
<h3><code>piece</code> module</h3>Pieces come in two colors, defined in <code>color</code>, and can be either men or kings:<br />
<pre><code>:- import_module color.
:- type kind ---> man; king.
:- type piece ---> piece(color::color, kind::kind).</code></pre>It is also useful to talk about “kinging” a piece; changing a man into a king:<br />
<pre><code>:- func king(piece) = piece is semidet.
king(piece(Color, man)) = piece(Color, king).</code></pre>This module is also short and sweet.<br />
<br />
<h3><code>position</code> module</h3>A position is simply an X, Y tuple:<br />
<pre><code>:- type position ---> position(int, int).</code></pre>It is useful to talk about pieces changing position by moving and jumping. Let’s declare two predicates, <code>move</code> and <code>jump</code>, which will define the relationship between the starting and ending locations of a piece which moves or jumps, and in the case of a jump, the position which was jumped over.<br />
<pre><code>:- pred move(piece, position, position).
:- pred jump(piece, position, position, position).</code></pre>It will be most useful to know whether a particular change in position describes a move or jump, and given a starting position, what ending positions would result from a move or jump. We will declare two modes, corresponding to these use cases, for each predicate:<br />
<pre><code>:- mode move(in, in, in) is semidet.
:- mode move(in, in, out) is multi.
:- mode jump(in, in, out, in) is semidet.
:- mode jump(in, in, out, out) is multi.</code></pre>Because Mercury provides bidirectional arithmetic (i.e. <code>+</code> and <code>-</code> can deduce the values of one operand if the result is known), we can define these predicates with only one clause each:<br />
<pre><code>:- import_module color.
:- import_module int.
move(Piece, position(X, Y), position(X + XD, Y + YD)) :-
(XD = 1; XD = -1),
(Piece = piece(red, man), YD = 1;
Piece = piece(white, man), YD = -1;
Piece = piece(_, king), YD = 1;
Piece = piece(_, king), YD = -1).
jump(Piece, position(X, Y), position(X + XD, Y + YD),
position(X + XD2, Y + YD2)) :-
(XD = 1, XD2 = 2; XD = -1, XD2 = -2),
(Piece = piece(red, man), YD = 1, YD2 = 2;
Piece = piece(white, man), YD = -1, YD2 = -2;
Piece = piece(_, king), YD = 1, YD2 = 2;
Piece = piece(_, king), YD = -1, YD2 = -2).</code></pre>Unfortunately any attempt to compact this code results in mode errors. This is because Mercury’s switch detection code is not very robust in the face of complicated switches, and because Mercury seems not to reorder conjunctions to obtain mode-correctness.<br />
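The bidirectional arithmetic used above can be seen in isolation in this hypothetical snippet (not part of the game’s modules); the reverse modes of <code>int.+</code> let Mercury solve for an operand:<br />
<pre><code>:- import_module int.
    % Deduce the column a piece came from, given where it landed
    % after moving one column to the right.  The reverse mode of
    % int.+ solves FromX + 1 = ToX for FromX.
:- pred from_column(int::in, int::out) is det.
from_column(ToX, FromX) :- FromX + 1 = ToX.</code></pre>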
<br />
<h3><code>board</code> module</h3>First we define the type of a board, which should contain its width, height, a mapping from position to piece, and (for efficiency) a count (using the handy <a rel="external" href="http://www.mercury.csse.unimelb.edu.au/information/doc-release/mercury_library/bag.html"><code>bag</code></a> module) of how many of each type of piece are on the board:<br />
<pre><code>:- import_module color, piece, position.
:- import_module bag, int, map.
:- type board --->
board(
width::int,
height::int,
positions::map(position, piece),
counts::bag(piece)
).</code></pre>Of course we could fix the width and height to 8, but it is easy and interesting to allow for other size boards. Since Mercury <a rel="external" href="http://www.mercury.csse.unimelb.edu.au/information/doc-release/mercury_ref/Field-selection.html">automatically generates</a> “getter” functions for <code>width</code> and <code>height</code>, we need only declare them in our interface, even if we choose to hide the type of <code>board</code>:<br />
<pre><code>:- func width(board) = int.
:- func height(board) = int.</code></pre>Another trivial but important state accessor is <code>count_pieces</code>, which counts the number of a given kind of piece on the board:<br />
<pre><code>:- func count_pieces(board, piece) = int.
count_pieces(Board, Piece) = count_value(Board^counts, Piece).</code></pre>This function is implemented using the bag we added to the <code>board</code> type so that it may be quickly executed by our heuristic function (defined later).<br />
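As a preview of how this will be used (a hypothetical sketch; the real heuristic module appears in a later post, and the weights here are illustrative only), a material-count evaluation could be built directly on <code>count_pieces</code>:<br />
<pre><code>:- import_module color, piece.
:- import_module int.
    % Naive material balance for Color: men count 1, kings count 3.
:- func material(board, color) = int.
material(Board, Color) =
    count_pieces(Board, piece(Color, man))
    + 3 * count_pieces(Board, piece(Color, king))
    - count_pieces(Board, piece(opponent(Color), man))
    - 3 * count_pieces(Board, piece(opponent(Color), king)).</code></pre>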
<br />
<h4>Initializing the board</h4>Next we define how to initialize a board of a given size:<br />
<pre><code>:- func init(int, int) = board.
init(Width, Height) = !:Board :-
!:Board = board(Width, Height, init, init),
fold_up(init_row(piece(red, man)), 0, Height / 2 - 2, !Board),
fold_up(init_row(piece(white, man)), (Height + 1) / 2 + 1, Height - 1, !Board).</code></pre>The <code>int.fold_up</code> predicate used here is Mercury’s idea of a for loop. <code>init_row</code> uses <code>solutions.aggregate</code> and <code>legal_position</code> (defined later) to iterate over the valid positions in a row:<br />
<pre><code>init_row(Piece, Y, !Board) :-
aggregate(
(pred(Pos::out) is nondet :-
legal_position(!.Board, position(_, Y) @ Pos)),
(pred(Pos::in, !.Board0::in, !:Board0::out) is det :-
!Board0^positions :=
det_insert(!.Board0^positions, Pos, Piece),
!Board0^counts :=
insert(!.Board0^counts, Piece)),
!Board).</code></pre>Note that we call <code>legal_position</code> with a partially instantiated position, <code>position(_, Y)</code>, which is also bound to <code><var>Pos</var></code>. This means that we need a mode of <code>legal_position</code> which can fill in valid X coordinates of a given row.<br />
<br />
<h4>Legal positions on the board</h4>Here is a suitable declaration of <code>legal_position</code>:<br />
<pre><code>:- pred legal_position(board, position).
:- mode legal_position(in, in) is semidet.
:- mode legal_position(in, out) is nondet.
:- mode legal_position(in, bound(position(ground, free)) >> ground) is nondet.
:- mode legal_position(in, bound(position(free, ground)) >> ground) is nondet.</code></pre>All those modes! Besides the modes where the <code>position</code> argument is <code>in</code> and <code>out</code> (corresponding to checking a legal position and generating all legal positions), the other two modes work with partially instantiated positions, where we wish to know the legal positions in a given column or row (respectively). Thankfully, all four modes can be implemented with one body:<br />
<pre><code>legal_position(Board, position(X, Y)) :-
int_in_range(0, Board^height - 1, Y),
int_in_range(0, Board^width - 1, X),
(even(X), even(Y); odd(X), odd(Y)).</code></pre>Unfortunately the predicate <code>nondet_int_in_range</code> in Mercury’s <a rel="external" href="http://www.mercury.csse.unimelb.edu.au/information/doc-latest/mercury_library/int.html"><code>int</code></a> module does not provide a <code>semidet</code> mode and is thus unsuitable for bidirectional programming. Instead, we declare a similar predicate, <code>int_in_range</code>, in our <code>util</code> module:<br />
<pre><code>:- pred int_in_range(int, int, int).
:- mode int_in_range(in, in, in) is semidet.
:- mode int_in_range(in, in, out) is nondet.</code></pre>To define this predicate, we can make use of Mercury’s support for defining <a rel="external" href="http://www.mercury.csse.unimelb.edu.au/information/doc-release/mercury_ref/Different-clauses-for-different-modes.html">different clauses for different modes</a>:<br />
<pre><code>:- pragma promise_equivalent_clauses(int_in_range/3).
int_in_range(Lo::in, Hi::in, X::out) :- nondet_int_in_range(Lo, Hi, X).
int_in_range(Lo::in, Hi::in, X::in) :- Lo =< X, X =< Hi.</code></pre>Because we know that these two clauses are logically equivalent, we promise such to Mercury using <code>promise_equivalent_clauses</code>.<br />
<br />
<h4>Locating pieces</h4>To query which pieces are at which positions, we declare <code>piece_at</code>. <code>piece_at</code> has two modes, one for checking which piece (if any) is at a given position, and one for iterating over all pieces currently on the board:<br />
<pre><code>:- pred piece_at(board, position, piece).
:- mode piece_at(in, in, out) is semidet.
:- mode piece_at(in, out, out) is nondet.</code></pre>Our implementation of <code>piece_at</code> should be a thin layer over the <code>positions</code> map, but Mercury’s <a rel="external" href="http://www.mercury.csse.unimelb.edu.au/information/doc-release/mercury_library/map.html"><code>map</code></a> module does not provide one predicate which serves both purposes. Again, we can remedy this by using different clauses for each mode:<br />
<pre><code>:- pragma promise_equivalent_clauses(piece_at/3).
piece_at(Board::in, Position::in, Piece::out) :-
search(Board^positions, Position, Piece).
piece_at(Board::in, Position::out, Piece::out) :-
member(Board^positions, Position, Piece).</code></pre>We can also define <code>no_piece_at</code>, whose signature mirrors that of <code>piece_at</code> minus the <code>piece</code> argument, as:<br />
<pre><code>no_piece_at(Board, Position) :-
legal_position(Board, Position),
not piece_at(Board, Position, _).</code></pre>Note that <code>no_piece_at(…)</code> is not simply <code>not piece_at(…)</code>! By additionally requiring that <code><var>Position</var></code> be a legal position, we restrict the domain of <code>no_piece_at</code> and thus allow it to output the finite number of positions where no piece exists (as opposed to the infinite number of positions which either have no piece or are not legal).<br />
<br />
<h4>Modifying the board</h4>There are three predicates in <code>board</code> to modify the board: <code>move_piece</code>, <code>remove_piece</code>, and <code>king_piece</code>. <code>move_piece</code> and <code>remove_piece</code> have modes which allow both for modifying given locations and for generating all possible such modifications of the board. <code>king_piece</code> has only one mode to modify a given location. All three have similar code utilizing <a rel="external" href="http://www.mercury.csse.unimelb.edu.au/information/doc-release/mercury_ref/State-variables.html">state variable</a> <a rel="external" href="http://www.mercury.csse.unimelb.edu.au/information/doc-release/mercury_ref/Record-syntax.html">field update syntax</a>. Here is the implementation of <code>king_piece</code> as an example:<br />
<pre><code>king_piece(Pos, !Board) :-
piece_at(!.Board, Pos, Piece),
King = king(Piece),
!Board^positions := update(!.Board^positions, Pos, King),
!Board^counts := det_remove(!.Board^counts, Piece),
!Board^counts := insert(!.Board^counts, King).</code></pre>It is my opinion that Mercury would benefit from an extension to state variable update syntax which allows state variable record syntax for data terms. Such a syntax would, in concert with the standard library argument order modifications introduced in Mercury 11.07, allow the above predicate to be written more concisely as:<br />
<pre><code>king_piece(Pos, !Board) :-
piece_at(!.Board, Pos, Piece),
King = king(Piece),
update(Pos, King, !Board^positions),
det_remove(Piece, !Board^counts),
insert(King, !Board^counts).</code></pre>(Of course, the update functions allowed to be used with such a syntax must be restricted to those which commute with each other.)<br />
<br />
<h3>Review</h3>That’s all for now; this post is long enough! Today we saw the overall structure of our checkers AI – eleven small modules – and the implementations of <code>color</code>, <code>piece</code>, <code>position</code>, and <code>board</code>, the modules defining the “parts” of the game. Next week we’ll see the implementations of the “rules” and “brains” of the game.<br />
<br />
<h2>Making the grade</h2>2011-11-09<br />
<br />
I know it’s been a while since my last post, but I have something fantastic in the works, I promise. Today though I’d like to talk about compiler grades.<br />
<br />
<h3>Common grades</h3>Compiler <defn>grades</defn> correspond roughly to combinations of different backends (e.g. C or Java) and code models (e.g. parallel or debugging). Those of you new to the language may be confounded by the immense number of these grades, and when initially compiling Mercury it can be difficult to decide which grades to enable. Here’s a list of the ones I use most. In parentheses next to the grade names are the options needed to select the grade, assuming that asm_fast.gc is the default (it usually is).<br />
<dl><dt>asm_fast.gc (default)</dt>
<dd>This is my default go-to grade. It compiles to low-level C code using the Boehm garbage collector and supports concurrency (though not parallelism). It is therefore very fast, can interface seamlessly with C libraries, and can run code that uses threads.</dd>
<dt>asm_fast.par.gc.stseg (<code>--parallel --stack-segments</code>)</dt>
<dd>This grade adds parallelism and stack segmentation to the asm_fast.gc grade. Parallelism allows programs to make use of multiple cores (though note that this isn’t necessary if all you need is concurrent execution). Stack segmentation allows pieces of the stack to be allocated on the heap. Besides allowing programs to grow beyond a fixed stack size, this option (only available for the asm_fast grades) <em>greatly</em> reduces the amount of memory allocated by thread spawns, which can otherwise measure in the gigabytes. Both these features come at a small runtime cost.</dd>
<dt>hlc.gc (<code>-H</code>)</dt>
<dd>This is the “high-level C” grade. It compiles to C code which is much closer to what a programmer might write than that generated by the asm_fast grade. Curiously this grade is sometimes faster than the low-level grade (likely due to an increased number of optimization opportunities available to GCC), so if you need speed, it is worth testing your application in this grade as well. Unlike asm_fast however, this grade does not support concurrency.</dd>
<dt>hlc.par.gc (<code>-H --parallel</code>)</dt>
<dd>This grade adds to hlc.gc support for concurrency and parallelism via native system threads. If your application requires a 1:1 mapping of Mercury threads to system threads, then this grade is a must.</dd>
<dt>java (<code>--java</code>)</dt>
<dd>As the name implies, this grade compiles to Java code. It is the only option if you need to interface with Java libraries. Due to the JVM’s speed the code it generates is often as fast as its C counterparts. This grade supports concurrency and parallelism via Java’s built-in threading mechanisms (<code>--parallel</code> is not needed). (Note that unlike in the C grades, a program compiled under the java grade will not wait for all spawned threads to terminate before exiting.) Unfortunately some standard library procedures are not yet implemented.</dd>
<dt>asm_fast.gc.prof (<code>-p</code>); hlc.gc.prof (<code>-H -p</code>)</dt>
<dd>These grades allow you to profile your code using <a rel="external" href="http://www.mercury.csse.unimelb.edu.au/information/doc-release/mercury_user_guide/Profiling.html">mprof</a>, of course at the expense of execution speed. They are handy to have around.</dd>
<dt>asm_fast.gc.debug (<code>--debug</code>)</dt>
<dd>This grade allows you to debug your code using <a rel="external" href="http://www.mercury.csse.unimelb.edu.au/information/doc-release/mercury_user_guide/Debugging.html">mdb</a>. Note that if you’re not debugging the standard library, then instead of compiling this grade, you can pass the <code>--trace deep</code> option to the compiler to enable debugging of just your modules.</dd> </dl><br />
<h3>Compiling extra grades</h3>We all know that the Mercury build process takes <em>forever</em>. This is because it must compile the standard library in each of the required grades. Fortunately there’s a shortcut around this.<br />
<br />
The first time you install the system, instead of <code>make install</code>, run <code>make install LIBGRADES=asm_fast.gc</code>. This will install the compiler, but only the default library grade. You can pass <code>LIBGRADES=</code> if you wish instead to install no library grades.<br />
<br />
Later, when you wish to install additional grades, run <code>make install_grades LIBGRADES='asm_fast.par.gc hlc.gc'</code> (for example). This will compile only the given library grades. To complete the installation, you should edit /path/to/mercury/lib/mercury/conf/Mercury.config to add the appropriate <code>--libgrade</code> flags to the end of <code>DEFAULT_MCFLAGS</code> (near the end of the file), and /path/to/mercury/lib/mercury/mmake/Mmake.vars to add the grade names to the definition of <code>LIBGRADES</code>. (These steps are only necessary if you plan to build and install libraries.)<br />
<br />
Don’t forget also that you can speed up compilation by passing <code>PARALLEL=-j5</code> (where 5 is one plus the number of parallel cores in your system) to <code>make</code> (or simply <code>-j5</code> to <code>mmake</code>).<br />
<br />
<h3>Working with multiple grades</h3>Finally, when testing with multiple grades, you may wish to use the <code>--use-grade-subdirs</code> option in conjunction with <code>mmc --make</code>. This will cause <code>mmc --make</code> to use a different subdirectory for each grade in which you build your project, avoiding errors caused by linking incompatible grades and saving you from rebuilding the entire project each time you switch grades.Chris Pacejohttp://www.blogger.com/profile/18278436935541738068noreply@blogger.com0tag:blogger.com,1999:blog-1560635160713980237.post-43994403339371298972011-10-17T21:56:00.000-07:002011-12-19T06:18:20.069-08:00Parsing Expression Grammars part 3 (final)So, <a href="http://adventuresinmercury.blogspot.com/2011/10/parsing-expression-grammars-part-2.html">last week</a>, we built a C99 parser. A <em>slow</em> C99 parser. At the end we added one line of code that sped things up a bunch. Let’s look at that.<br />
<br />
<h3>Why recursive descent parsing is slow</h3>Right now our PEG parser is what’s known as a <a rel="external" href="http://en.wikipedia.org/wiki/Recursive_descent_parser">recursive descent parser</a> – it recursively descends the parse tree looking for matches. Unfortunately, recursive descent is very slow for two reasons:<br />
<ol><li>One grammar rule may try to parse a given span of input the same way multiple times.</li>
<li>Multiple grammar rules may try to parse a given span of input the same way multiple times.</li>
</ol>An example of the former is in the <code>selection_statement</code> production (<code>Typedefs</code> omitted for clarity):<br />
<pre><code>selection_statement -->
(kw("if"), p("("), expression, p(")"), statement, kw("else"), statement) -> [];
(kw("if"), p("("), expression, p(")"), statement) -> [];
(kw("switch"), p("("), expression, p(")"), statement).</code></pre>The string “<code>if (x > 0) x--; return;</code>” will be parsed once by the first branch, which fails when it finds “<code>return</code>” instead of “<code>else</code>”, and will then be <em>reparsed</em> by the second branch, which will succeed. This sort of prefix-repetition can be eliminated by factoring out the repeated portion:<br />
<pre><code>selection_statement -->
(kw("if"), p("("), expression, p(")"), statement,
((kw("else"), statement) -> []; [])) -> [];
(kw("switch"), p("("), expression, p(")"), statement).</code></pre>Of course, this means that our code is no longer structurally identical to the grammar we are parsing, but this may be a welcome tradeoff.<br />
<br />
An example of the latter, which is more pernicious, is given by, well, <em>almost every production</em>. Why? Consider <code>statement</code>:<br />
<pre><code>statement -->
labeled_statement -> [];
compound_statement -> [];
expression_statement -> [];
selection_statement -> [];
iteration_statement -> [];
jump_statement.</code></pre>Pretty innocuous, right? Take a look at each of the productions it invokes. Almost all of them start with a call to <code>kw</code>, which in turn calls <code>token</code>. In fact, <em>all</em> of them start with a call, directly or indirectly, to <code>token</code>. In fact, this is true of <em>every production in the parser</em>. <code>token</code> is not the only culprit but it is the most pervasive.<br />
<br />
This problem can’t be solved through a simple restructuring of the grammar. For this we need a more powerful tool: memoization.<br />
<br />
<h3>Memoization</h3><a rel="external" href="http://en.wikipedia.org/wiki/Memoization">Memoization</a> is one of my favorite features. Simply put, <defn>memoization</defn> of a procedure means maintaining a table of its past input and output values, and looking up output values in this table when the procedure is called in the future with the same input values. Of course, this only works if the procedure has no desirable side effects; fortunately this is true of most Mercury code. Mercury provides built-in <a rel="external" href="http://www.mercury.csse.unimelb.edu.au/information/doc-release/mercury_ref/Tabled-evaluation.html">support</a> for memoization via the <code>memo</code> pragma. Basic usage is as follows:<br />
<pre><code>:- pragma memo(<var>Name</var>/<var>Arity</var>).</code></pre>By default, Mercury memoizes inputs by value. This will catch all repeated calls to a procedure, but can be slow for large structures. We can specify that we would like Mercury to memoize inputs by address instead:<br />
<pre><code>:- pragma memo(<var>Name</var>/<var>Arity</var>, [fast_loose]).</code></pre>Memoizing by address is much faster but may miss calls with identical values at different addresses. However, because our primary data structure is a list of characters which is very large (meaning memoize-by-value will be slow) and which we are only destructing (meaning identical sublists of this list must be physically the same), we will memoize by address.<br />
<br />
<h3>Profiling</h3>Now, although we could memoize every procedure in our lexer and parser, memoization <em>does</em> have overhead, and so we should not use it on procedures which are not called often or with repeated inputs. We should limit our scope to grammar rules which are invoked by many different rules, or multiple times by one rule, since these are likely to be invoking the rule repeatedly on the same input.<br />
<br />
To find such candidates for memoization, we can use runtime code profiling. To generate profiling data, we can simply compile with the <code>-p</code> switch and then run our program with our test from last week (minus the memoization of <code>token</code> we added at the end):<br />
<pre>$ <kbd>mmc -p --infer-all --make c_parser_direct</kbd>
$ <kbd>echo 'int main(void) {}' | ./c_parser_direct</kbd></pre>Let that churn for a few seconds, and then kill it with <kbd>Ctrl+C</kbd>. Now run <code>mprof</code> to find out what’s been taking so long:<br />
<div style="overflow:scroll;overflow-y:hidden"><pre style="width:160em"><samp> % cumulative self self total
time seconds seconds calls ms/call ms/call name
68.5 8.70 8.70 17602295 0.00 0.00 <function `string.to_char_list/1' mode 0> [1]
7.1 9.60 0.90 17602483 0.00 0.00 <predicate `builtin.unify/2' mode 0> [2]
6.3 10.40 0.80 202300 0.00 0.00 <predicate `c_parser_aim.lexer.punctuator_rec/4' mode 0> [3]
5.9 11.15 0.75 202388 0.00 0.00 <predicate `c_parser_aim.lexer.keyword_rec/4' mode 0> [4]
4.7 11.75 0.60 10114349 0.00 0.00 <predicate `c_parser_aim.lexer.punctuator_aux/3' mode 0> [5]
4.3 12.30 0.55 7487945 0.00 0.00 <predicate `c_parser_aim.lexer.keyword_aux/3' mode 0> [6]</samp></pre></div>Over 95% of that run was spent just checking for keywords and punctuators! In particular, the biggest culprit, <code>string.to_char_list</code>, is highly suspect. To find out who’s calling it, we can run <code>mprof -c</code> [note: sometimes I have had to run <code>mprof -cd</code>; I can’t reproduce this now so I’m not sure why] and search for the callers of <code>string.to_char_list</code>:<br />
<div style="overflow:scroll;overflow-y:hidden"><pre style="width:160em"><samp> 3.70 3.70 7487945/17602295 <predicate `c_parser_aim.lexer.keyword_rec/4' mode 0> [24]
5.00 5.00 10114350/17602295 <predicate `c_parser_aim.lexer.punctuator_rec/4' mode 0> [22]
[21] 68.5 8.70 8.70 17602295 <function `string.to_char_list/1' mode 0> [21]</samp></pre></div>The <samp>[21]</samp> in the left-most column indicates that this section is dedicated to procedure #21, <code>string.to_char_list</code>, which was called 17,602,295 times. The two lines above this line indicate that both <code>lexer.keyword_rec</code> and <code>lexer.punctuator_rec</code> call <code>string.to_char_list</code>, 7,487,945 and 10,114,350 times respectively. Of course; we use <code>string.to_char_list</code> to convert our keywords and punctuators to character lists for these two procedures to match against the input data. But clearly we don't have 7,487,945 keywords or 10,114,350 punctuators, so <code>string.to_char_list</code> must be getting called with the same arguments repeatedly. Let’s fix that:<br />
<pre><code>:- func to_char_list_memo(string) = list(char).
:- pragma memo(to_char_list_memo/1, [fast_loose]).
to_char_list_memo(S) = to_char_list(S).</code></pre>Unfortunately we can’t instruct Mercury to memoize procedures in other modules, so we must define <code>to_char_list_memo</code> as above. We can then replace the two calls to <code>to_char_list</code> in <code>lexer.keyword_rec</code> and <code>lexer.punctuator_rec</code> with this memoized version. Compiling and reprofiling tells us:<br />
<div style="overflow:scroll;overflow-y:hidden"><pre style="width:160em"><samp> % cumulative self self total
time seconds seconds calls ms/call ms/call name
45.2 1.65 1.65 10400248 0.00 0.00 <function `c_parser_aim.to_char_list_memo/1' mode 0> [22]
12.3 2.10 0.45 10400437 0.00 0.00 <predicate `builtin.unify/2' mode 0> [29]
12.3 2.55 0.45 5975244 0.00 0.00 <predicate `c_parser_aim.lexer.punctuator_aux/3' mode 0> [27]
12.3 3.00 0.45 119606 0.00 0.01 <predicate `c_parser_aim.lexer.keyword_rec/4' mode 0> [24]
8.2 3.30 0.30 4425004 0.00 0.00 <predicate `c_parser_aim.lexer.keyword_aux/3' mode 0> [28]
4.1 3.45 0.15 119517 0.00 0.02 <predicate `c_parser_aim.lexer.punctuator_rec/4' mode 0> [21]</samp></pre></div><br />
<h3>Tracking down <code>token</code></h3>We’ve reduced the percentage of total runtime consumed by <code>to_char_list</code>, but it’s still getting called an awful lot. We know <code>keyword_rec</code> and <code>punctuator_rec</code> don’t call it any more often than they need to, so we need to find out who’s calling <em>them</em>. Let’s follow <code>keyword_rec</code>:<br />
<div style="overflow:scroll;overflow-y:hidden"><pre style="width:160em"><samp> 0.45 1.64 119606/119606 <predicate `c_parser_aim.lexer.keyword/3' mode 0> [23]
[24] 45.0 0.45 1.64 119606 <predicate `c_parser_aim.lexer.keyword_rec/4' mode 0> [24]
0.70 0.70 4425004/10400248 <function `c_parser_aim.to_char_list_memo/1' mode 0> [22]
0.30 0.49 4425004/4425004 <predicate `c_parser_aim.lexer.keyword_aux/3' mode 0> [28]</samp></pre></div><code>keyword</code> is the only caller, no surprise there. Let’s go up one more level:<br />
<div style="overflow:scroll;overflow-y:hidden"><pre style="width:160em"><samp> 0.00 1.64 119606/119606 <predicate `c_parser_aim.lexer.token/3' mode 0> [1]
[23] 45.0 0.00 1.64 119606 <predicate `c_parser_aim.lexer.keyword/3' mode 0> [23]
0.00 0.00 57/239246 <predicate `c_parser_aim.lexer.identifier_nondigit/3' mode 0> [67]
0.45 1.64 119606/119606 <predicate `c_parser_aim.lexer.keyword_rec/4' mode 0> [24]
0.00 0.00 57/836707 <predicate `char.is_digit/1' mode 0> [59]</samp></pre></div>Again, no surprise – <code>keyword</code> is called only by <code>token</code>.<br />
<div style="overflow:scroll;overflow-y:hidden"><pre style="width:160em"><samp> 0.00 0.21 7009/119606 <predicate `c_parser_aim.cast_expression/3' mode 0> [8]
0.00 0.00 2/119606 <predicate `c_parser_aim.compound_statement/3' mode 0> <cycle 3> [55]
0.00 0.00 117/119606 <predicate `c_parser_aim.declaration_specifiers/4' mode 0> <cycle 4> [35]
0.00 0.00 4/119606 <predicate `c_parser_aim.direct_abstract_declarator/3' mode 0> [52]
0.00 0.00 9/119606 <predicate `c_parser_aim.direct_declarator/4' mode 0> <cycle 2> [50]
0.00 0.00 4/119606 <predicate `c_parser_aim.direct_declarator_rec/3' mode 0> <cycle 2> [47]
0.00 0.00 36/119606 <predicate `c_parser_aim.enum_specifier/3' mode 0> [42]
0.01 0.43 14042/119606 <predicate `c_parser_aim.kw/3' mode 0> [30]
0.00 0.00 3/119606 <predicate `c_parser_aim.labeled_statement/3' mode 0> [53]
0.00 0.00 8/119606 <predicate `c_parser_aim.p/3' mode 0> [51]
0.00 0.00 2/119606 <predicate `c_parser_aim.parameter_list/3' mode 0> <cycle 2> [54]
0.00 0.00 1/119606 <predicate `c_parser_aim.parameter_type_list/3' mode 0> <cycle 2> [56]
0.00 0.00 26/119606 <predicate `c_parser_aim.pointer/2' mode 0> [45]
0.01 0.43 14019/119606 <predicate `c_parser_aim.postfix_expression/3' mode 0> [25]
0.01 0.86 28036/119606 <predicate `c_parser_aim.primary_expression/3' mode 0> [26]
0.00 0.00 48/119606 <predicate `c_parser_aim.struct_or_union/2' mode 0> [41]
0.00 0.00 36/119606 <predicate `c_parser_aim.type_qualifier/2' mode 0> [43]
0.00 0.00 120/119606 <predicate `c_parser_aim.type_specifier/3' mode 0> [37]
0.00 0.00 12/119606 <predicate `c_parser_aim.typedef_name/3' mode 0> [48]
0.02 1.71 56072/119606 <predicate `c_parser_aim.unary_expression/3' mode 0> [20]
[1] 100.0 0.05 3.65 119606 <predicate `c_parser_aim.lexer.token/3' mode 0> [1]
0.00 0.00 119517/119517 <function `c_parser_aim.lexer.punctuator_list/0' mode 0> [66]
0.00 0.00 31/31 <function `string.from_char_list/1' mode 0> [60]
0.00 0.10 119517/119517 <predicate `c_parser_aim.lexer.constant/2' mode 0> [32]
0.00 0.00 119548/239246 <predicate `c_parser_aim.lexer.identifier_nondigit/3' mode 0> [67]
0.00 0.00 31/31 <predicate `c_parser_aim.lexer.identifier_rec/3' mode 0> [71]
0.00 1.64 119606/119606 <predicate `c_parser_aim.lexer.keyword/3' mode 0> [23]
0.15 1.81 119517/119517 <predicate `c_parser_aim.lexer.punctuator_rec/4' mode 0> [21]
0.00 0.05 119606/119606 <predicate `c_parser_aim.lexer.whitespace/2' mode 0> [34]</samp></pre></div>And there’s the culprit. <code>token</code> is called by <em>many</em> other procedures, and as discussed above, almost certainly with repeated inputs. Let’s memoize <code>token</code>.<br />
<pre><code>:- pragma memo(token/3, [fast_loose]).</code></pre>Now (as we found out last week), parsing our simple test completes in a reasonable amount of time. Let’s give our parser something bigger to chew on.<br />
<br />
<h3>Factoring out multiple calls from <code>multiplicative_expression</code></h3>For stress testing, I used a 6kb program from Glen McCluskey’s <a rel="external" href="http://www.glenmccl.com/c9suite.htm">C99 test suite</a>, <code>tdes_047.c</code>. (It is available in the set of free samples if you wish to try it yourself.) We can use this (or any C99 code) as follows:<br />
<pre>$ <kbd>cpp -P c9_samp/tdes_047.c | ./c_parser_direct</kbd></pre>And of course, we are met with no end to the parsing. <code>mprof</code> still tells us that <code>token</code> is responsible for the most CPU usage, but since we’ve already memoized it, let’s work our way up the chain some more. Who is responsible for most of the calls to <code>token</code>?<br />
<div style="overflow:scroll;overflow-y:hidden"><pre style="width:160em"><samp> 0.23 0.23 1358257/95678933 <predicate `c_parser_aim.additive_expression/3' mode 0> <cycle 2> [19]
0.00 0.00 7546/95678933 <predicate `c_parser_aim.and_expression/3' mode 0> <cycle 2> [24]
0.00 0.00 2370/95678933 <predicate `c_parser_aim.assignment_operator/2' mode 0> [29]
1.36 1.36 8218754/95678933 <predicate `c_parser_aim.cast_expression/3' mode 0> <cycle 2> [12]
0.00 0.00 7/95678933 <predicate `c_parser_aim.compound_statement/3' mode 0> [14]
0.00 0.00 236/95678933 <predicate `c_parser_aim.conditional_expression/3' mode 0> <cycle 2> [50]
0.00 0.00 14/95678933 <predicate `c_parser_aim.declaration/4' mode 0> [5]
0.00 0.00 457/95678933 <predicate `c_parser_aim.declaration_specifiers/4' mode 0> <cycle 2> [43]
0.00 0.00 392/95678933 <predicate `c_parser_aim.designator/3' mode 0> [48]
0.00 0.00 95/95678933 <predicate `c_parser_aim.direct_abstract_declarator/3' mode 0> [53]
0.00 0.00 368/95678933 <predicate `c_parser_aim.direct_declarator/4' mode 0> <cycle 2> [49]
0.00 0.00 890/95678933 <predicate `c_parser_aim.direct_declarator_rec/3' mode 0> <cycle 2> [27]
0.00 0.00 999/95678933 <predicate `c_parser_aim.enum_specifier/3' mode 0> [38]
0.01 0.01 30183/95678933 <predicate `c_parser_aim.equality_expression/3' mode 0> <cycle 2> [23]
0.00 0.00 3773/95678933 <predicate `c_parser_aim.exclusive_or_expression/3' mode 0> <cycle 2> [26]
0.00 0.00 2/95678933 <predicate `c_parser_aim.identifier_list/2' mode 0> [57]
0.00 0.00 1886/95678933 <predicate `c_parser_aim.inclusive_or_expression/3' mode 0> <cycle 2> [31]
0.00 0.00 17/95678933 <predicate `c_parser_aim.init_declarator/4' mode 0> [8]
0.00 0.00 8/95678933 <predicate `c_parser_aim.init_declarator_list/4' mode 0> [7]
0.00 0.00 2/95678933 <predicate `c_parser_aim.initializer/3' mode 0> <cycle 2> [56]
0.00 0.00 48/95678933 <predicate `c_parser_aim.initializer_list/3' mode 0> <cycle 2> [45]
0.02 0.02 139104/95678933 <predicate `c_parser_aim.kw/3' mode 0> [22]
0.00 0.00 943/95678933 <predicate `c_parser_aim.logical_and_expression/3' mode 0> <cycle 2> [39]
0.00 0.00 472/95678933 <predicate `c_parser_aim.logical_or_expression/3' mode 0> <cycle 2> [44]
1.01 1.01 6112154/95678933 <predicate `c_parser_aim.multiplicative_expression/3' mode 0> <cycle 2> [17]
0.00 0.00 1474/95678933 <predicate `c_parser_aim.p/3' mode 0> [34]
0.00 0.00 1172/95678933 <predicate `c_parser_aim.pointer/2' mode 0> [32]
2.72 2.72 16437987/95678933 <predicate `c_parser_aim.postfix_expression/3' mode 0> <cycle 2> [4]
8.10 8.10 48898650/95678933 <predicate `c_parser_aim.postfix_expression_rec/3' mode 0> [9]
2.20 2.20 13264944/95678933 <predicate `c_parser_aim.primary_expression/3' mode 0> [13]
0.03 0.03 181101/95678933 <predicate `c_parser_aim.relational_expression/3' mode 0> <cycle 2> [21]
0.08 0.08 452752/95678933 <predicate `c_parser_aim.shift_expression/3' mode 0> <cycle 2> [20]
0.00 0.00 68/95678933 <predicate `c_parser_aim.struct_declaration/3' mode 0> <cycle 2> [54]
0.00 0.00 136/95678933 <predicate `c_parser_aim.struct_declarator/3' mode 0> <cycle 2> [52]
0.00 0.00 68/95678933 <predicate `c_parser_aim.struct_declarator_list/3' mode 0> <cycle 2> [55]
0.00 0.00 1404/95678933 <predicate `c_parser_aim.struct_or_union/2' mode 0> [36]
0.00 0.00 102/95678933 <predicate `c_parser_aim.struct_or_union_specifier/3' mode 0> <cycle 2> [33]
0.00 0.00 2334/95678933 <predicate `c_parser_aim.type_qualifier/2' mode 0> [30]
0.00 0.00 4079/95678933 <predicate `c_parser_aim.type_specifier/3' mode 0> <cycle 2> [25]
0.00 0.00 333/95678933 <predicate `c_parser_aim.typedef_name/3' mode 0> [51]
0.09 0.09 553352/95678933 <predicate `c_parser_aim.unary_expression/3' mode 0> <cycle 2> [18]
[6] 52.1 15.85 15.85 95678933 <predicate `c_parser_aim.lexer.token/3' mode 0> [6]
0.00 0.00 96/96 <function `c_parser_aim.lexer.punctuator_list/0' mode 0> [81]
0.00 0.00 58/58 <function `string.from_char_list/1' mode 0> [64]
0.00 0.00 108/108 <predicate `c_parser_aim.lexer.constant/2' mode 0> [88]
0.00 0.00 166/412 <predicate `c_parser_aim.lexer.identifier_nondigit/3' mode 0> [86]
0.00 0.00 58/58 <predicate `c_parser_aim.lexer.identifier_rec/3' mode 0> [97]
0.00 0.00 225/225 <predicate `c_parser_aim.lexer.keyword/3' mode 0> [83]
0.00 0.00 96/96 <predicate `c_parser_aim.lexer.punctuator_rec/4' mode 0> [95]
0.00 0.00 225/225 <predicate `c_parser_aim.lexer.whitespace/2' mode 0> [82]</samp></pre></div>It seems like <code>postfix_expression_rec</code> is the winner; it is responsible for over half the calls to <code>token</code>. Tracing it back similarly to how we traced <code>keyword_rec</code>, we find that it is called solely by <code>postfix_expression</code> 8,149,775 times, which is called solely by <code>unary_expression</code> a similar number of times, which is called almost exclusively by <code>cast_expression</code> a similar number of times, which is called almost exclusively by <code>multiplicative_expression</code>, which is called… only 2,054,666 times:<br />
<div style="overflow:scroll;overflow-y:hidden"><pre style="width:160em"><samp> 2054666 <predicate `c_parser_aim.additive_expression/3' mode 0> <cycle 2> [19]
[17] 7.2 0.70 2.18 2054666 <predicate `c_parser_aim.multiplicative_expression/3' mode 0> <cycle 2> [17]
1.01 1.01 6112154/95678933 <predicate `c_parser_aim.lexer.token/3' mode 0> [6]
0.47 0.47 6112154/87459316 <unification predicate for type `c_parser_aim.lexer.token/0' mode 0> [11]
8218660 <predicate `c_parser_aim.cast_expression/3' mode 0> <cycle 2> [12]</samp></pre></div>In other words, <code>multiplicative_expression</code> is responsible for multiplying the number of calls passing through it to <code>token</code> fourfold. Let’s look at <code>multiplicative_expression</code> (again, with <code>Typedefs</code> elided):<br />
<pre><code>multiplicative_expression -->
(cast_expression, p("*"), multiplicative_expression) -> [];
(cast_expression, p("/"), multiplicative_expression) -> [];
(cast_expression, p("%"), multiplicative_expression) -> [];
cast_expression.</code></pre>A-ha, a repeated prefix! We know how to fix this:<br />
<pre><code>multiplicative_expression -->
cast_expression,
((p("*"), multiplicative_expression) -> [];
(p("/"), multiplicative_expression) -> [];
(p("%"), multiplicative_expression) -> [];
[]).</code></pre>Or, if we’d prefer to leave the structure of the grammar intact at the cost of memoization overhead, we can memoize <code>cast_expression</code>:<br />
<pre><code>:- pred cast_expression(list(char)::in, list(char)::out) is semidet.
:- pragma memo(cast_expression/2, [fast_loose]).</code></pre>(Note the <code>:- pred</code> declaration is necessary as you cannot memoize undeclared modes.)<br />
<br />
Notice that most of the other <code>_expression</code> productions follow the same pattern; all of them will in fact benefit from prefix-factoring or memoization. Let’s do that and move on.<br />
<br />
<h3>Memoizing multiple calls to <code>unary_expression</code></h3>Despite all our optimization so far, <code>token</code> (and <code>postfix_expression_rec</code> in turn) still reigns supreme. However, tracing the chain of calls back this time now reveals this:<br />
<div style="overflow:scroll;overflow-y:hidden"><pre style="width:160em"><samp> 170474 <predicate `c_parser_aim.assignment_expression/3' mode 0> <cycle 2> [15]
198268 <predicate `c_parser_aim.cast_expression/3' mode 0> <cycle 2> [32]
[18] 7.2 0.10 0.32 368742 <predicate `c_parser_aim.unary_expression/3' mode 0> <cycle 2> [18]
0.03 0.06 171588/338952 <predicate `c_parser_aim.kw/3' mode 0> [30]
0.13 0.13 686352/12057646 <predicate `c_parser_aim.lexer.token/3' mode 0> [7]
0.03 0.03 686352/11652569 <unification predicate for type `c_parser_aim.lexer.token/0' mode 0> [17]
368742 <predicate `c_parser_aim.postfix_expression/3' mode 0> <cycle 2> [9]</samp></pre></div><code>unary_expression</code> is called approximately an equal number of times by both <code>assignment_expression</code> and <code>cast_expression</code>. <em>This makes it a perfect candidate for memoization:</em><br />
<pre><code>:- pred unary_expression(typedefs::in, list(char)::in, list(char)::out) is semidet.
:- pragma memo(unary_expression/3, [fast_loose]).</code></pre><h3>Ad nauseam</h3>And that’s about all there is to it: following this method I found I needed to memoize two more procedures and factor one more, and the parser parsed the test file faster than <code>time</code> could reliably measure. Of course, it would behoove us to test on different inputs in order to exercise different grammar productions, but the method is the same.<br />
<br />
<h3>Summary</h3>To recap:<br />
<ol><li>Use <code>mprof -c</code> to determine which procedure is the largest time hog.</li>
<li>If the procedure is called by primarily one other procedure on a 1:1 basis, or if you have already optimized it, look instead at its most prominent caller (<i>etc</i>.).</li>
<li>If the procedure is called by primarily one other procedure on a greater than 1:1 basis and it is a common prefix of this caller, factor it out.</li>
<li>Otherwise, memoize it.</li>
<li>Recompile and repeat.</li>
</ol>This simple method will asymptotically increase the speed of almost any naïve PEG parser, in my experience, from unusable to quite fast, and will avoid the overhead associated with unnecessarily memoizing every grammar production in a parser.<br />
<br />
Download <a href="http://www.fstutoring.com/~chris/aim/c_parser_memo.m">c_parser_memo.m</a><br />
<br />
<h3>Further reading</h3>The direct encoding of PEGs in Mercury, and the performance implications thereof, were first explored in-depth by Ralph Becket and Zoltan Somogyi in the paper <a rel="external" href="http://www.mercury.csse.unimelb.edu.au/information/papers/packrat.pdf">DCGs + Memoing = Packrat Parsing; But is it worth it?</a>; I highly recommend it as further reading for those interested in the subject.Chris Pacejohttp://www.blogger.com/profile/18278436935541738068noreply@blogger.com2tag:blogger.com,1999:blog-1560635160713980237.post-53716995088098954922011-10-09T21:41:00.000-07:002011-10-11T17:24:11.921-07:00Parsing Expression Grammars part 2Continuing <a href="http://adventuresinmercury.blogspot.com/2011/10/parsing-expression-grammars-part-1.html">last week’s post</a>, I’d like to give an example of implementing a real-life PEG. What better than the C language, specifically, preprocessed <a href="http://www.open-std.org/JTC1/SC22/WG14/www/docs/n1256.pdf">C99</a>. (Note: It may be helpful to the reader to open the linked PDF to Annex A, “Language syntax summary”.)<br />
<br />
Recall that PEGs differ from more commonly used CFGs in that they are completely deterministic, greedy, and don’t allow left recursion. This can make translating some grammars difficult or impossible if they rely on those features. Fortunately, C99’s grammar is straightforward enough that, with a little thought, we can work around these problems. Let’s see how.<br />
<br />
<h3>The Lexer</h3><br />
First off, let’s create a submodule for the lexer like so:<br />
<pre><code>:- module lexer.
:- interface.</code></pre>From the lexer we will export two predicates, one which lexes single tokens (<code>token</code>), and one which lexes a sequence of whitespace (<code>whitespace</code>):<br />
<pre><code>:- pred token(token::out, list(char)::in, list(char)::out) is semidet.
:- pred whitespace(list(char)::in, list(char)::out) is det.</code></pre>Note: it’s important that we define the mode of the <code>token</code> argument of <code>token</code> as <code>out</code> rather than <code>in</code>, even though they are logically equivalent. This will allow us later to memoize <code>token</code> on one argument instead of two, greatly reducing our memory use at negligible (if any) processing overhead. This trick is possible precisely because PEGs are deterministic: the determinism of <code>token</code> with <code>token</code> as <code>out</code> will still be <code>semidet</code>, rather than <code>nondet</code> (and hence unmemoizable) as it would be with a CFG.<br />
<br />
The C99 standard defines 5 kinds of lexical tokens our lexer will need to generate: <var>keyword</var>, <var>identifier</var>, <var>constant</var>, <var>string-literal</var>, and <var>punctuator</var>. A glance through the C99 grammar reveals that merely to parse the grammar, we do not need to know the specific values of <var>constant</var>s and <var>string-literal</var>s, so let’s define the type of a lexing token like so:<br />
<pre><code>:- type token --->
keyword(string);
identifier(string);
constant;
string_literal;
punctuator(string).</code></pre>Of course, if we were constructing a parse tree, we would want to hold onto the values of <var>constant</var>s and <var>string-literal</var>s, but since for now we are just matching the grammar, it is OK to leave them out. On to the implementation!<br />
<pre><code>:- implementation.</code></pre><h4>Whitespace, or, using built-in character classes</h4><br />
Let’s get whitespace out of the way. The C99 definition of whitespace corresponds to Mercury’s definition of whitespace, so we can import <code>char</code> and define <code>whitespace</code> like so:<br />
<pre>whitespace --> ([W], {is_whitespace(W)}) -> whitespace; [].</pre><h4>Tokens, or, parameterizing rules</h4><br />
C99 defines <var>token</var> as follows:<br />
<pre><var>token</var>:
<var>keyword</var>
<var>identifier</var>
<var>constant</var>
<var>string-literal</var>
<var>punctuator</var></pre>A straightforward translation of this to a PEG in Mercury would read:<br />
<pre><code>token -->
keyword -> [];
identifier -> [];
constant -> [];
string_literal -> [];
punctuator.</code></pre>However, because this is an <em>unordered</em> choice in the spec but an <em>ordered</em> choice in Mercury, we need to convince ourselves that choosing this particular order will not change the language. Fortunately this is easy to confirm: the latter four must each start with a different kind of character (letter, digit, quote, and non-quote symbol, respectively), and the first (<var>keyword</var>) should be matched in preference to the second (<var>identifier</var>) anyway.<br />
<br />
The C99 spec assumes that whitespace is removed in some phase between lexing and parsing, and that adjacent string literals are concatenated. Because we will be attaching our lexer directly to our parser (such is the magic of PEGs), we will need to perform these two steps in our tokenizer. Finally, we will want to actually produce a lexing token as well (rather than just match it). The final implementation of <code>token</code> is thus:<br />
<pre><code>token(Token) -->
whitespace,
(keyword(KW) -> {Token = keyword(KW)};
identifier(Ident) -> {Token = identifier(Ident)};
constant -> {Token = constant};
(string_literal, (token(string_literal) -> []; [])) ->
{Token = string_literal};
punctuator(Punct), {Token = punctuator(Punct)}).</code></pre>Parameterizing rules like this can be very powerful, as we’ll see later in this post. However let’s tackle each kind of token in turn.<br />
<br />
<h4>Keywords, or, fixing choice using negative lookahead assertions</h4><br />
<var>keyword</var> looks deceptively simple; it is merely a list of valid C99 keywords. We could be tempted to implement it as such:<br />
<pre><code>keyword(Keyword) --> % WRONG
['a', 'u', 't', 'o'] -> {Keyword = "auto"};
['b', 'r', 'e', 'a', 'k'] -> {Keyword = "break"};
</code><i>etc.</i></pre>But the devil is in the details. We must ensure that no keyword is matched against the prefix of another keyword or identifier (e.g. we should not match the “<kbd>do</kbd>” in the keyword “<kbd>double</kbd>”, or the “<kbd>for</kbd>” in the identifier “<kbd>format</kbd>”). The latter can be fixed by using a negative lookahead assertion:<br />
<pre><code>keyword(Keyword) -->
(['a', 'u', 't', 'o'] -> {Keyword = "auto"};
['b', 'r', 'e', 'a', 'k'] -> {Keyword = "break"};
</code><i>etc.</i><code>),
not (identifier_nondigit(_) -> []; digit(_)).</code></pre>In other words, a keyword must <em>not</em> be followed immediately by one of the characters which are valid in a keyword or identifier (defined later). Note that this won’t fix partial keyword matching: if the assertion fails, the parse as a keyword is rejected entirely rather than retried with a longer keyword. Either we should move the negative lookahead assertion into the body of each branch, or simply rearrange the offending keywords from longest to shortest. Since “<kbd>do</kbd>” and “<kbd>double</kbd>” are the only offenders, it’s easy enough to swap them. (We’ll encounter a better example later.)<br />
<br />
(Note: In the completed lexer linked below, I add a little extra machinery to allow listing keywords as strings in a list. There is nothing PEG-specific about this technique, and it saves me only keystrokes.)<br />
<br />
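For concreteness, with the two offenders swapped, the relevant branches read as follows (a sketch only; as noted, the downloadable code generates these branches from a list of strings):<br />
<pre><code>keyword(Keyword) -->
(['d', 'o', 'u', 'b', 'l', 'e'] -> {Keyword = "double"};
['d', 'o'] -> {Keyword = "do"};
</code><i>etc.</i><code>),
not (identifier_nondigit(_) -> []; digit(_)).</code></pre>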
<h4>Identifiers, or, transforming left-recursive productions</h4><br />
Identifiers take a recursive form commonly seen in the C99 spec:<br />
<pre><var>identifier</var>:
<var>identifier-nondigit</var>
<var>identifier</var> <var>identifier-nondigit</var>
<var>identifier</var> <var>digit</var></pre>That is, an identifier is a certain initial production followed by a sequence of either of two other productions. A direct implementation is no good:<br />
<pre><code>identifier --> % WRONG
identifier_nondigit -> [];
(identifier, identifier_nondigit) -> [];
(identifier, digit).</code></pre>This will either match a single <code>identifier_nondigit</code>, or, failing to do so, will get caught in a loop calling itself again having consumed no input. Not what we wanted. We need to split the initial call to <code>identifier_nondigit</code> out, and write a right-recursive rule for the rest of the identifier:<br />
<pre><code>identifier --> identifier_nondigit, identifier_rec.
identifier_rec -->
(identifier_nondigit, identifier_rec) -> [];
(digit, identifier_rec) -> [];
[].</code></pre>Because <var>identifier-nondigit</var> and <var>digit</var> are disjoint, order is not important. This formulation is a proper PEG and will behave as expected.<br />
<br />
Of course, we determined earlier that it was important to us to know <em>what</em> identifier was parsed. We can easily add expressions to our rules to do just this:<br />
<pre><code>identifier(from_char_list(Ident) : string) -->
identifier_nondigit(C), identifier_rec(Ident0),
{Ident = [C | Ident0]}.
identifier_rec(Ident) -->
(identifier_nondigit(C), identifier_rec(Ident0)) ->
{Ident = [C | Ident0]};
(digit(D), identifier_rec(Ident0)) ->
{Ident = [D | Ident0]};
{Ident = []}.
identifier_nondigit(D) -->
nondigit(D0) -> {D = D0};
universal_character_name(D0), {D = D0}.</code></pre>(The typecast to <code>string</code> in the head of <code>identifier</code> is necessary because <code>from_char_list</code> is overloaded and Mercury gets confused otherwise.)<br />
<br />
<var>identifier-nondigit</var>, <var>nondigit</var>, and <var>digit</var> are straightforward and use the techniques described thus far. I’m going to skip over them and move on to constants.<br />
<br />
<h4>Constants, or, fixing choice using order</h4><br />
C99 defines constants as:<br />
<pre><var>constant</var>:
<var>integer-constant</var>
<var>floating-constant</var>
<var>enumeration-constant</var>
<var>character-constant</var></pre>Now, the astute reader will notice that something will go wrong here: the leading digits of a floating-point literal such as <kbd>123.45</kbd> also form a valid integer literal in C99, so if we wrote this production directly as a PEG, it would always match the integer part of a floating-point literal as an integer. Again, not what we want.<br />
<br />
We could use a negative lookahead assertion as above to fix this, but easier still is to change the order, prioritizing floats over integers:<br />
<pre><code>constant -->
floating_constant -> [];
integer_constant -> [];
enumeration_constant -> [];
character_constant.</code></pre>Easy enough! Since enumeration constants must always start with a letter (which floats and integers do not), and character constants must always start with a single quote, we are all set here.<br />
<br />
The rest of the lexer for constants and string literals uses one of these techniques so let’s move on to the last lexing token, punctuators.<br />
<br />
<h4>Punctuators, or, fixing choice by ordering by length</h4><br />
<var>punctuator</var>, like <var>keyword</var>, is merely a list of strings of symbols which are meaningful in C99. This list presents the same problem as the list of keywords: a shorter punctuator may match the prefix of what is in fact a longer punctuator. (For example, <kbd>&lt;</kbd> may match the beginning of the symbol <kbd>&lt;&lt;=</kbd> in source text.)<br />
<br />
For this production, we can use the solution of ordering the punctuators from longest to shortest. This works without much further thought. (Again, I use some extra code to allow me to define this production as a list of strings in my example code.) And that about does it for the lexer.<br />
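To make the length ordering concrete, a hand-written fragment of <code>punctuator</code> would list longer symbols first (a sketch only; as mentioned, the real rule in the example code is generated from a list of strings):<br />
<pre><code>punctuator(Punct) -->
['<', '<', '='] -> {Punct = "<<="};
['<', '<'] -> {Punct = "<<"};
['<', '='] -> {Punct = "<="};
['<'] -> {Punct = "<"};
</code><i>etc.</i></pre>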
<pre><code>:- end_module lexer.</code></pre><h3>The Parser</h3><br />
The parser is largely trivial, using the same four techniques described above for the lexer (rewriting left recursion and the three ordering techniques). Since the grammar makes many references to individual kinds of tokens, I defined a few shortcuts:<br />
<pre>kw(KW) --> token(keyword(KW)).
id(Ident) --> token(identifier(Ident)).
const --> token(constant).
str_lit --> token(string_literal).
p(Punct) --> token(punctuator(Punct)).</pre>The grammar also references the <var>enumeration-constant</var> production, which, although not a token directly, is merely an identifier and we can treat it as such:<br />
<pre><code>enumeration_const --> id(_).</code></pre>Ensuring that all calls to the lexer are funneled through <code>token</code> will make our lives easier when we optimize later.<br />
<br />
Now before I sign off, there are two <em>gotcha</em>s in the grammar, for both of which C is infamous.<br />
<br />
<h4>The “dangling else”</h4><br />
C’s if/then/[else] has the <a rel="external" href="http://en.wikipedia.org/wiki/Dangling_else">problem</a> that in a CFG, it is ambiguous. Does <code>if (1) if (0) puts("hi"); else puts("bye");</code> print <samp>bye</samp> or nothing? The C99 standard indicates that an <code>else</code> should match the nearest <code>if</code> it legally can (so the above code would print <samp>bye</samp>).<br />
<br />
Fortunately with PEGs, this ambiguity is trivial to solve. We need simply write the if/then/else case first in the grammar. Any <code>else</code> will be matched by the innermost (and thus nearest) <code>if</code> possible:<br />
<pre><code>selection_statement -->
(kw("if"), p("("), expression, p(")"), statement,
kw("else"), statement) -> [];
(kw("if"), p("("), expression, p(")"), statement) -> [];
</code><i>etc.</i></pre><h4>Typedefs</h4><br />
For the unfamiliar, a <defn>typedef</defn> in C is a means of defining an alias for an existing type specification. There’s nothing odd about that, except that <em>types in C are keywords and the set of types determines the parse</em>. In other words, <em>C is not context-free</em>.<br />
<br />
Even though PEGs can directly represent some non-context-free grammars, C is not one of them. Fortunately the problem is easily solved by parameterizing the grammar and passing around a set of “typedefed” identifiers:<br />
<pre><code>:- import_module bool, set.
:- type typedefs == set(string).</code></pre>Typedefs are created in <var>declaration</var>s, if one of the <var>declaration-specifier</var>s is a <var>storage-class-specifier</var> which is the keyword <code>typedef</code>:<br />
<pre><code>declaration(!Typedefs) -->
declaration_specifiers(!.Typedefs, HasTypedef),
(init_declarator_list(Decls0) ->
{Decls = Decls0}; {Decls = init}),
p(";"), {if HasTypedef = yes then union(Decls, !Typedefs) else true}.
declaration_specifiers(Typedefs, HasTypedef) -->
(storage_class_specifier(IsTypedef),
declaration_specifiers_opt(Typedefs, HasTypedef0)) ->
{HasTypedef: bool = IsTypedef `or` HasTypedef0};
(type_specifier,
declaration_specifiers_opt(Typedefs, HasTypedef0)) ->
{HasTypedef = HasTypedef0};
</code><i>etc.</i>
<code>
declaration_specifiers_opt(Typedefs, HasTypedef) -->
declaration_specifiers(Typedefs, HasTypedef0) ->
{HasTypedef = HasTypedef0};
{HasTypedef = no}.
storage_class_specifier(IsTypedef) -->
kw("typedef") -> {IsTypedef = yes};
kw("extern") -> {IsTypedef = no};
kw("static") -> {IsTypedef = no};
kw("auto") -> {IsTypedef = no};
kw("register"), {IsTypedef = no}.</code></pre>(Note the use of state variable syntax, and the factored-out <code>declaration_specifiers_opt</code>, which makes the code passably legible.) Typedef declarations may legally exist in <var>block-item</var>s, scoped within <var>compound-statements</var>:<br />
<pre><code>compound_statement(Typedefs) -->
p("{"), (block_item_list(Typedefs, _) -> []; []), p("}").
block_item_list(!Typedefs) -->
(block_item(!Typedefs), block_item_list(!Typedefs)) -> [];
block_item(!Typedefs).
block_item(!Typedefs) --> declaration(!Typedefs) -> []; statement.</code></pre>and as <var>external-declaration</var>s:<br />
<pre><code>translation_unit(!Typedefs) -->
(external_declaration(!Typedefs), translation_unit(!Typedefs)) -> [];
external_declaration(!Typedefs).
external_declaration(!Typedefs) -->
function_definition -> [];
declaration(!Typedefs).</code></pre>Finally, <var>typedef-name</var> itself references the set of typedefs:<br />
<pre><code>typedef_name(Typedefs) --> id(Ident), {set.member(Ident, Typedefs)}.</code></pre>The rest of the implementation of typedefs involves parameterizing most productions to pass the set of typedefs from <code>external_declaration</code> down to <code>typedef_name</code>, and parameterizing the few illegal uses of typedefs (for example, in function signatures) to simply drop the modified set of typedefs. Nothing brain-bending; just a lot of typing.<br />
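To give a flavor of that parameterization, a production such as <var>type-specifier</var> simply threads the set down to <code>typedef_name</code> (a sketch only; the branch list is abbreviated here):<br />
<pre><code>type_specifier(Typedefs) -->
kw("void") -> [];
kw("int") -> [];
</code><i>etc.</i><code>
typedef_name(Typedefs).</code></pre>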
<br />
<h3>Summary & Test</h3><br />
In today’s post we saw how to translate a real language grammar, that of C99, into a Mercury program. We learned how to transform left-recursive rules into right-recursive ones, how to ensure proper production choice by using negative lookahead assertions, ordering by precedence, and ordering by length, and how to implement the non-context-free elements of C99.<br />
<br />
Let’s add a main function similar to last week’s, compile, and try out our spiffy new parser:<br />
<pre><samp>$ </samp><kbd>./c_parser_direct</kbd>
<kbd>int main(void) {}</kbd></pre>…and wait, and wait, and wait, and… what’s going on? Why are we getting no response? Do we have an infinite loop?<br />
<br />
Nope. Add this line in the <code>lexer</code> module:<br />
<pre><code>:- pragma memo(token/3, [fast_loose]).</code></pre><pre><samp>$ </samp><kbd>./c_parser_direct</kbd>
<kbd>int main(void) {}</kbd></pre>(wait for it…)<br />
<pre><samp>Success!</samp></pre>Success indeed, but it’s still slow. More on memoization and how to speed this up even more in part 3.<br />
<br />
<a href="http://www.fstutoring.com/~chris/aim/c_parser_direct.m">Download c_parser_direct.m</a>Chris Pacejohttp://www.blogger.com/profile/18278436935541738068noreply@blogger.com0tag:blogger.com,1999:blog-1560635160713980237.post-58916698356449770902011-10-02T20:32:00.000-07:002011-10-02T20:32:20.612-07:00Parsing Expression Grammars part 1First, I’d like to say welcome back! AiM is returning from its hiatus since April. The summer has been busy for me, but I hope to update AiM weekly henceforth. Let’s start!<br />
<br />
<h3>Parsing Expression Grammars</h3><br />
<a rel="external" href="http://en.wikipedia.org/wiki/Parsing_expression_grammar">Parsing expression grammars</a> (<abbr>PEG</abbr>s) are a new-ish formal grammar intended to describe computer languages. PEGs differ from the more commonplace <a rel="external" href="http://en.wikipedia.org/wiki/Context-free_grammar">context-free grammars</a> most notably in that PEGs offer an <defn>ordered choice</defn> operator. The second branch of an ordered choice implicitly <em>does not match</em> any expression which is matched by the first branch. Since any text can only match exactly one branch of an ordered choice, this has the practical implication that <em>a PEG cannot be ambiguous</em> and rule precedence declarations are not needed.<br />
<br />
Another natural consequence of ordered choice is that so-called repetition operators (<code>*</code> and <code>+</code>) and the option operator (<code>?</code>) are necessarily greedy. (Any time a repetition <em>can</em> consume more input, ordered choice ensures that it <em>does</em>, since the branch that stops repeating is tried only when further repetition fails.)<br />
<br />
A PEG rule, as described in the Wikipedia entry, may be one of:<br />
<ul><li>a terminal (a symbol),</li>
<li>a nonterminal (another rule), or</li>
<li>the empty string.</li>
</ul>or may be constructed from other rules by:<br />
<ul><li>sequence: <code><var>A</var> <var>B</var></code></li>
<li>ordered choice: <code><var>A</var> / <var>B</var></code></li>
<li>zero or more: <code><var>A</var>*</code></li>
<li>one or more: <code><var>A</var>+</code></li>
<li>optional: <code><var>A</var>?</code></li>
<li>positive lookahead: <code>&amp;<var>A</var></code></li>
<li>negative lookahead: <code>!<var>A</var></code></li>
</ul>Positive lookahead and negative lookahead test for a rule match (or lack thereof) but do not consume input.<br />
<br />
<h3>Translating into Mercury</h3><br />
It’s well-known that CFGs can be implemented directly in logic languages such as Mercury as recursive-descent parsers. Prolog and Mercury even have a special <a rel="external" href="http://www.mercury.csse.unimelb.edu.au/information/doc-release/mercury_ref/DCG_002drules.html">syntax</a> known as <a rel="external" href="http://en.wikipedia.org/wiki/Definite_clause_grammar">definite clause grammar</a> (<abbr>DCG</abbr>) to facilitate doing so. It’s similarly easy to implement PEGs, and <em>there’s a surprise at the end</em>! Let’s see how.<br />
<br />
PEG rules can be represented using DCG clauses like so:<br />
<br />
<code><var>rule</var> --> <var>body</var></code><br />
<br />
The type and mode of rules can be inferred by Mercury, but if you need to declare them, you can do so as:<br />
<br />
<code>:- pred <var>rule</var>(list(char)::in, list(char)::out) is semidet.</code><br />
<br />
The elements of the body of a PEG are translated as follows:<br />
<dl><dt>terminal <var>C</var></dt>
<dd><code>[<var>C</var>]</code></dd>
<dt>nonterminal <var>A</var></dt>
<dd><code><var>A</var></code></dd>
<dt>empty string</dt>
<dd><code>[]</code></dd>
<dt>sequence <code><var>A</var> <var>B</var></code></dt>
<dd><code><var>A</var>, <var>B</var></code></dd>
<dt>ordered choice <code><var>A</var> / <var>B</var></code></dt>
<dd><code><var>A</var> -> []; <var>B</var></code><br/>
Note that using <code>-> [] ;</code> (if/else) instead of <code>;</code> gives us exactly the greedy semantics we require.</dd>
<dt>optional <code><var>A</var>?</code></dt>
<dd><code><var>A</var> -> []; []</code></dd>
<dt>positive lookahead <code>&amp;<var>A</var></code></dt>
<dd><code>not not <var>A</var></code></dd>
<dd>or equivalently, <code>=(S), <var>A</var>, :=(S)</code></dd>
<dt>negative lookahead <code>!<var>A</var></code></dt>
<dd><code>not <var>A</var></code></dd>
</dl><br />
Repetition constructions (<code>A*</code> and <code>A+</code>) are recursive in nature and cannot be represented inline. However we can define higher order rules to represent them as:<br />
<pre>:- mode *(pred(in, out) is semidet, in, out) is det.
*(P) --> (P, *(P)) -> []; [].
:- mode +(pred(in, out) is semidet, in, out) is semidet.
+(P) --> (P, +(P)) -> []; P.</pre>Then we can translate repetition constructions as:<br />
<dl><dt>zero or more: <code><var>A</var>*</code></dt>
<dd><code>*(<var>A</var>)</code> or <code>*(pred(in, out) is semidet --> <var>A</var>)</code></dd>
<dt>one or more: <code><var>A</var>+</code></dt>
<dd><code>+(<var>A</var>)</code> or <code>+(pred(in, out) is semidet --> <var>A</var>)</code></dd>
</dl>The first, shorter form, can only be used for named rules with defined modes. The longer form is applicable to any rule construct.<br />
<br />
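One consequence of the greedy semantics is worth demonstrating: a rule of the shape <code><var>A</var>* <var>A</var></code> can never match, because the star consumes every <code><var>A</var></code>, leaving none for the final terminal. (The rule name below is made up for illustration.)<br />
<pre><code>stars_then_one --> % can never succeed
*(pred(in, out) is semidet --> ['a']), ['a'].</code></pre>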
<h3>Example</h3><br />
Let’s translate the <a rel="external" href="http://en.wikipedia.org/wiki/Parsing_expression_grammar#Examples">example</a> of basic arithmetic expressions from Wikipedia:<br />
<pre>Value ← [0-9]+ / '(' Expr ')'
Product ← Value (('*' / '/') Value)*
Sum ← Product (('+' / '-') Product)*
Expr ← Sum</pre>First let’s get some boilerplate out of the way. We’ll need the <code>list</code> module for DCG-syntax terminals:<br />
<pre>:- module peg_example.
:- interface.
:- import_module io.
:- pred main(io::di, io::uo) is det.
:- implementation.
:- import_module list.</pre>Now onto <code>Value</code>. Right off the bat Wikipedia throws us a curveball with the range expression <code>[0-9]</code>. Fortunately that’s easily translated as a specially-moded function:<br />
<pre>:- mode in..in = in is semidet.
A..B = C :- A @=< C, C @=< B.</pre>Even though (in fact, because) <code>..</code> is a <a rel="external" href="http://www.mercury.csse.unimelb.edu.au/information/doc-latest/mercury_ref/Builtin-Operators.html">builtin operator</a>, we can use it in definitions as well.<br />
<br />
Armed with our new range operator, we can translate the <code>Value</code> rule as:<br />
<pre>value -->
+(pred(in, out) is semidet --> ['0'..'9']) -> [];
(['('], expr, [')']).</pre>Note that the lambda-predicate (<code>pred(in, out) is semidet --></code>) is needed since <code>['0'..'9']</code> is not a predicate on its own. The translation is otherwise straightforward.<br />
<br />
<code>Product</code>, <code>Sum</code>, and <code>Expr</code> follow similarly:<br />
<pre>product -->
value, *(pred(in, out) is semidet -->
((['*'] -> []; ['/']), value)).
sum -->
product, *(pred(in, out) is semidet -->
((['+'] -> []; ['-']), product)).
expr --> sum.</pre>Our <code>main</code> function will read a line, try to parse it using <code>expr</code> and make sure that only the newline at the end is left unparsed:<br />
<pre>main(!IO) :-
read_line(Res, !IO),
(if Res = ok(Line) then
(if expr(Line, ['\n'])
then print("Success!\n", !IO)
else print("Failure!\n", !IO)),
main(!IO)
else true).</pre>Compile with <code>mmc --infer-all --make peg_example</code>, run it and try out some expressions, keeping in mind that our grammar doesn’t allow negatives, decimals, or whitespace:<br />
<pre><samp>$ <kbd>./peg_example</kbd>
<kbd>123+456</kbd>
Success!
<kbd>5/(10-20)</kbd>
Success!
<kbd>45-(40</kbd>
Failure!
<kbd>*12</kbd>
Failure!
<kbd>a+18</kbd>
Failure!</samp></pre>Of course, a parser that merely tests whether input is well-formed is only mildly useful. Next time, we’ll look into creating parse trees, and the real-life example of parsing C code.<br />
<br />
<a href="http://www.fstutoring.com/~chris/aim/peg_example.m">Download peg_example.m</a>Chris Pacejohttp://www.blogger.com/profile/18278436935541738068noreply@blogger.com4tag:blogger.com,1999:blog-1560635160713980237.post-84163953341199444772011-04-04T20:06:00.000-07:002011-10-02T20:37:47.319-07:00Dependent typing: reversing non-empty listsThis is the first post in a series on <a href="http://en.wikipedia.org/wiki/Dependent_type">dependent typing</a>. For those not familiar with dependent typing, it is the ability to describe data using types which say something about the particular value of the data itself – the type of the data <em>depends</em> on the value of the data. For example, the phrase “even integers” could describe a dependent type which is a subtype of integers. Whether a number belongs to that type <em>depends</em> on whether its value is evenly divisible by 2. (Note that any subtype can be viewed as a simple kind of dependent type.)<br />
<br />
The obvious use of dependent typing is to describe invariants of your program, for example, “a non-NULL pointer”, “an array of length 5”, or “a non-empty list”. While full-blown dependent typing requires a full-blown theorem prover (such as <a href="http://coq.inria.fr/">Coq</a> or <a href="http://www.ats-lang.org/">ATS/LF</a>), many mainstream languages support basic dependent types. Java can infer whether a value is not null, OCaml can talk about types which are made up of only certain constructors, Mozilla’s new language <a href="https://github.com/graydon/rust/wiki">Rust</a> can attach arbitrary predicates to types, and I would be remiss not to mention Mercury.<br />
<br />
Mercury has limited but useful support for dependent typing via its <a href="http://www.mercury.csse.unimelb.edu.au/information/doc-release/mercury_ref/Modes.html">mode system</a>. The mode system appears complicated, so let’s start with a brief overview.<br />
<br />
<h3>The mode system</h3><br />
Each predicate (or function) in Mercury <em>does something</em> to its arguments. The most common possibilities are to look at an argument and leave it be (this would be an input argument), or to attempt to bind it to a value (this would be an output argument). Another possibility is that it destroys its argument after looking at it (this happens with the IO state). These possible actions are called <defn>modes</defn>.<br />
<br />
Modes are described simply by the expected state of the argument before the call, and the guaranteed state of the argument after the call using the syntax <code><var>Before</var> >> <var>After</var></code>. These states (called <defn>instantiatednesses</defn> or <defn>insts</defn> for short) may be one of <code>free</code> (the argument is unbound), <code>ground</code> or <code>bound</code> (the argument is bound to a value), <code>clobbered</code> (the argument is destroyed or overwritten), and <code>unique</code> (the argument is bound to a value which is not referenced anywhere else). The latter two are used mainly for IO state and arrays, so we will focus on <code>free</code> and <code>bound</code>.<br />
<br />
The modes we are all familiar with, <code>in</code> and <code>out</code>, are defined in the module <a href="http://www.mercury.csse.unimelb.edu.au/information/doc-release/mercury_library/builtin.html"><code>builtin</code></a> as <code>ground >> ground</code> and <code>free >> ground</code>, respectively. In other words, an <code>in</code> argument is expected to be bound to a value before the call and is left that way after, and an <code>out</code> argument may be unbound before the call, but will be bound after.<br />
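Written out as Mercury declarations, the definitions in <code>builtin</code> are essentially:<br />
<pre><code>:- mode in == ground >> ground.
:- mode out == free >> ground.</code></pre>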
<br />
The magic is that we may tell Mercury that we expect or guarantee an argument to be bound <em>only to certain constructors</em>! To do this, we must first define the instantiatedness in which we are interested, say non-empty lists:<br />
<pre>:- inst non_empty_list == bound([ground | ground]).</pre>This defines the <code>non_empty_list</code> inst as a variable bound to a list cons cell, the head and tail of which are both bound. We could similarly define the inst of an empty list as:<br />
<pre>:- inst empty_list == bound([]).</pre>In other words, an <code>empty_list</code> is a variable which is bound to the empty list constructor.<br />
<br />
Note that there is also a shorthand which allows the above two insts to be written similarly to type definitions:<br />
<pre>:- inst non_empty_list ---> [ground | ground].
:- inst empty_list ---> [].</pre>We could then define the mode of an input argument which is expected to be a non-empty list as:<br />
<pre>:- mode in_non_empty_list == non_empty_list >> non_empty_list.</pre>In other words, an argument with this mode would be expected to be bound to a non-empty list, and would remain bound that way afterward. Similarly:<br />
<pre>:- mode out_non_empty_list == free >> non_empty_list.</pre>An argument with this mode may be unbound before the call, but is guaranteed to be bound to a non-empty list after. Mercury actually has the mode-functions <code>in()</code> and <code>out()</code> already defined, so we may write <code>in(non_empty_list)</code> or <code>out(non_empty_list)</code> to describe these same modes.<br />
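These mode-functions are defined in <code>builtin</code> essentially as follows, so the two hand-written modes above need not be declared at all:<br />
<pre><code>:- mode in(Inst) == Inst >> Inst.
:- mode out(Inst) == free >> Inst.</code></pre>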
<br />
<h3>Example</h3><br />
<em>Note</em>: due to <a href="http://bugs.mercury.csse.unimelb.edu.au/view.php?id=191">bug 191</a>, the following example won’t trigger the proper error messages from the compiler as of this post. <del>Download and apply <a href="http://bugs.mercury.csse.unimelb.edu.au/file_download.php?file_id=119&type=bug">this patch</a> against the compiler to fix the issue</del>, or work around the bug by replacing the two <code>out</code> modes with <code>out(list(ground))</code>, which won’t change the meaning but will avoid triggering the bug. <ins>UPDATE: parts of the standard library won’t compile <em>without</em> the bug, so don’t go applying that patch just yet.</ins><br />
<br />
On a recent <a rel="external" href="http://news.ycombinator.com/item?id=2405137">thread</a> at Hacker News, a question was brought up about proving that <code>[X | Xs] = reverse([Y | Ys])</code> was total (or in Mercury terms, <code>det</code>). Let’s see if we can do this in Mercury using our new tools. First let’s define <code>reverse2</code> (named so not to conflict with built-in <code>list.reverse</code>) using a recursive auxiliary function with an accumulator:<br />
<pre>:- import_module list.
:- func reverse2(list(A)) = list(A).
:- mode reverse2(in) = out.
reverse2(A) = reverse2_aux(A, []).
:- func reverse2_aux(list(A), list(A)) = list(A).
:- mode reverse2_aux(in, in) = out.
reverse2_aux([], Acc) = Acc.
reverse2_aux([A | As], Acc) = reverse2_aux(As, [A | Acc]).</pre>(I have included the mode declarations as we will be adding to them later.) Testing this function in our <code>main</code> predicate is simple:<br />
<pre>main(!IO) :-
Y = 1, Ys = [2,3,4,5],
[X | Xs] = reverse2([Y | Ys]),
print({X, Xs}, !IO), nl(!IO).</pre>A sharp eye will spot the problem before we even compile: the deconstruction <code>[X | Xs] =</code> can fail, which would cause <code>main</code> to be <code>semidet</code> instead of <code>det</code> (that is, a partial function). The compiler spots this as well:<br />
<pre>In `main'(di, uo):
error: determinism declaration not satisfied.
Declared `det', inferred `semidet'.
Unification with `list.[X | Xs]' can fail.</pre>Of course this isn’t true. What the compiler would love to see is a mode declaration stating that if <code>reverse2</code> is passed a non-empty list, then it will return a non-empty list. Let’s write that:<br />
<pre>:- mode reverse2(in(non_empty_list)) = out(non_empty_list).</pre>Now we get a different error:<br />
<pre>In clause for `reverse2(in((list.non_empty_list))) =
out((list.non_empty_list))':
mode error: argument 2 did not get sufficiently instantiated.
Final instantiatedness of `HeadVar__2' was `ground',
expected final instantiatedness was `bound(list.'[|]'(ground,
ground))'.</pre>(Note: if you didn’t/can’t apply the above compiler patch, as a workaround, replace <code>out</code> in the original mode declarations with <code>out(list(ground))</code>.)<br />
<br />
Here, <code>HeadVar__2</code> refers to the function result (what would be the second variable in the head were <code>reverse2</code> a predicate). The compiler expected the function result to be non-empty (bound to <code>list.[|]</code>) but it was bound to an unknown value of type list. Of course this is because <code>reverse2_aux</code> doesn't guarantee a non-empty result on non-empty input, so let’s add a declaration stating so as well:<br />
<pre>:- mode reverse2_aux(in(non_empty_list), in) = out(non_empty_list).</pre>(Note that we don’t say anything about the accumulator in this declaration.) Unfortunately recompiling gives us a similar error, this time complaining about the second (the recursive) clause of <code>reverse2_aux</code>. Of course – if our initial list had only one item, the recursive call would pass <code>reverse2_aux</code> an <em>empty</em> list, triggering the first clause, and we haven’t made (and can’t make) any special guarantees about that case. Let's take a closer look at that clause:<br />
<pre>reverse2_aux([], Acc) = Acc.</pre>The only way that this clause will return a non-empty list is if the accumulator, <code>Acc</code>, is non-empty. Fortunately, it is called recursively with a non-empty accumulator (since elements are prepended on every recurrence), so this should be enough to convince the compiler. Let’s add a mode declaration stating this fact:<br />
<pre>:- mode reverse2_aux(in, in(non_empty_list)) = out(non_empty_list).</pre>Success! Compilation is successful, meaning <code>main</code> has been proved to be <code>det</code> (a total function), and running the program gives the expected output:<br />
<pre>{5, [4, 3, 2, 1]}</pre>Using Mercury’s mode system, we were able to prove that reversing a non-empty list results in a non-empty list, by adding descriptive mode declarations in strategic locations.<br />
<br />
As an exercise, add an inst describing a list of at least two elements and add modes to <code>reverse2</code> and <code>reverse2_aux</code> which make guarantees about lists of at least two elements.<br />
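To get started on the exercise, here is one way such an inst might be written (a sketch only, not checked against the compiler; <code>non_empty_list</code> is the inst provided by the standard <code>list</code> module):<br />
<br />
<pre>:- inst list_of_two_or_more == bound([ground | non_empty_list]).</pre><br />
The accompanying mode declarations then follow the same pattern as the <code>non_empty_list</code> declarations above, with <code>list_of_two_or_more</code> substituted wherever a two-element guarantee is intended.<br />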
<br />
<a href="http://www.fstutoring.com/~chris/aim/reverse.m">Download reverse.m</a>Chris Pacejohttp://www.blogger.com/profile/18278436935541738068noreply@blogger.com4tag:blogger.com,1999:blog-1560635160713980237.post-11449310126933797132011-04-02T19:22:00.000-07:002011-04-02T19:24:43.715-07:00Enabling float unboxing on 64-bit systemsIn many high-level languages, it’s desirable that all data be represented with the same size word (usually a system word, 32- or 64-bits) so that polymorphic functions need not perform run-time dispatch, nor be type-specialized at compile time, to work with data of different sizes. This is usually done by “boxing” non-word-size data by storing it in the heap and passing around a word-sized pointer to the data.<br />
<br />
Of course for “small” data boxing entails the performance overhead of an extra pointer indirection. (For “large” data boxing is usually a performance win since it reduces the amount of data which must be copied; this is also why in C structures are usually passed by reference.) For this reason anything smaller than or equal in size to a word will usually be stored unboxed. On 32-bit systems, this means characters, integers, enumerations, and single-precision (32-bit) floating-point numbers, but <em>not</em> double-precision (64-bit) floating-point numbers. On 64-bit systems, all of the above including doubles can be stored unboxed.<br />
<br />
Mercury follows these same conventions, which does not bode well for scientific computations on 32-bit systems, since all doubles will be boxed. (OCaml special-cases this, storing doubles unboxed inside float arrays and all-float records.)<br />
<br />
Fortunately Mercury does store doubles unboxed on 64-bit systems, with one gotcha: <em>Double-precision floats will only be stored unboxed if the compiler is built with an existing compiler, and not bootstrapped!</em> This means that after you first install Mercury, you must <code>make realclean</code>, ensure that <code>mmc</code> and friends are in your <code>PATH</code>, and repeat the installation (including <code>./configure</code>). You can check that <code>configure</code> enabled float unboxing with <code>grep unboxed config.log</code>; you will see messages indicating its status.<br />
<br />
If you are doing any sort of scientific calculations on a 64-bit machine it’s important to ensure that this optimization is enabled; it can easily double the performance of any float-heavy computations.Chris Pacejohttp://www.blogger.com/profile/18278436935541738068noreply@blogger.com0tag:blogger.com,1999:blog-1560635160713980237.post-80037324222486918582011-04-02T07:15:00.000-07:002011-10-17T17:51:23.870-07:00Benchmarks Game: spectral-normThe <a href="http://shootout.alioth.debian.org/">Computer Language Benchmarks Game</a> measures the performance of programming language implementations over a set of algorithms. Mercury is known for being fast, so let’s put it to the test. We will compare it to four high-performing language implementations, C (gcc), Java (Sun), Haskell (GHC), and OCaml, on a 64-bit four-way processor (Intel Atom D510).<br />
<br />
<h3>Problem description</h3>Today we will develop a solution to the <a href="http://shootout.alioth.debian.org/u64q/performance.php?test=spectralnorm">spectral-norm</a> test, whose description reads:<br />
<br />
<blockquote>Calculate an eigenvalue using the <a href="http://en.wikipedia.org/wiki/Power_iteration">power method</a>.</blockquote><br />
Eigenvectors are vectors which, when multiplied by a matrix, produce a multiple of themselves. The eigenvalue of this eigenvector is this multiple. The power method involves repeatedly multiplying a random vector by the matrix in question until the vector converges. This is then an eigenvector, and its associated eigenvalue is found easily.<br />
<br />
To compute the <a href="http://mathworld.wolfram.com/SpectralNorm.html">spectral norm</a> of a matrix, one simply computes the square root of the maximum eigenvalue (which coincidentally is the one found by the power method).<br />
<br />
<h3>Implementation</h3>Let’s take the design of the <a href="http://shootout.alioth.debian.org/u64q/program.php?test=spectralnorm&lang=java&id=1">single-threaded Java entry</a> as a template. This way, we can ensure that we are using the same algorithm as the other implementations, which is a requirement of the Benchmarks Game.<br />
<br />
The interface and <code>main</code> predicate are unremarkable:<br />
<br />
<pre>:- module spectralnorm.
:- interface.
:- import_module io.
:- pred main(io::di, io::uo) is det.
:- implementation.
:- import_module float, int, list, math, std_util, string.
main(!IO) :-
command_line_arguments(Args, !IO),
(if to_int(head(Args), N0) then N = N0 else N = 100),
format("%.9f\n", [f(approximate(N))], !IO).</pre><br />
Note that we have imported <code>list</code>, for use in representing the vectors. One may think that using arrays would perform much better but surprisingly the gain is negligible in this program (mostly because there are no random accesses), and would come at the cost of more complicated code.<br />
<br />
The <code>approximate</code> function works by iterating multiplication 20 times, and using that result and the second-to-last to compute the spectral norm. This function is easily written using higher-order functions; in particular, <code>list.foldl</code> and <code>list.map_corresponding</code> are used to compute the dot product of two vectors:<br />
<br />
<pre>approximate(N) = sqrt(VBv / Vv) :-
V = pow(multiplyAtAv(N), 19, unit(N)),
U = multiplyAtAv(N, V),
VBv = foldl((+), map_corresponding((*), U, V), 0.0),
Vv = foldl((+), map_corresponding((*), V, V), 0.0).</pre><br />
The unit vector, used as the initial vector, is easily defined recursively:<br />
<br />
<pre>unit(N) = (if N = 0 then [] else [1.0 | unit(N - 1)]).</pre><br />
Next we define a function <code>a</code> to represent the matrix we are using:<br />
<br />
<pre>a(I, J) = 1.0 / float((I + J) * (I + J + 1) / 2 + I + 1).</pre><br />
Next we define multiplication of the matrix by a vector using two recursive functions. <code>multiplyAvI</code> multiplies each row <code>I</code> < <code>N</code> of the matrix by the vector <code>V</code> by calling <code>multiplyAvIJ</code>, which multiplies and sums each element of a row. <code>multiplyAv</code> simply starts the process.<br />
<br />
<pre>multiplyAv(N, V) = multiplyAvI(N, 0, V).
multiplyAvI(N, I, V) = YYs :-
if I < N then
Y = multiplyAvIJ(I, 0, V),
Ys = multiplyAvI(N, I + 1, V),
YYs = [Y|Ys]
else YYs = [].
multiplyAvIJ(_, _, []) = 0.0.
multiplyAvIJ(I, J, [X|V]) = a(I, J) * X + multiplyAvIJ(I, J + 1, V).</pre><br />
To multiply by the transpose of A, we similarly define <code>multiplyAtv</code>:<br />
<br />
<pre>multiplyAtv(N, V) = multiplyAtvI(N, 0, V).
multiplyAtvI(N, I, V) = YYs :-
if I < N then
Y = multiplyAtvIJ(I, 0, V),
Ys = multiplyAtvI(N, I + 1, V),
YYs = [Y|Ys]
else YYs = [].
multiplyAtvIJ(_, _, []) = 0.0.
multiplyAtvIJ(I, J, [X|V]) = a(J, I) * X + multiplyAtvIJ(I, J + 1, V).</pre>Finally, <code>multiplyAtAv</code> ties these together. Note we must declare its type, since we partially apply it in <code>approximate</code>.<br />
<br />
<pre>:- func multiplyAtAv(int, list(float)) = list(float).
multiplyAtAv(N, V) = multiplyAtv(N, multiplyAv(N, V)).</pre>Compiling with <code>mmc --infer-all -O6 --intermod-opt --rebuild spectralnorm</code>, we obtain a runtime of about 38.1 seconds for <var>N</var> = 3000. Not too bad, considering that the highly optimized <a href="http://shootout.alioth.debian.org/u64q/program.php?test=spectralnorm&lang=gcc&id=4">C implementation</a> has a single-threaded runtime of about 26.5 seconds. But we can do better.<br />
<br />
<h3>Optimization</h3>Compiling with <code>-p</code> and running <code>mprof</code> after running our program tells us that the majority of our time (99%) is spent in <code>multiplyAvIJ</code> and <code>multiplyAtvIJ</code>, as expected. So let's focus our efforts there, and on <code>a</code>, which has apparently been inlined into both of those functions, since it does not otherwise appear in the profile.<br />
<br />
<h4>Unchecked arithmetic</h4>The most obvious speedup is to replace the floating-point division in <code>a</code> with <code>`unchecked_quotient`</code>, which doesn't check if its right argument is zero. (We make sure to add parentheses as appropriate, since infix functions bind very tightly.) Recompiling, this reduces our runtime to 29.8 seconds! Similarly, replacing the integer division by two with <code>`unchecked_right_shift` 1</code> reduces the runtime to 28.4 seconds.<br />
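For concreteness, the matrix function after both replacements might look something like this (a sketch; note the parentheses around the shifted expression, needed because backquoted infix functions bind very tightly):<br />
<br />
<pre>a(I, J) = 1.0 `unchecked_quotient`
    float(((I + J) * (I + J + 1)) `unchecked_right_shift` 1 + I + 1).</pre><br />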
<br />
<h4>Tail-call optimization / accumulator introduction</h4>Next, both <code>multiplyAvIJ</code> and <code>multiplyAtvIJ</code> are written in a fashion which precludes tail-call optimization: they perform an operation on the result of the recursive call, which prevents Mercury from turning them into simple loops. Fortunately, we can direct <code>mmc</code> to transform these functions into accumulator-passing functions (which can be optimized) by using the <code>--introduce-accumulators</code> option, and by telling Mercury that floating-point addition is associative:<br />
<br />
<pre>:- promise all [A, B, C, ABC]
(A + B) + C = ABC: float <=> A + (B + C) = ABC: float.</pre>Unfortunately, due to <a href="http://bugs.mercury.csse.unimelb.edu.au/view.php?id=193">bug 193</a>, <code>mmc</code> doesn't recognize this declaration due to the typecast (which is necessary to distinguish between integer- and floating-point addition), so we must instead introduce a dummy function <code>fplus</code> to disambiguate:<br />
<br />
<pre>:- func fplus(float, float) = float.
A `fplus` B = A + B.
:- promise all [A, B, C, ABC]
(A `fplus` B) `fplus` C = ABC <=> A `fplus` (B `fplus` C) = ABC.</pre>After replacing our use of <code>+</code> with <code>`fplus`</code> in <code>multiplyAvIJ</code> and <code>multiplyAtvIJ</code> (and again, parenthesizing appropriately), we cut our runtime down to 25.2 seconds -- better than the C implementation!<br />
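For example, the recursive clause of <code>multiplyAvIJ</code> becomes something like this (a sketch; the product is parenthesized because <code>`fplus`</code> binds more tightly than <code>*</code>):<br />
<br />
<pre>multiplyAvIJ(I, J, [X|V]) =
    (a(I, J) * X) `fplus` multiplyAvIJ(I, J + 1, V).</pre><br />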
<br />
An optimization along similar lines is to enable <code>--optimize-constructor-last-call</code>, which can enable tail-call optimization in functions like <code>unit</code>, <code>multiplyAvI</code>, and <code>multiplyAtvI</code>, which construct a list after their recursive call. While this is a win for smaller values of <var>N</var>, it actually degrades performance for larger values so we will not enable it.<br />
<br />
<h4>Parallelization</h4>Comparing our 25.2 seconds to the C implementation’s 26.5 seconds is not really fair, since the C implementation is parallelized, running in 6.9 seconds across four CPUs.<br />
<br />
Fortunately parallelization in Mercury is simple. The <code>&</code> operator solves two goals in parallel, expecting them both to succeed (like <code>,</code>). Our job is to decide where is the best place to place this operator.<br />
<br />
At the top level, operations in <code>approximate</code> and <code>multiplyAtAv</code> are necessarily sequential, since the result of each successive multiplication depends on the result of the previous.<br />
<br />
Moving down a level, the two operations in <code>multiplyAvI</code> and <code>multiplyAtvI</code> seem parallelizable, as they do not depend on one another. Let’s introduce a parallel goal:<br />
<br />
<pre>multiplyAvI(N, I, V) = YYs :-
if I < N then
(Y = multiplyAvIJ(I, 0, V) &
Ys = multiplyAvI(N, I + 1, V)),
YYs = [Y|Ys]
else YYs = [].</pre><code>multiplyAtvI</code> is modified similarly. Note the placement of the parentheses: <code>&</code> binds more loosely than <code>,</code>. If we did not have parentheses, the list construction would be moved into the second branch, making it dependent on the first branch, and causing (for reasons unknown to the author) massive amounts of memory to be used.<br />
<br />
At the lowest level, we could introduce parallel goals in <code>multiplyAvIJ</code> and <code>multiplyAtvIJ</code>, but we should have sufficient parallelization already, and the operations at this level are so short that they would be dwarfed by parallelization overhead.<br />
<br />
Compiling with <code>mmc --parallel --infer-all -O6 --intermod-opt --introduce-accumulators --rebuild spectralnorm</code>, we obtain a runtime of 7.2 seconds – less than 5% slower than the C counterpart. The difference between Mercury and C is even less pronounced at <var>N</var> = 5500, coming in at 23.8 and 23.7 seconds respectively.<br />
<br />
<h4>Results</h4>Results obtained using the Benchmark Game’s <code>bencher.py</code> program for <var>N</var> = 5500:<br />
<br />
<table><thead>
<tr> <th>×</th> <th>Language</th> <th>CPU (s)</th> <th>Elapsed (s)</th> <th>Memory (kB)</th> <th>Code (B)</th> <th>CPU load</th> </tr>
</thead> <tbody>
<tr> <td style="text-align: right">1.0</td> <th style="text-align: left"><a href="http://shootout.alioth.debian.org/u64q/program.php?test=spectralnorm&lang=java&id=2">java</a></th> <td style="text-align: right">87.24</td> <td style="text-align: right"><b>23.36</b></td> <td style="text-align: right">15617</td> <td style="text-align: right">1227</td> <td style="text-align: right">385%</td> </tr>
<tr> <td style="text-align: right">1.0</td> <th style="text-align: left"><a href="http://shootout.alioth.debian.org/u64q/program.php?test=spectralnorm&lang=ocaml&id=3">ocaml</a></th> <td style="text-align: right">88.94</td> <td style="text-align: right"><b>23.73</b></td> <td style="text-align: right">4037</td> <td style="text-align: right">1065</td> <td style="text-align: right">391%</td> </tr>
<tr> <td style="text-align: right">1.0</td> <th style="text-align: left"><a href="http://shootout.alioth.debian.org/u64q/program.php?test=spectralnorm&lang=gcc&id=4">gcc</a></th> <td style="text-align: right">89.98</td> <td style="text-align: right"><b>23.74</b></td> <td style="text-align: right">763</td> <td style="text-align: right">1992</td> <td style="text-align: right">399%</td> </tr>
<tr> <td style="text-align: right">1.0</td> <th style="text-align: left"><a href="http://www.fstutoring.com/~chris/shootout/spectralnorm.m">mmc --parallel</a></th> <td style="text-align: right">90.85</td> <td style="text-align: right"><b>23.83</b></td> <td style="text-align: right">54974</td> <td style="text-align: right">722</td> <td style="text-align: right">394%</td> </tr>
<tr> <td style="text-align: right">1.3</td> <th style="text-align: left"><a href="http://shootout.alioth.debian.org/u64q/program.php?test=spectralnorm&lang=ghc&id=4">ghc</a></th> <td style="text-align: right">95.45</td> <td style="text-align: right"><b>29.27</b></td> <td style="text-align: right">2496</td> <td style="text-align: right">1346</td> <td style="text-align: right">346%</td> </tr>
<tr> <td style="text-align: right">3.1</td> <th style="text-align: left"><a href="http://www.fstutoring.com/~chris/shootout/spectralnorm.m">mmc -H</a></th> <td style="text-align: right">70.06</td> <td style="text-align: right"><b>70.09</b></td> <td style="text-align: right">7118</td> <td style="text-align: right">722</td> <td style="text-align: right">116%</td> </tr>
<tr> <td style="text-align: right">3.7</td> <th style="text-align: left"><a href="http://www.fstutoring.com/~chris/shootout/spectralnorm.m">mmc</a></th> <td style="text-align: right">84.96</td> <td style="text-align: right"><b>84.98</b></td> <td style="text-align: right">40751</td> <td style="text-align: right">722</td> <td style="text-align: right">112%</td> </tr>
<tr> <td style="text-align: right">3.9</td> <th style="text-align: left"><a href="http://www.fstutoring.com/~chris/shootout/spectralnorm.m">mmc --java</a></th> <td style="text-align: right">88.86</td> <td style="text-align: right"><b>88.21</b></td> <td style="text-align: right">33007</td> <td style="text-align: right">722</td> <td style="text-align: right">113%</td> </tr>
</tbody> </table><br />
Curiously, compiling with the <code>-H</code> flag (to enable the high-level C “hlc” grade) generates much faster single-threaded code. Checking the generated C code in <code>Mercury/cs/spectralnorm.c</code> indicates that tail calls are implemented using loops in this version, compared to calls to a Mercury primitive in the other versions. Unfortunately the high-level C grade does not support the and-parallelism we are using.<br />
<br />
<a href="http://www.fstutoring.com/~chris/shootout/spectralnorm.m">Download spectralnorm.m</a>Chris Pacejohttp://www.blogger.com/profile/18278436935541738068noreply@blogger.com8tag:blogger.com,1999:blog-1560635160713980237.post-62766478517236333882011-03-28T14:17:00.000-07:002011-04-03T19:33:15.243-07:00Optimization Levels<code>-O</code> optimization levels run from -1 to 6. Which options are enabled in these levels don’t seem to be documented in the <a href="http://www.mercury.csse.unimelb.edu.au/information/doc-release/mercury_user_guide/Overall-optimization-options.html">User’s Guide</a> or the <code>mmc(1)</code> man page. They are however documented in the compiler source code. <code>-O2</code> is the default.<br />
<br />
<b>Note:</b> <em>Order matters!</em> For example, <code>--introduce-accumulators</code> followed by <code>-O6</code> will actually <em>disable</em> <code>--introduce-accumulators</code>, because it is not among the options enabled by <code>-O6</code>. For this reason you should put <code>-O</code> options first on the command line.<br />
<br />
<h3><code>-O-1</code></h3>Disables all optimizations which can be disabled.<br />
<code><ul><li>--no-common-data</li>
<li>--no-llds-optimize</li>
<li>--no-optimize-jumps</li>
<li>--no-optimize-labels</li>
<li>--no-optimize-peep</li>
<li>--no-smart-indexing</li>
<li>--no-static-ground-cells</li>
</ul></code><br />
<h3><code>-O0</code></h3>Minimizes compilation time.<br />
<code><ul><li>--excess-assign</li>
<li>--optimize-dead-procs</li>
<li>--optimize-repeat 1</li>
<li>--no-c-optimize</li>
<li>--no-emit-c-loops</li>
<li>--no-middle-rec</li>
<li>--no-optimize-delay-slot</li>
<li>--no-optimize-frames</li>
<li>--no-optimize-tailcalls</li>
<li>--no-use-local-vars</li>
</ul></code><br />
<h3><code>-O1</code></h3>Cheap optimizations with good payoff.<br />
<code><ul><li>--no-common-struct</li>
<li>--no-follow-code</li>
<li>--no-inline-simple</li>
<li>--no-inline-single-use</li>
<li>--no-optimize-fulljumps</li>
<li>--no-optimize-initializations</li>
<li>--no-simple-neg</li>
</ul></code><br />
<h3><code>-O2</code></h3>Optimizations with good payoff.<br />
<code><ul><li>--inline-compound-threshold 10</li>
<li>--optimize-dups</li>
<li>--optimize-repeat 1</li>
<li>--user-guided-type-specialization</li>
</ul></code><br />
<h3><code>-O3</code></h3>Slow optimizations with good payoff.<br />
<code><ul><li>--constraint-propagation</li>
<li>--deforestation</li>
<li>--local-constraint-propagation</li>
<li>--optimize-higher-order</li>
<li>--optimize-reassign</li>
<li>--optimize-repeat 4</li>
<li>--optimize-saved-vars</li>
<li>--optimize-unused-args</li>
</ul></code><br />
<h3><code>-O4</code></h3>Slow optimizations.<br />
<code><ul><li>--higher-order-size-limit 30</li>
<li>--inline-compound-threshold 20</li>
<li>--inline-simple-threshold 8</li>
</ul></code><br />
<h3><code>-O5</code></h3>Very slow optimizations.<br />
<code><ul><li>--delay-constructs</li>
<li>--eliminate-local-vars</li>
<li>--higher-order-size-limit 40</li>
<li>--inline-compound-threshold 100</li>
<li>--loop-invariants</li>
<li>--optimize-repeat 5</li>
</ul></code><br />
<h3><code>-O6</code></h3>Unreasonably slow optimizations.<br />
<code><ul><li>--everything-in-one-c-function</li>
<li>--inline-alloc</li>
<li>--use-macro-for-redo-fail</li>
</ul></code><br />
<h3><code>--optimize-space</code></h3>Optimize for code size rather than speed. This can be used in conjunction with one of the other levels, keeping in mind the aforementioned rules of ordering.<br />
<code><ul><li>--optimize-dead-procs</li>
<li>--optimize-dups</li>
<li>--optimize-proc-dups</li>
<li>--optimize-reassign</li>
<li>--unneeded-code-copy-limit 1</li>
<li>--no-optimize-fulljumps</li>
</ul></code><br />
Additionally, <code>--optimize-space</code> turns off the following options if set:<br />
<code><ul><li>--inline-alloc</li>
<li>--loop-invariants</li>
<li>--use-macro-for-redo-fail</li>
<li>--no-optimize-labels</li>
</ul></code><br />
<h3>Optimizations never enabled</h3>These can make code run slower.<br />
<ul><li><code>--checked-nondet-tailcalls</code></li>
<li><code>--constraint-propagation</code></li>
<li><code>--introduce-accumulators</code> (buggy with trailed updates)</li>
<li><code>--optimize-constructor-last-call</code></li>
<li><code>--optimize-duplicate-calls</code> (buggy with array.init)</li>
<li><code>--type-specialization</code></li>
<li><code>--unneeded-code</code></li>
</ul><br />
Additionally, intermodule optimizations are never enabled by <code>-O</code> because they require special handling by <code>mmc --make</code>.Chris Pacejohttp://www.blogger.com/profile/18278436935541738068noreply@blogger.com0tag:blogger.com,1999:blog-1560635160713980237.post-32139663384465958822011-03-25T12:16:00.000-07:002011-03-26T20:30:35.385-07:00Project Euler problem 2<a href="http://projecteuler.net/index.php?section=problems&id=2">Project Euler problem 2</a> asks:<br /><br /><blockquote>By considering the terms in the Fibonacci sequence whose values do not exceed four million, find the sum of the even-valued terms.</blockquote><br />Again we start with main module boilerplate:<br /><br /><pre>:- module p2.<br />:- interface.<br />:- import_module io.<br />:- pred main(io::di, io::uo) is det.</pre><br />In our implementation, we will again use the <a href="http://www.mercury.csse.unimelb.edu.au/information/doc-release/mercury_library/int.html"><code>int</code></a> module. Because the range of terms to sum is not definite (we do not know how many terms are less than four million without prior analysis), <a href="http://www.mercury.csse.unimelb.edu.au/information/doc-release/mercury_library/solutions.html"><code>solutions</code></a> will not be useful to us. (Note that even the <code>solutions.do_while</code> predicate cannot be relied upon, as the order in which terms are generated is not guaranteed, and we have no efficient way of knowing if we’ve found all the terms of interest.)<br /><br /><pre>:- implementation.<br />:- import_module int.</pre><br />First let’s define the recursive function <code>fibonacci</code> of one argument to compute the <code>I</code>th Fibonacci number.<br /><br /><pre>:- func fibonacci(int) = int.<br />:- pragma memo(fibonacci/1).<br />fibonacci(I) =<br /> (if I = 0 then 1<br /> else if I = 1 then 2<br /> else fibonacci(I - 1) + fibonacci(I - 2)).</pre><br />Normally a function so defined would be exponential with <code>I</code>. 
However, by instructing the Mercury compiler to <a href="http://en.wikipedia.org/wiki/Memoization">memoize</a> <code>fibonacci</code> via the <a href="http://www.mercury.csse.unimelb.edu.au/information/doc-release/mercury_ref/Tabled-evaluation.html"><code>memo</code></a> pragma, we achieve linear performance as if we had used a <a href="http://en.wikipedia.org/wiki/Dynamic_programming">dynamic programming</a> algorithm. (Note that memoization is not yet available in all compilation grades; in particular, it is unavailable in the Java grade.)<br /><br />To perform the summation, we can define a recursive function which recurs on successive values of <code>I</code> until <code>fibonacci(I)</code> is no longer less than four million:<br /><br /><pre>sum_fibonaccis(I) =<br /> (if fibonacci(I) @ Fibo < 4000000 then<br /> sum_fibonaccis(I + 1) +<br /> (if even(Fibo) then Fibo else 0)<br /> else 0).</pre><br />The <code>@</code> function is used to bind <code>fibonacci(I)</code> to <code>Fibo</code> in an expression context (akin to C’s <code>=</code> operator). The built-in predicate <code>int.even</code> is used to test for evenness.<br /><br />Those readers familiar with functional programming will likely notice that this function is not tail-recursive; i.e., the result of the recursive call is not returned directly. This will cause unnecessary (though, in this case, benign) stack growth. The typical solution is to rewrite <code>sum_fibonaccis</code> with an accumulator argument:<br /><br /><pre>sum_fibonaccis_acc(I, Sum) =<br /> (if fibonacci(I) @ Fibo < 4000000 then<br /> sum_fibonaccis_acc(I + 1, Sum +<br /> (if even(Fibo) then Fibo else 0))<br /> else Sum).</pre><br />Mercury, however, is able to perform this transformation automatically, provided we inform it that <code>(Fibo0 + Fibo1 + ...) + FiboN</code> is indeed the same as <code>Fibo0 + (Fibo1 + ...
+ FiboN)</code> – that is, that addition is associative:<br /><br /><pre>:- promise all [A, B, C, ABC] (A + B) + C = ABC <=> A + (B + C) = ABC.</pre><br />This optimization is enabled with the <code>--introduce-accumulators</code> option to <code>mmc</code>. The compiler unfortunately can recognize only a narrow set of functions and predicates as candidates for accumulator introduction. In particular, altering our definition of <code>sum_fibonaccis</code> to bind <code>Fibo = fibonacci(I)</code> in a separate body instead of using <code>@</code>, or moving the innermost <code>if</code> to outside the recursive call, causes the compiler not to recognize this function as a candidate for accumulator introduction.<br /><br />The remainder of our implementation is unremarkable:<br /><br /><pre>p2 = sum_fibonaccis(0).<br />main --> print(p2), nl.</pre><br />Compiling the program with <code>mmc --infer-all --introduce-accumulators --make p2</code> produces a large warning concerning possible performance problems due to the accumulator introduction altering the order of summation. This is obviously not a problem for addition and can be safely ignored; the warning is meant for operations such as linked list concatenation, the performance of which depends on the value of one operand more than that of the other. If desired this warning can be disabled with the <code>--inhibit-accumulator-warnings</code> option.<br /><br />The resulting program runs in about 100 ms on my dual-core 1.6 GHz machine.<br /><br /><a href="http://fstutoring.com/~chris/euler/p2.m">Download p2.m</a>Chris Pacejohttp://www.blogger.com/profile/18278436935541738068noreply@blogger.com0tag:blogger.com,1999:blog-1560635160713980237.post-1043668549734858402011-03-24T11:59:00.000-07:002011-11-28T21:15:07.161-08:00Project Euler problem 1<a href="http://projecteuler.net/index.php?section=problems&id=1">Project Euler problem 1</a> asks:<br />
<br />
<blockquote>Add all the natural numbers below one thousand that are multiples of 3 or 5.</blockquote><br />
We start by declaring the module and defining its interface, which is standard boilerplate for the main module of a program:<br />
<br />
<pre>:- module p1.
:- interface.
:- import_module io.
:- pred main(io::di, io::uo) is det.</pre><br />
In our implementation, we will utilize the <a href="http://www.mercury.csse.unimelb.edu.au/information/doc-release/mercury_library/int.html"><code>int</code></a> and <a href="http://www.mercury.csse.unimelb.edu.au/information/doc-release/mercury_library/solutions.html"><code>solutions</code></a> modules from the standard library:<br />
<br />
<pre>:- implementation.
:- import_module int, solutions.</pre><br />
The integers provided by <code>int</code> are guaranteed to be at least 32-bit and so will be large enough for this problem. <code>solutions</code> provides the <code>aggregate</code> function which we will use to sum the results.<br />
<br />
First let’s define the numbers we’re interested in by defining a predicate, aptly named <code>multiple_of_3_or_5_below_1000</code>. It is a predicate of one argument, which, when used as an output (<code>::out</code>), produces zero or more results (<code>is nondet</code>).<br />
<br />
<pre>:- pred multiple_of_3_or_5_below_1000(int::out) is nondet.</pre><br />
Implementing this predicate is easy using the standard <code>int.nondet_int_in_range</code> predicate to generate integers, and <code>rem</code> to check that the remainder after division by 3 or by 5 is zero:<br />
<br />
<pre>multiple_of_3_or_5_below_1000(I) :-
nondet_int_in_range(1, 999, I),
(I rem 3 = 0; I rem 5 = 0).</pre><br />
We can then define the problem solution <code>p1</code> as the sum of the multiples, using <code>aggregate</code> (from <code>solutions</code>) and <code>plus</code>:<br />
<br />
<pre>p1 = aggregate(multiple_of_3_or_5_below_1000, plus, 0).</pre><br />
The type of <code>p1</code> can be inferred by the compiler as a function of zero arguments returning an <code>int</code>.<br />
<br />
Because <code>plus</code> is associative and commutative, we could use the <code>unsorted_aggregate</code> predicate instead to avoid the overhead of sorting the solutions first (since the order doesn’t matter). However, it turns out that the overhead is negligible in this program, so we can stick with the <code>aggregate</code> function for the sake of clarity.<br />
<br />
Lastly, print the result. The IO state is threaded using <a href="http://www.mercury.csse.unimelb.edu.au/information/doc-release/mercury_ref/DCG_002dgoals.html">DCG notation</a>.<br />
<br />
<pre>main --> print(p1), nl.</pre><br />
We can then build the program using <code>mmc --infer-all --make p1</code>. The resulting program runs in about 100 ms on my dual-core 1.6 GHz machine.<br />
<br />
<a href="http://fstutoring.com/~chris/euler/p1.m">Download p1.m</a>Chris Pacejohttp://www.blogger.com/profile/18278436935541738068noreply@blogger.com1tag:blogger.com,1999:blog-1560635160713980237.post-14686663459401969942011-03-24T11:44:00.000-07:002011-03-26T20:23:00.855-07:00IntroductionHello there! This blog will be about exploring the <a href="http://www.mercury.csse.unimelb.edu.au/">Mercury</a> programming language. Mercury is a declarative logic language <a href="http://www.mercury.csse.unimelb.edu.au/information/doc-release/mercury_trans_guide/index.html">like Prolog</a>, but is purely functional and strongly typed <a href="http://www.mercury.csse.unimelb.edu.au/information/comparison_with_haskell.html">like Haskell</a>.<br /><br />I do not intend this blog to be an introduction to the language unto itself; the <a href="http://www.mercury.csse.unimelb.edu.au/information/documentation.html">Mercury documentation</a> covers that fairly well. I rather intend it to be a companion to these introductory materials, or as a “bird’s eye view” of the capabilities of Mercury for those interested in learning it further.<br /><br />Of course I will be happy to answer any questions about the language posed in the comments; I have been using Mercury on-and-off for a couple years and can probably field most basic questions.<br /><br />My first series of posts will cover Mercury solutions to the great <a href="http://projecteuler.net/">Project Euler</a> problems, a collection of numeric problems with often non-trivial solutions.<br /><br />With that, let’s begin!Chris Pacejohttp://www.blogger.com/profile/18278436935541738068noreply@blogger.com0