Monday, November 30, 2015

Chess Programming Vi Parts

Developing a chess program able to play in a chess server will be my next goal for 2016
so I better start piling tons of info for my own study online, who ever needs to this information is welcome

Chess Programming Part I: Getting Started

By François Dominic Laramée | Published May 17 2000 01:30 PM in Artificial Intelligence

This is the first article in a six-part series about programming computers to play chess, and by extension other similar strategy games of perfect information.

Chess has been described as the Drosophila Melanogaster of artificial intelligence, in the sense that the game has spawned a great deal of successful research (including a match victory against the current world champion and arguably the best player of all time, Gary Kasparov), much like many of the discoveries in genetics over the years have been made by scientists studying the tiny fruit fly. This article series will describe some of the state-of-the-art techniques employed by the most successful programs in the world, including Deep Blue.

Note that by the time the series is completed (in October), I will have written a simple implementation of the game in Java, and the source code will be freely available for download on my web site. So if you want to see more code samples, be patient; I'll give you plenty in due time!
Games of Perfect Information

Chess is defined as a game of "perfect information", because both players are aware of the entire state of the game world at all times: just by looking at the board, you can see which pieces are alive and where they are located. Checkers, Go, Go-Moku, Backgammon and Othello are other members of the category, but stud poker is not (you don't know what cards your opponent is holding in his hands).

Most of the techniques described in this series will apply more or less equally to all games of perfect information, although the details will vary from game to game. Obviously, while a search algorithm is a search algorithm no matter what the domain, move generation and position evaluation will depend completely on the rules of the game being played!
What We Need

In order to play chess, a computer needs a certain number of software components. At the very least, these include:

•             Some way to represent a chess board in memory, so that it knows what the state of the game is.
•             Rules to determine how to generate legal moves, so that it can play without cheating (and verify that its human opponent is not trying to pull a fast one on it!)
•             A technique to choose the move to make amongst all legal possibilities, so that it can choose a move instead of being forced to pick one at random.
•             A way to compare moves and positions, so that it makes intelligent choices.
•             Some sort of user interface.
This series will cover all of the above, except the user interface, which is essentially a 2D game like any other. The rest of this article describes the major issues related to each component and introduces some of the concepts to be explored in the series.
Board Representations

In the early days of chess programming, memory was extremely limited (some programs ran in 8K or less) and the simplest, least expensive representations were the most effective. A typical chessboard was implemented as an 8x8 array, with each square represented by a single byte: an empty square was allocated value 0, a black king could be represented by the number 1, etc.

When chess programmers started working on 64-bit workstations and mainframes, more elaborate board representations based on "bitboards" appeared. Apparently invented in the Soviet Union in the late 1960's, the bit board is a 64-bit word containing information about one aspect of the game state, at a rate of 1 bit per square. For example, a bitboard might contain "the set of squares occupied by black pawns", another "the set of squares to which a queen on e3 can move", and another, "the set of white pieces currently attacked by black knights". Bitboards are versatile and allow fast processing, because many operations that are repeated very often in the course of a chess game can be implemented as 1-cycle logic operations on bitboards.

Part II of this series covers board representations in detail.
Move Generation

The rules of the game determine which moves (if any) the side to play is allowed to make. In some games, it is easy to look at the board and determine the legal moves: for example, in tic-tac-toe, any empty square is a legal move. For chess, however, things are more complicated: each piece has its own movement rules, pawns capture diagonally and move along a file, it is illegal to leave a king in check, and the "en passant" captures, pawn promotions and castling moves require very specific conditions to be legal.

In fact, it turns out that move generation is one of the most computationally expensive and complicated aspects of chess programming. Fortunately, the rules of the game allow quite a bit of pre-processing, and I will describe a set of data structures which can speed up move generation significantly.

Part III of this series covers this topic.
Search Techniques

To a computer, it is far from obvious which of many legal moves are "good" and which are "bad". The best way to discriminate between the two is to look at their consequences (i.e., search series of moves, say 4 for each side and look at the results.) And to make sure that we make as few mistakes as possible, we will assume that the opponent is just as good as we are. This is the basic principle underlying the minimax search algorithm, which is at the root of all chess programs.

Unfortunately, minimax' complexity is O(bn), where b ("branching factor") is the number of legal moves available on average at any given time and n (the depth) is the number of "plies" you look ahead, where one ply is one move by one side. This number grows impossibly fast, so a considerable amount of work has been done to develop algorithms that minimize the effort expended on search for a given depth. Iterative-deepening Alphabeta, NegaScout and MTD(f) are among the most successful of these algorithms, and they will be described in Part IV, along with the data structures and heuristics which make strong play possible, such as transposition tables and the history/killer heuristic.

Another major source of headaches for chess programmers is the "horizon effect", first described by Hans Berliner. Suppose that your program searches to a depth of 8-ply, and that it discovers to its horror that the opponent will capture its queen at ply 6. Left to its own devices, the program will then proceed to throw its bishops to the wolves so that it will delay the queen capture to ply 10, which it can't see because its search ends at ply 8. From the program's point of view, the queen is "saved", because the capture is no longer visible... But it has lost a bishop, and the queen capture reappears during the next move's search. It turns out that finding a position where a program can reason correctly about the relative strength of the forces in presence is not a trivial task at all, and that searching every line of play to the same depth is tantamount to suicide. Numerous techniques have been developed to defeat the horizon effect; quiescence search and Deep Blue's singular extensions are among the topics covered in Part V on advanced search.

Finally, the program must have some way of assessing whether a given position means that it is ahead or that it has lost the game. This evaluation depends heavily upon the rules of the game: while "material balance" (i.e., the number and value of the pieces on the board) is the dominant factor in chess, because being ahead by as little as a single pawn can often guarantee a victory for a strong player, it is of no significance in Go-Moku and downright misleading in Othello, where you are often better off with fewer pieces on the board until the very last moment.

Developing a useful evaluation function is a difficult and sometimes frustrating task. Part VI of this series covers the efforts made in that area by the developers of some of the most successful chess programs of all time, including Chess 4.5, Cray Blitz and Belle.

Now that we know which pieces we will need to complete the puzzle, it is time to get started on that first corner. Next month, I will describe the most popular techniques used to represent chess boards in current games. See you there!

François Dominic Laramée, April 2000

Chess Programming Part II: Data Structures

Last month, I presented the major building blocks required to write a program to play chess, or any other two-player game of perfect information. Today, I will discuss in a bit more detail the most fundamental of these building blocks: the internal representation of the game board.

You may be surprised to notice that, for all intents and purposes, the state of the art in this area has not changed in thirty years. This is due to a combination of ingenuity (i.e., smart people made smart choices very early in the field's history) and necessity (i.e., good data structures were a pre-requisite to everything else, and without these effective techniques, not much would have been achieved.)

While we're at it, I will also present three support data structures which, although not absolutely required to make the computer play, are invaluable if you want it to play well. Of these, two (one of which consumes ungodly amounts of memory) are designed to accelerate search through the game tree, while the third is used to speed up move generation.

Before we go any further, a word to the wise: in chess as in any other game, the simplest data structure that will get the job done is usually the one you should use. While chess programmers have developed numerous clever data representation tricks to make their programs go faster, very simple stuff is quite sufficient in many other games. If you are a novice working on a game for which there is limited literature, start with something easy, encapsulate it well, and you can experiment with more advanced representations once your program works.

Basic Board Representations

Back in the 1970's, personal computer memory was at a premium (that's an understatement if there ever was one!), so the most compact the board representations, the better. There is a lot to be said for the most self-evident scheme: a 64-byte array, where each byte represents a single square on the board and contains an integer constant representing the piece located in that square. (Any chess board data structure also needs a few bytes of storage to track down en passant pawn capture opportunities and castling privileges, but we'll ignore that for now, since this is usually implemented separately, and pretty much always the same way.)

A few refinements on this technique soon became popular:

•             The original SARGON extended the 64-byte array by surrounding it with two layers of "bogus squares" containing sentinel values marking the squares as illegal. This trick accelerated move generation: for example, a bishop would generate moves by sliding one square at a time until it reached an illegal square, then stop. No need for complicated a priori computations to make sure that a move would not take a piece out of the memory area associated with the board. The second layer of fake squares is required by knight moves: for example, a knight on a corner square might try to jump out of bounds by two columns, so a single protection layer would be no protection at all!
•             MYCHESS reversed the process and represented the board in only 32 bytes, each of which was associated with a single piece (i.e., the white king, the black King's Knight's pawn, etc.) and contained the number of the square where that piece was located, or a sentinel value if the piece had been captured. This technique had a serious drawback: it was impossible to promote a pawn to a piece which had not already been captured. Later versions of the program fixed this problem.
Today, this type of super-miserly structure would only be useful (maybe) if you wrote a chess program for a Palm Pilot, a cell phone or a set-top box where 80-90% of resources are consumed by the operating system and non-game services. However, for some other types of games, there is really no alternative!

For more information on vintage chess programs, read David Welsh's book, Computer Chess, published in 1984.

Bit Boards

For many games, it is hard to imagine better representations than the simple one-square, one-slot array. However, for chess, checkers and other games played on a 64-square board, a clever trick was developed (apparently by the KAISSA team in the Soviet Union) in the late 60's: the bit board.

KAISSA ran on a mainframe equipped with a 64-bit processor. Now, 64 happens to be the number of squares on a chess board, so it was possible to use a single memory word to represent a yes-or-no or true-or-false predicate for the whole board. For example, one bitboard might contain the answer to "Is there a white piece here?" for each square of the board.

Therefore, the state of a chess game could be completely represented by 12 bitboards: one each for the presence of white pawns, white rooks, black pawns, etc. Adding two bitboards for "all white pieces" and "all black pieces" might accelerate further computations. You might also want to hold a database of bitboards representing the squares attacked by a certain piece on a certain square, etc.; these constants come in handy at move generation time.

The main justification for bit boards is that a lot of useful operations can be performed using the processor's instruction set's 1-cycle logical operators. For example, suppose you need to verify whether the white queen is checking the black king. With a simple square-array representation, you would need to:

•             Find the queen's position, which requires a linear search of the array and may take 64 load-test cycles.
•             Examine the squares to which it is able to move, in all eight directions, until you either find the king or run out of possible moves.
This process is always time-consuming, more so when the queen happens to be located near the end of the array, and even more so when there is no check to be found, which is almost always the case!

With a bitboard representation, you would:

•             Load the "white queen position" bitboard.
•             Use it to index the database of bitboards representing squares attacked by queens.
•             Logical-AND that bitboard with the one for "black king position".
If the result is non-zero, then the white queen is checking the black king. Assuming that the attack bitboard database is in cache memory, the entire operation has consumed 3-4 clock cycles!

Another example: if you need to generate the moves of the white knights currently on the board, just find the attack bitboards associated with the positions occupied by the knights and AND them with the logical complement of the bitboard representing "all squares occupied by white pieces" (i.e, apply the logical NOT operator to the bitboard), because the only restriction on knights is that they can not capture their own pieces!

For a (slightly) more detailed discussion of bitboards, see the article describing the CHESS 4.5 program developed at Northwestern University, in Peter Frey's book Chess Skill in Man and Machine; there are at least two editions of this book, published in 1977 and 1981.

Note: To this day, few personal computers use true 64-bit processors, so at least some of the speed advantages associated with bitboards are lost. Still, the technique is pervasive, and quite useful.

Transposition Tables

In chess, there are often many ways to reach the same position. For example, it doesn't matter whether you play 1. P-K4 ... 2. P-Q4 or 1. P-Q4... 2. P-K4; the game ends up in the same state. Achieving identical positions in different ways is called transposing.

Now, of course, if your program has just spent considerable effort searching and evaluating the position resulting from 1. P-K4 ... 2. P-Q4, it would be nice if it were able to remember the results and avoid repeating this tedious work for 1. P-Q4... 2. P-K4. This is why all chess programs, since at least Richard Greenblatt's Mac Hack VI in the late 1960's, have incorporated a transposition table.

A transposition table is a repository of past search results, usually implemented as a hash dictionary or similar structure to achieve maximum speed. When a position has been searched, the results (i.e., evaluation, depth of the search performed from this position, best move, etc.) are stored in the table. Then, when new positions have to be searched, we query the table first: if suitable results already exist for a specific position, we use them and bypass the search entirely.

There are numerous advantages to this process, including:

•             Speed. In situations where there are lots of possible transpositions (i.e., in the endgame, when there are few pieces on the board), the table quickly fills up with useful results and 90% or more of all positions generated will be found in it.
•             Free depth. Suppose you need to search a given position to a certain depth; say, four-ply (i.e., two moves for each player) ahead. If the transposition table already contains a six-ply result for this position, not only do you avoid the search, but you get more accurate results than you would have if you had been forced to do the work!
•             Versatility. Every chess program has an "opening book" of some sort, i.e., a list of well-known positions and best moves selected from the chess literature and fed to the program to prevent it from making a fool out of itself (and its programmer) at the very beginning of the game. Since the opening book's modus operandi is identical to the transposition table (i.e., look up the position, and spit out the results if there are any), why not initialize the table with the opening book's content at the beginning of the game? This way, if the flow of the game ever leaves the opening book and later translates back into a position that was in it, there is a chance that the transposition table will still contain the appropriate information and be able to use it.
The only real drawback of the transposition table mechanism is its voracity in terms of memory. To be of any use whatsoever, the table must contain several thousand entries; a million or more is even better. At 16 bytes or so per entry, this can become a problem in memory-starved environments.

Other uses of transposition tables

CHESS 4.5 also employed hash tables to store the results of then-expensive computations which rarely changed in value or alternated between a small number of possible choices:

•             Pawn structure. Indexed only on the positions of pawns, this table requires little storage, and since there are comparatively few possible pawn moves, it changes so rarely that 99% of positions result in hash table hits.
•             Material balance, i.e., the relative strength of the forces on the board, which only changes after a capture or a pawn promotion.
This may not be as useful in these days of plentiful CPU cycles, but the lesson is a valuable one: some measure of pre-processing can save a lot of computation at the cost of a little memory. Study your game carefully; there may be room for improvement here.

Generating Hash Keys for Chess Boards

The transposition tables described above are usually implemented as hash dictionaries of some sort, which brings up the following topic: how do you generate hashing keys from chess boards, quickly and efficiently?

The following scheme was described by Zobrist in 1970:

•             Generate 12x64 N-bit random numbers (where the transposition table has 2^N entries) and store them in a table. Each random number is associated with a given piece on a given square (i.e., black rook on H4, etc.) An empty square is represented by a null word.
•             Start with a null hash key.
•             Scan the board; when you encounter a piece, XOR its random number to the current hash key. Repeat until the entire board has been examined.
An interesting side effect of the scheme is that it will be very easy to update the hash value after a move, without re-scanning the entire board. Remember the old XOR-graphics? The way you XOR'ed a bitmap on top of a background to make it appear (usually in distorted colors), and XOR'ed it again to make it go away and restore the background to its original state? This works similarly. Say, for example, that a white rook on H1 captures a black pawn on H4. To update the hash key, XOR the "white rook on H1" random number once again (to "erase" its effects), the "black pawn on H4" (to destroy it) and the "white rook on H4" (to add a contribution from the new rook position).

Use the exact same method, with different random numbers, to generate a second key (or "hash lock") to store in the transposition table along with the truly useful information. This is used to detect and avoid collisions: if, by chance, two boards hash to the exact same key and collide in the transposition table, odds are extremely low that they will also hash to the same lock!

History Tables

The "history heuristic" is a descendant of the "killer move" technique. A thorough explanation belongs in the article on search; for now, it will be sufficient to say that a history table should be maintained to note which moves have had interesting results in the past (i.e., which ones have cut-off search quickly along a continuation) and should therefore be tried again at a later time. The history table is a simple 64x64 array of integer counters; when the search algorithm decides that a certain move has proven useful, it will ask the history table to increase its value. The values stored in the table will then be used to sort moves and make sure that more "historically powerful" ones will be tried first.

Pre-processing move generation

Move generation (i.e., deciding which moves are legal given a specific position) is, with position evaluation, the most computationally expensive part of chess programming. Therefore, a bit of pre-processing in this area can go a long way towards speeding up the entire game.

The scheme presented by Jean Goulet in his 1984 thesis Data Structures for Chess (McGill University) is a personal favorite. In a nutshell:

•             For move generation purposes, piece color is irrelevant except for pawns which move in opposite directions.
•             There are 64 x 5 = 320 combinations of major piece and square from which to move, 48 squares on which a black pawn can be located (they can never retreat to the back rank, and they get promoted as soon as they reach the eight rank), and 48 where a white pawn can be located.
•             Let us define a "ray" of moves as a sequence of moves by a piece, from a certain square, in the same direction. For example, all queen moves towards the "north" of the board from square H3 make up a ray.
•             For each piece on each square, there are a certain number of rays along which movement might be possible. For example, a king in the middle of the board may be able to move in 8 different directions, while a bishop trapped in a corner only has one ray of escape possible.
•             Prior to the game, compute a database of all rays for all pieces on all squares, assuming an empty board (i.e., movement is limited only by the edges and not by other pieces).
•             When you generate moves for a piece on a square, scan each of its rays until you either reach the end of the ray or hit a piece. If it is an enemy piece, this last move is a capture. If it is a friendly piece, this last move is impossible.
With a properly designed database, move generation is reduced to a simple, mostly linear lookup; it requires virtually no computation at all. And the entire thing holds within a few dozen kilobytes; mere chump change compared to the transposition table!

All of the techniques described above (bit boards, history, transposition table, pre-processed move database) will be illustrated in my own chess program, to be posted when I finish writing this series. Next month, I will examine move generation in more detail.

François Dominic Laramée, June 2000

Chess Programming Part III: Move Generation

Last month, I finished Part II of this series by introducing a data structure which pre-processes and stores most of the work related to chess move generation. This time, I will return to the topic in more detail, examining the two major move generation strategies and explaining how to choose between them for a given application.

The Dilemma

No matter how you slice it, chess is a complicated game, especially for a computer.

In any given situation, a player may have 30 or more legal moves to choose from, some good, some suicidal. For trained humans, it is easy to characterize the majority of these moves as foolish or pointless: even beginners learn that they had better come up with a solid plan before leaving their queen in a position where she can be captured, and masters know (more through instinctive pattern matching than by conscious effort) which 1-2 moves are likely to be the strongest in the position.

However, coding this information (especially the unconscious type!) into a computer has proven spectacularly difficult, and the strongest programs (except, to some extent, Hans Berliner's Hitech and its siblings) have given up on this approach, instead relying on "brute force": if you can analyze all possible moves fast enough and predict their consequences far enough down the road, it doesn't matter whether or not you start with a clear idea of what you are trying to accomplish, because you'll discover a good move eventually. Therefore, move generation and search should be made as fast as possible, so as to minimize the loss of effort required by the brute force method.

Search will be discussed in Parts IV and V of this series; this month, we will concentrate on move generation. Historically, three major strategies have been used in this area:

•             Selective generation: Examine the board, come up with a small number of "likely" moves and discard everything else.
•             Incremental generation: Generate a few moves, hoping that one will prove so good or so bad that search along the current line of play can be terminated before generating the others.
•             Complete generation: Generate all moves, hoping that the transposition table (discussed in Part II) will contain relevant information on one of them and that there will be no need to search anything at all.
Selective generation (and its associated search technique, called forward pruning) have all but disappeared since the mid 1970's. As for the other two, they represent two sides of the same coin, trading off effort in move generation vs search. In games where move generation is easy and/or there are lots of ways to transpose into the same positions (i.e., Othello and GoMoku), complete generation may be most efficient, while in games where move generation rules are complicated, incremental generation will usually work faster. Both strategies are sound and viable, however.

The Early Days: Forward Pruning

In 1949 (yes, really), Claude Shannon described two ways to build a chess-playing algorithm:

•             Look at all possible moves, and all the possible moves resulting from each, recursively.
•             Only examine the "best" moves, as determined from a detailed analysis of a position, and then only the "best" replies to each, recursively.
At first, the second alternative seemed more likely to succeed. After all, this is how human players do it, and it seems logical to assume that looking at only a few moves at each node will be faster than looking at all of them. Unfortunately, the results disproved the theory: programs using selective search just didn't play very well. At best, they achieved low to mid-level club player ratings, often committing humiliating blunders at the worst possible time. Beating a world champion (or even playing reasonably well on a consistent basis) was beyond their reach.

The problem is that, for a "best move generator" to be any good, it has to be almost perfect. Suppose that a program is equipped with a function that looks for the 5 best moves in a position, and that the objective best move is among those 5 at least 95% of the time. (That, by the way, is a big assumption.) This means that the probability that the generator's list will always contain the best choice at all times, during a 40-move game, is less than 13%. Even a god-like generator with 99% accuracy will blunder at least once in about a third of its games, while a more reasonable 90%-accurate function will play an entire game without messing up less than 1.5% of the time!

In the mid 1970's, the legendary Northwestern team of Slate and Atkin decided to do away with the complicated best-move generator; it turned out that the time they saved in avoiding costly analysis during move generation was enough to cover the added expense of a full-width search (i.e., examining all possible moves). To all intents and purposes, this discovery buried forward pruning for good.

Botvinnik's work

An extreme example of a forward pruning algorithm was developed in the Soviet Union, in the 1970's and early 1980's, under the tutelage of former World chess champion Mikhail Botvinnik. Botvinnik was convinced that the only way for a computer to ever play grandmaster-level chess was to play like a grandmaster, i.e., examine only a few moves, but in great depth and detail. His program seeked to identify and implement the sort of high-level plans and patterns which a world-class player might come up with during a game. While it led to some fascinating books, revealing insights into the master's mind which only Botvinnik could provide, this work unfortunately did not reach its lofty goals.

Generating All Moves Up-Front

Once forward pruning is eliminated from consideration, the most straightforward way to implement full-width searching consists of:

•             Finding all of the legal moves available in a position.
•             Ordering them in some way, hopefully speeding up search by picking an advantageous order.
•             Searching them all one at a time, until all moves have been examined or a cutoff occurs.
Early programs, for example Sargon, did this by scanning the board one square at a time, looking for pieces of the moving side, and computing possible move destinations on the fly. Memory being at a premium, the expenditure of CPU time required to re-compute these moves every time was a necessary evil.

These days, a pre-processed data structure like the one I described last month can avoid a considerable amount of computation and code complexity, at the cost of a few dozens of Kbytes. When this super-fast move generation is combined with transposition tables, an added bonus may fall into the programmer's lap: if even one of the moves has already been searched before, and if its evaluation (as retrieved from the table) is such that it triggers a cutoff, there will be no need to search any of the moves at all! Obviously, the larger the transposition table, and the higher the probability of a transposition given the rules of the game, the bigger the average payoff.

Not only is this technique conceptually simple, it is also the most "universal": while there are easy ways to segregate chess moves into different categories (i.e., captures and non-captures), other games like Othello do not provide such convenient tools to work with. Therefore, if you intend your program to play more than one game, you should probably pick this technique instead of the one described in the next section.

One Move At A Time

Sophisticated chess programs since at least CHESS 4.5 have adopted the opposite strategy: generate a few moves at a time, search them, and if a cutoff can be caused, there will be no need to generate the rest of the moves.

A combination of factors has made this technique popular:

•             Search does not require much memory. Programs of the 1970's had to make do with small transposition tables and could ill afford pre-processed move generation data structures, limiting the usefulness of the complete generation scheme described above.
•             Move generation is more complicated in chess than in most other games, with castling, en passant pawn captures, and different rules for each piece type to contend with.
•             Very often, a refutation (i.e., a move that will cause a cutoff) is a capture. For example, if Player A leaves a queen "en prise", the opponent will quickly grab it and wreck A's game. Since there are usually few possible captures in a position, and since generating captures separately is relatively easy, computing captures is often enough to trigger a cutoff at little expense.
•             Captures are usually one of the few (if not the only) type of move examined during quiescence search (which will be discussed in a later article), so generating them separately is doubly useful.
Therefore, many programs will generate captures first, often starting with those of highly valuable pieces, and look for a quick cutoff. A number of techniques have been developed to speed up piece-meal move generation, most of them involving bitboards:

•             CHESS 4.5 maintains two sets of 64 bitboards, with one bitboard per square on the board. One contains the squares attacked by the piece, if any, located on that square; the other is the transpose, containing all squares occupied by pieces attacking that square. Therefore, if the program is looking for moves that would capture Black's queen, it looks for its position in the basic bitboard database, uses these new data structures to identify the pieces which attack the queen's position, and only generates moves for these pieces.
•             Maintaining these "attack bitboards" after each move requires some rather involved technical wizardry, but a tool called a rotated bitboard can accelerate the work significantly. The best discussion of rotated bitboards I have seen was written by James F. Swafford, and can be found on the web at

Ordering Moves To Speed Up Search

As we will see next time, search efficiency depends on the order in which moves are searched. The gains and losses related to good or poor move ordering are not trivial: a good ordering, defined as one which will cause a large number of cutoffs, will result in a search tree about the square root of the size of the tree associated with the worst possible ordering!

Unfortunately, it turns out that the best possible ordering is simply defined by trying the best move first. And of course, if you knew which moves are best, you wouldn't be searching in the first place. Still, there are ways to "guess" which moves are more likely to be good than others. For example, you might start with captures, pawn promotions (which dramatically change material balance on the board), or checks (which often allow few legal responses); follow with moves which caused recent cutoffs at the same depth in the tree (so-called "killer moves"), and then look at the rest. This is the justification for iterative deepening alphabeta, which we will discuss in detail next month, as well as the history table we talked about last time. Note that these techniques do not constitute forward pruning: all moves will be examined eventually; those which appear bad are only delayed, not eliminated from consideration.

A final note: in chess, some moves may be illegal because they leave the King in check. However, such an occurrence is quite rare, and it turns out that validating moves during generation would cost a tremendous amount of effort. It is more efficient to delay the check until the move is actually searched: for example, if capturing the King would be a valid reply to Move X, then Move X is illegal and search should be terminated. Of course, if search is cutoff before the move has to be examined, validation never has to take place.

My Choice

For my chess program, I have chosen to generate all moves at the same time. These are only some of the reasons why:

•             I intend to use the program as a basis for several other games, most of which have no direct counterparts to chess captures.
•             I have plenty of memory to play with.
•             The code required to implement this technique is simpler to write and to understand; you will thank me when you see it.
•             There are several freeware programs that implement piece-meal move generation; the curious reader should look at Crafty, for example, as well as James Swafford's Galahad.
While overall performance may be slightly less stellar than otherwise, my program (written in Java, no less) wouldn't exactly provide a challenge to Deep Blue even in the best case, so I won't feel too bad!

Next Month

Now, we are ready to delve into the brains of a chess-playing program, with search techniques. This is such a large topic that it will require two articles. We will begin with the basic search algorithms common to all games, before continuing with new developments and chess-specific optimizations in the next installment.

François Dominic Laramée, July 2000

Chess Programming Part IV: Basic Search
By François Dominic Laramée | Published Aug 06 2000 04:25 PM in Artificial Intelligence

Fourth article in this complicated, code-deprived, dry-as-Metamucil series, and you're still reading? Drop me an email if you are, so that I know I'm writing these for a reason!

Anyway, this month's installment focuses on the basics of two-agent search in strategy games: why it is useful, how to do it, and what it implies for the computer's style of play. The techniques I will discuss apply equally well to most two-player games, but by themselves, they are not quite sufficient to play good chess; next month, I will describe advanced techniques which significantly increase playing strength and search efficiency, usually at the same time.
Why Search?

Well, basically, because we are not smart enough to do without it.

A really bright program might be able to look at a board position and determine who is ahead, by how much, and what sort of plan should be implemented to drive the advantage to fruition. Unfortunately, there are so many patterns to discern, so many rules and so many exceptions, that even the cleverest programs just aren't very good at this sort of thing. What they are good at, however, is computing fast. Therefore, instead of trying to figure out good moves just by looking at a board, chess programs use their brute force to do it: look at every move, then at every possible countermove by the opponent, etc., until the processor melts down.

Deep searches are an easy way to "teach" the machine about relatively complicated tactics. For example, consider the knight fork, a move which places a knight on a square from which it can attack two different pieces (say, a rook and the queen). Finding a way to represent this type of position logically would require some effort, more so if we also had to determine whether the knight was itself protected from capture. However, a plain dumb 3-ply search will "learn" the value of a fork on its own: it will eventually try to move the knight to the forking square, will test all replies to this attack, and then capture one of the undefended pieces, changing the board's material balance. And since a full-width search looks at everything, it will never miss an opportunity: if there is a 5-move combination, however obscure, that leads to checkmate or to a queen capture, the machine will see it if its search is deep enough. Therefore, the deeper the search, the more complicated the "plans" which the machine can stumble upon.
Grandpa MiniMax

The basic idea underlying all two-agent search algorithms is Minimax. It dates back from the Dark Ages; I believe Von Neumann himself first described it over 60 years ago.

Minimax can be defined as follows:

•             Assume that there is a way to evaluate a board position so that we know whether Player 1 (whom we will call Max) is going to win, whether his opponent (Min) will, or whether the position will lead to a draw. This evaluation takes the form of a number: a positive number indicates that Max is leading, a negative number, that Min is ahead, and a zero, that nobody has acquired an advantage.
•             Max's job is to make moves which will increase the board's evaluation (i.e., he will try to maximize the evaluation).
•             Min's job is to make moves which decrease the board's evaluation (i.e., he will try to minimize it).
•             Assume that both players play flawlessly, i.e., that they never make any mistakes and always make the moves that improve their respective positions the most.
How does this work? Well, suppose that there is a simple game which consists of exactly one move for each player, and that each has only two possible choices to make in a given situation. The evaluation function is only run on the final board positions, which result from a combination of moves by Min and Max.


Max assumes that Min will always play perfectly. Therefore, he knows that, if he makes move A, his opponent will reply with D, resulting in a final evaluation of -2 (i.e., a win for Min). However, if Max plays B, he is sure to win, because Min's best move still results in a positive final value of 5. So, by the Minimax algorithm, Max will always choose to play B, even though he would score a bigger victory if he played A and Min made a mistake!

The trouble with Minimax, which may not be immediately obvious from such a small example, is that there is an exponential number of possible paths which must be examined. This means that effort grows dramatically with:

•             The number of possible moves by each player, called the branching factor and noted B.
•             The depth of the look-ahead, noted d, and usually described as "N-ply", where N is an integer number and "ply" means one move by one player. For example, the mini-game described above is searched to a depth of 2-ply, one move per player.
In Chess, for example, a typical branching factor in the middle game would be about 35 moves; in Othello, around 8. Since Minimax' complexity is O( B^n ), an 8-ply search of a chess position would need to explore about 1.5 million possible paths! That is a LOT of work. Adding a ninth ply would make the tree balloon to about 50 million nodes, and a tenth, to an impossible 1.8 billion!

Luckily, there are ways to cut the effort by a wide margin without sacrificing accuracy.
Alphabeta: Making Minimax Feasible (a little)

Suppose that you have already searched Max' move B in the mini-game above. Therefore, you know that, at best, Max' score for the entire game will be 5.

Now, suppose that you begin searching move A, and that you start with the path A-D. This path results in a score of -2. For Max, this is terrible: if he plays A, he is sure to finish with, at best, -2, because Min plays perfectly; if A-C results in a score higher than A-D's, Min will play A-D, and if A-C should be even worse (say, -20), Min would take that path instead. Therefore, there is no need to look at A-C, or at any other path resulting from move A: Max must play B, because the search has already proven that A will end up being a worse choice no matter what happens.

This is the basic idea being the alphabeta algorithm: once you have a good move, you can quickly eliminate alternatives that lead to disaster. And there are a lot of those! When combined with the transposition table we discussed earlier in the series, and which saves us from examining positions twice if they can be reached by different combinations of moves, alphabeta turns on the Warp drive: in the best case, it will only need to examine roughly twice the square root of the number of nodes searched by pure Minimax, which is about 2,500 instead of 1.5 million in the example above.
Ordering Moves to Optimize Alphabeta

But how can we achieve this best case scenario? Do we even need to?

Not really. It turns out that Alphabeta is always very efficient at pruning the search tree, as long as it can quickly find a pretty good move to compare others to. This means that it is important to search a good move first; the best case happens when we always look at the best possible moves before any others. In the worst possible case, however, the moves are searched in increasing order of value, so that each one is always better than anything examined before; in this situation, alphabeta can't prune anything and the search degenerates into pure, wasteful Minimax.

Ordering the moves before search is therefore very important. Picking moves at random just won't do; we need a "smarter" way to do the job. Unfortunately, if there was an easy way to know what the best move would turn out to be, there would be no need to search in the first place! So we have to make do with a "best guess".

Several techniques have been developed to order the possible moves in as close to an optimal sequence as possible:

•             Apply the evaluation function to the positions resulting from the moves, and sort them. Intuitively, this makes sense, and the better the evaluation function, the more effective this method should be. Unfortunately, it doesn't work well at all for chess, because as we will see next month, many positions just can't be evaluated accurately!
•             Try to find a move which results in a position already stored in the transposition table; if its value is good enough to cause a cutoff, no search effort needs to be expended.
•             Try certain types of moves first. For example, having your queen captured is rarely a smart idea, so checking for captures first may reveal "bonehead" moves rapidly.
•             Extend this idea to any move which has recently caused a cutoff at the same level in the search tree. This "killer heuristic" is based on the fact that many moves are inconsequential: if your queen is en prise, it doesn't matter whether you advance your pawn at H2 by one or two squares; the opponent will still take the queen. Therefore, if the move "bishop takes queen" has caused a cutoff during the examination of move H2-H3, it might also cause one during the examination of H2-H4, and should be tried first.
•             Extend the killer heuristic into a history table. If, during the course of the game's recent development, moving a piece from G2 to E4 has proven effective, then it is likely that doing so now would still be useful (even if the old piece was a bishop and has been replaced by a queen), because conditions elsewhere on the board probably haven't changed that much. The history heuristic is laughably simple to implement, needing only a pair of 64x64 arrays of integer counters, and yields very interesting results.
Having said all that about "reasonable ideas", it turns out that the most effective method is one which goes against every single bit of human intuition: iterative deepening.
Iterative Deepening AlphaBeta

If you are searching a position to depth 6, the ideal move ordering would be the one yielded by a prior search of the same position to the same depth. Since that is obviously impossible, how about using the results of a shallower search, say of depth 5?

This is the idea behind iterative deepening: begin by searching all moves arising from the position to depth 2, use the scores to reorder the moves, search again to depth 3, reorder, etc., until you have reached the appropriate depth.

This technique flies in the face of common sense: a tremendous amount of effort is duplicated, possibly 8-10 times or more. Or is it?

Consider the size of a search tree of depth d with branching factor B. The tree has B nodes at depth 1, B*B at depth 2, B*B*B at depth 3, etc. Therefore, searching to depth (d-1) yields a tree B times smaller than searching to depth d! If B is large (and remember that it is about 35 during the middle game in chess), the overwhelming majority of the effort expended during search is devoted to the very last ply. Duplicating a search to depth (d-1) is a trifling matter: in fact, even if it yielded no advantages whatsoever, iterative deepening would only cost less than 4% extra effort on a typical chess position!

However, the advantages are there, and they are enormous: using the results of a shallower search to order the moves prior to a deeper one produces a spectacular increase in the cutoff rate. Therefore, IDAB actually examines far fewer nodes, on average, than a straight AB search to the same depth using any other technique for move ordering! When a transposition table enters the equation, the gain is even more impressive: the extra work performed to duplicate the shallow parts of the search drops to nothing because the results are already stored in the table and need not be computed again.
Computer Playing Style

Iterative deepening alphabeta combined with a transposition table (and a history table to kickstart the effort) allows the computer to search every position relatively deeply and to play a reasonable game of chess. That being said, its Minimax ancestor imposes a very definite playing style on the computer, one which is not exactly the most spectacular in the world.

For example, suppose that the machine searches a position to depth 8. While looking at a certain move, it sees that every possible response by the opponent would let it win the game in dazzling manner... Except for a single opaque, difficult, obfuscated and almost maddeningly counter-intuitive sequence, which (if followed to perfection) would allow the opponent to salvage a small chance of eventually winning the game. Against a human player (even a Grandmaster), such a trap might turn the game into one for the history books.

However, if the machine then finds a boring move which always forces a draw, it will immediately discard the trap, because it assumes that the opponent would find the perfect counter, no matter how improbable that is, and that the draw is the best it can hope for!

As a result, you might say that machines play an overly cautious game, as if every opponent was a potential world champion. Combined with the fact that computers often can't search deep enough to detect the traps which human players devise against them, this allows very skilled humans to "waste time" and confuse the machine into making a blunder which they can exploit. (Human players also study their opponent's styles for weaknesses; if Kasparov had been given access to, say, a hundred games played by Deep Blue before their match, he might have been able to find the flaws in its game and beat it. But we'll never know for sure.)
Next Month

In Part V, we will discuss the limitations of straight, fixed-depth alphabeta search, and how to improve playing strength using techniques like the null-move heuristic, quiescence search, aspiration search and MTD(f), and the "singular extensions" which made Deep Blue famous. Hold on, we're almost done!

François Dominic Laramée, August 2000

Chess Programming Part V: Advanced Search
By François Dominic Laramée | Published Sep 05 2000 06:09 PM in Artificial Intelligence

Hey, it looks like there are dozens (and dozens) of you reading this series! I'm tickled pink!

In this next-to-last article, we will examine advanced search-related techniques which can speed up and/or strengthen your chess-playing program. In most cases, the concepts (if not the actual code) also apply to a variety of other 2-player games; adapting them to your particular problem, however, shall remain an exercise for the proverbial reader.
Why Bother?

So far, all of the search algorithms we have looked at examine a position's consequences to a fixed "depth". However, this is rarely a good thing. For example, suppose that your program uses an iterative-deepening alpha-beta algorithm with maximum depth 5-ply. Now look at these cases:

•             Along a certain line of play, you discover a position where one of the players is checkmated or stalemated at depth 3. Obviously, you don't want to keep searching, because the final state of the game has been resolved. Not only would searching to depth 5 be a colossal waste of effort, it may also allow the machine to finagle its way into an illegal solution!
•             Now, suppose that, at depth 5, you capture a pawn. The program would be likely to score this position in a favorable light, and your program might decide that the continuation leading to it is a useful one. However, if you had looked one ply further, you might have discovered that capturing the pawn left your queen undefended. Oops.
•             Finally, suppose that your queen is trapped. No matter what you do, she will be captured by the opponent at ply 4, except for one specific case where her death will happen at ply 6. If you search to depth 5, the continuations where the queen is captured at ply 4 will be examined accurately, and scored as likely disasters. However, the unique line of play where the queen is only captured at ply 6 (outside of the search tree) doesn't reveal the capture to the machine, which thinks that the queen is safe and gives it a much better score! Now, if all you have to do to push the queen capture outside of the search tree is delay the opponent with a diversion, doing so may be worth the risk: although it could damage your position, it might also cause the opponent to make a mistake and allow the queen to escape. But what if you can only delay the queen capture by sacrificing a rook? To the machine, losing a rook at ply 4 is less damaging than losing a queen, so it will merrily throw its good piece away and "hide" the too-horrible-to-mention queen capture beyond its search horizon. (During its next turn, of course, the machine will discover that the queen must now be captured at ply 4 in all cases, and that it has wasted a rook for no gain.) Hans Berliner described this "horizon effect" a long time ago, and it is the most effective justification for the "quiescence search" described in the next section.
The bottom line is this: a great many positions in chess (and in other games as well) are just too chaotic to be evaluated properly. An evaluation function can only be applied effectively to "quiet" positions where not much of importance is likely to happen in the immediate future. How to identify these is our next topic.
Quiet, Please!

There are two ways to assess a position's value: dynamic evaluation (i.e., look at what it may lead to) and static evaluation (i.e., see what it looks like on its own, irrespective of consequences). Dynamic evaluation is performed through search; as we have just mentioned, static evaluation is only feasible when the position is not likely to undergo an overwhelming change of balance in the near future. Such relatively stable positions are called "quiet" or "quiescent", and they are identified via "quiescence search".

The basic concept of Quiescence Search is the following: once the program has searched everything to a fixed depth (say, 6-ply), we continue each line of play selectively, by searching "non-quiescent" moves only, until we find a quiescent position, and only then apply the evaluator.

Finding a quiet position requires some knowledge about the game. For example, which moves are likely to cause a drastic change in the balance of power on the board? For chess, material balance tends to be the overwhelming consideration in the evaluator, so anything that changes material is fair game: captures (especially those of major pieces) and pawn promotions certainly qualify, while checks may also be worth a look (just in case they might lead to checkmate). In checkers, captures and promotions also seem like reasonable choices. In Othello, every single move is a capture, and "material balance" can change so much in so little time that it might be argued that there are no quiet positions at all!

My own program uses a simple quiescence search which extends all lines of play (after a full-width search to depth X) by looking exclusively at captures. Since there are usually not that many legal captures in a given position, the branching factor in the quiescence search tends to be small (4-6 on average, and quickly converging to 0 as pieces are eaten on both sides). Nevertheless, the quiescence search algorithm is called on a LOT of positions, and so it tends to swallow 50% or more of the entire processing time. Make sure that you need such a scheme in your own game before committing to it.

Only when no capture is possible does my program apply its evaluator. The result is a selectively-extended search tree which is anything but fixed-depth, and which defeats most of the nasty consequences of the "horizon effect".
The All-Important Null-Move

One of the most effective ways to speed up a chess program is to introduce the concept of a null move into the equation.

The null move consists, quite simply, of skipping a turn and letting the opponent play two moves in a row. In the overwhelming majority of positions, doing nothing is a bone-head idea: you should (almost) always be able to do *something* to improve your lot. (To be honest, there are a few "damned if I do, damned if I don't" positions where the null move would actually be your best bet, and the computer will not play them correctly, but such "zugzwang" positions are hopeless anyway, so the loss of performance is not very traumatic.)

Allowing the computer to try a null move during search has several advantages related to speed and accuracy. For example:

•             Suppose that a position is so overwhelmingly in your favor that, even if you skipped your turn, the opponent couldn't respond with anything that would help. (In program terms, you would get a beta cutoff even without making a move.) Suppose further that this position is scheduled to be searched to depth N. The null move, in effect, takes out an entire ply of the search tree (you are searching only the null move instead of all your legal ones) and if your branching factor is B, searching the null move is equivalent to looking at a single depth N-1 subtree instead of B of them. With B=35 as in the typical chess middlegame, null-move search may only consume 3% of the resources required by a full depth-N examination. If the null move search reveals that you are still too strong even without playing (i.e., it creates a cutoff), you have saved 97% of your effort; if not, you must examine your own legal moves as usual, and have only wasted an extra 3%. On average, the gain is enormous.
•             Now, suppose that, during quiescence search, you reach a position where your only legal capture is rook-takes-pawn, which is immediately followed by the opponent's knight-takes-rook. You'd be a lot better off not making the capture, and playing any other non-capture move, right? You can simulate this situation by inserting the null move into the quiescence search: if, in a given position during quiescence search, it is revealed that the null move is better than any capture, you can assume that continuing with captures from this position is a bad idea, and that since the best move is a quiet one, this is a position where the evaluation function itself should be applied!
Overall, the null-move heuristic can save between 20% and 75% of the effort required by a given search. Well worth the effort, especially when you consider that adding it to a program is a simple matter of changing the "side to play" flag and adding less than a dozen lines of code in the quiescence search algorithm!
Aspirated Search and MTD(f)

Plain old alphabeta assumes nothing about a position's ultimate minimax value. It looks at *everything*, no matter how preposterous. However, if you have a pretty good idea of what the value will turn out to be (for example, because you are running an iterative-deepening scheme and have received the previous iteration's results), you might be able to identify lines of play that are so out of whack with your expectations that they can't possibly be right, and cut them off pre-emptively.

For example, suppose that you have reason to believe that a position's value will be close to 0, because it is very well balanced. Now, suppose that an internal node's preliminary evaluation is at +20,000. You can cutoff with reasonable confidence.

This is the idea behind "aspiration search", a variant of alphabeta in which, instead of using +INFINITY and -INFINITY as the initial bounds of the search, you set a small window around the expected value instead. If the actual value happens to fall within the window, you win: you'll get it without error, and faster than you would otherwise (because of the many extra cutoffs). If not, the algorithm will fail, but the error will be easy to detect (because the minimax value you'll receive will be equal to one of the bounds); you'll have to waste a bit of time re-searching with a wider window. If the former case happens more often than the latter, you win on average. Obviously, the better your initial guess of the expected value, the more useful this technique is.

In the mid 1990's, researcher Aske Plaat extended aspiration search to its logical conclusion: what if you called an aspirated alphabeta with a search window of width equal to zero? It would fail all the time, of course... But it would do so *very quickly*, because it would cutoff every path almost immediately. Now, if the failure indicates that the actual value is lower than your estimate, you can try again, with another zero-width window around a smaller estimate, etc. In a sense, you could then use alphabeta to perform a binary search into the space of all possible minimax values, until you reach the only call which will *not* fail because the zero-width window will be centered on the position's actual value!

This brilliant idea, presented in a paper available on the web at http://theory.lcs.mi...plaat/mtdf.html, has been embodied in the MTD(f) search algorithm, which is all of 10 lines long. Tacked on top of an alphabeta implementation equipped with a transposition table, MTD(f) is incredibly efficient and highly parallel-friendly. It also works better with "coarse-grain" (and therefore probably simpler and faster) evaluators: it is easy to see that it takes fewer probes to zero in on the actual value in a binary search if the smallest "atom" of value is equal to, say, 0.1 pawns rather than 0.001 pawns.

There are other alphabeta variants in wider use (namely, the infamous NegaScout; I would rather teach General Relativity to orangutangs than get into that mess) but Plaat insists that MTD(f) is the most efficient algorithm in existence today and I'll take his word for it. My own program uses MTD(f); you'll be able to marvel at the algorithm's simplicity very shortly!
Singular Extensions

One last thing before we leave the topic of search: in chess, some moves are obviously better than others, and it may not be necessary to waste too much time searching for alternatives.

For example, suppose that after running your iterative algorithm to depth N-1, you discover that one of your moves is worth +9000 (i.e., a capture of the opponent's queen) and all others are below 0. If saving time is a consideration, like in tournaments, you may want to bypass the whole depth N search and only look at the best move to depth N instead: if this extra ply does not lower its evaluation much, then you assume that the other moves won't be able to catch up, and you stop searching early. (Remember: if there are 35 valid moves at each ply on average, you may have just saved 97% of your total effort!)

Deep Blue's team has pushed this idea one step further and implemented the concept of "singular extensions". If, at some point in the search, a move seems to be a lot better than all of the alternatives, it will be searched an extra ply just to make sure that there are no hidden traps there. (This is a vast oversimplification of the whole process, of course, but that's the basic idea.) Singular extensions are costly: adding an extra ply to a node roughly doubles the number of leaves in the tree, causing a commensurate increase in the number of calls to the evaluator; in other words, Deep Blue's specialized hardware can afford it, my cranky Java code can't. But it's hard to argue with the results, isn't it?
Next Month

In Part VI, we wrap up the series with a discussion of evaluation functions, the code which actually tells your program whether a given board position is good or bad. This is an immense topic, and people can (and do) spend years refining their own evaluators, so we will have to content ourselves with a rather high-level discussion of the types of features which should be examined and their relative importance. If everything goes according to plan, I should also have some Java code for you to sink your teeth into at about that time, so stick around, won't you?

François Dominic Laramée, September 2000

Chess Programming Part VI: Evaluation Functions
By François Dominic Laramée | Published Oct 08 2000 04:07 PM in Artificial Intelligence
  Download attached article resource

It's been six months, and I know sometimes it must have felt like I would never shut up, but here we are: the sixth and last installment of my chess programming series. Better yet: my Java chess program (see attached resource file), primitive though it may be, is now complete, and I shipped it to Gamedev along with this, which proves that I know (a little bit of) what I've been talking about.

This month's topic, the evaluation function, is unique in a very real sense: while search techniques are pretty much universal and move generation can be deducted from a game's rules and no more, evaluation requires a deep and thorough analysis of strategy. As you can well imagine, it is impossible to assess a position's relative strengths if we don't know what features are likely to lead to victory for one player or the other. Therefore, many of the concepts discussed here may apply to other games in very different fashion, or not at all; it is your job as programmer to know your game and decide what the evaluator should take into consideration.

Without further delay, let us take a look at some board evaluation metrics and at how they can be used to evaluate a chess position.
Material Balance

To put it simply, material balance is an account of which pieces are on the board for each side. According to chess literature, a queen may be worth 900 points, a rook 500, a bishop 325, a knight 300 and a pawn 100; the king has infinite value. Computing material balance is therefore a simple matter: a side's material value is equal to
MB = Sum( Np * Vp )

where Np is the number of pieces of a certain type on the board and Vp is that piece's value. If you have more material on the board than your opponent, you are in good shape.

Sounds simple, doesn't it? Yet, it is by far the overwhelming factor in any chess board evaluation function. CHESS 4.5's creators estimate that an enormous advantage in position, mobility and safety is worth less than 1.5 pawns. In fact, it is quite possible to play decent chess without considering anything else!

Sure, there are positions where you should sacrifice a piece (sometimes even your queen) in exchange for an advantage in momentum. These, however, are best discovered through search: if a queen sacrifice leads to mate in 3 moves, your search function will find the mate (provided it looks deep enough) without requiring special code. Think of the nightmares if you were forced to write special-case code in your evaluator to determine when a sacrifice is worth the trouble!

Few programs use a material evaluation function as primitive as the one I indicated earlier. Since the computation is so easy, it is tempting to add a few more features into it, and most people do it all the time. For example, it is a well-known fact that once you are ahead on material, exchanging pieces of equal value is advantageous. Exchanging a single pawn is often a good idea, because it opens up the board for your rooks, but you would still like to keep most of your pawns on the board until the endgame to provide defense and an opportunity for queening. Finally, you don't want your program to panic if it plays a gambit (i.e., sacrifices a pawn) from its opening book, and therefore you may want to build a "contempt factor" into the material balance evaluation; this allows your program to think it's ahead even though it is behind by 150 points of material or more, for example.

Note that, while material balance is highly valuable in chess and in checkers, it is deceiving in Othello. Sure, you must control more squares than the opponent at the end of the game to win, but it is often better to limit his options by having as few pieces on the board as possible during the middlegame. And in other games, like Go-Moku and all other Connect-N variations, material balance is irrelevant because no pieces are ever captured.
Mobility and Board Control

One of the characteristics of checkmate is that the victim has no legal moves left. Intuitively, it also seems better to have a lot of options available: a player is more likely to be able to find a good line of play if he has 30 legal moves to choose from than if he is limited to 3.

In chess, mobility is easily assessed: count the number of legal moves for each side given the position, and you are done. However, this statistic has proven to be of little value. Why? For one thing, many chess moves are pointless. It may be legal to make your rook patrol the back rank one square at a time, but it is rarely productive. For another, trying to limit the opponent's mobility at all costs may lead the program to destroy its own defensive position in search of "pointless checks": since there are rarely more than 3-4 legal ways to evade check in any given position, a mobility-oriented program would be likely to make incautious moves to put the opponent in check, and after a while, it may discover that it has accomplished nothing and has dispersed its forces all over the board.

Still, a few sophisticated mobility evaluation features are always a good thing. My program penalizes "bad bishops", i.e., bishops whose movement is hampered by a large number of pawns on squares of the same color, as well as knights sitting too close to the edges of the board. As another example, rooks are more valuable on open and semi-open files, i.e., files where there are no pawns (or at least no friendly ones).

A close relative of mobility is board control. In chess, a side controls a square if it has more pieces attacking it than the opponent. It is usually safe to move to a controlled square, and hazardous to move to one controlled by the opponent. (There are exceptions: moving your queen to a square attacked by an enemy pawn is rarely a good idea, no matter how many ways you can capture that pawn once it has done its damage. You may also voluntarily put a piece in harm's way to lead a defender away from an even more valuable spot.) In chess, control of the center is a fundamental goal of the opening. However, control is somewhat difficult to compute: it requires maintaining a database of all squares attacked by all pieces on the board at any given time. Many programs do it; mine doesn't.

While mobility is less important than it seems to the chess programmer, it is extremely important in Othello (where the side with the fewest legal moves in the endgame is usually in deep trouble). As for board control, it is the victory condition of Go.

An age-old maxim of chess playing is that minor pieces (bishops and knights) should be brought into the battle as quickly as possible, that the King should castle early and that rooks and queens should stay quiet until it is time for a decisive attack. There are many reasons for this: knights and bishops (and pawns) can take control of the center, support the queen's attacks, and moving them out of the way frees the back rank for the more potent rooks. Later on in the game, a rook running amok on the seventh rank (i.e., the base of operations for the opponent's pawns) can cause a tremendous amount of damage.

My program uses several factors to measure development. First, it penalizes any position in which the King's and Queen's pawns have not moved at all. It also penalizes knights and bishops located on the back rank where they hinder rook movement, tries to prevent the queen from moving until all other pieces have, and gives a big bonus to positions where the king has safely castled (and smaller bonuses to cases where it hasn't castled yet but hasn't lost the right to do so) when the opponent has a queen on the board. As you can see, the development factor is important in the opening but quickly loses much of its relevance; after 10 moves or so, just about everything that can be measured here has already happened.

Note that favoring development can be dangerous in a game like Checkers. In fact, the first player to vacate one of the squares on his back rank is usually in trouble; avoiding development of these important defensive pieces is a very good idea.
Pawn Formations

Chess grandmasters often say that pawns are the soul of the game. While this is far from obvious to the neophyte, the fact that great players often resign over the loss of a single pawn clearly indicates that they mean it!

Chess literature mentions several types of pawn features, some valuable, some negative. My program looks at the following:

•             Doubled or tripled pawns. Two or more pawns of the same color on the same file are usually bad, because they hinder each others movement.
•             Pawn rams. Two opposing pawns "butting heads" and blocking each others forward movement constitute an annoying obstacle.
•             Passed pawns. Pawns which have advanced so far that they can no longer be attacked or rammed by enemy pawns are very strong, because they threaten to reach the back rank and achieve promotion.
•             Isolated pawns. A pawn which has no friendly pawns on either side is vulnerable to attack and should seek protection.
•             Eight pawns. Having too many pawns on the board restricts mobility; opening at least one file for rook movement is a good idea.
A final note on pawn formations: a passed pawn is extremely dangerous if it has a rook standing behind it, because any piece that would capture the pawn is dead meat. My program therefore scores a passed pawn as even more valuable if there is a rook on the same file and behind the pawn.
King Safety and Tropism

We have already touched on king safety earlier: in the opening and middle game, protecting the king is paramount, and castling is the best way to do it.

However, in the endgame, most of the pieces on both sides are gone, and the king suddenly becomes one of your most effective offensive assets! Leaving him behind a wall of pawns is a waste of resources.

As for "tropism", it is a measure of how easy it is for a piece to attack the opposing king, and is usually measured in terms of distance. The exact rules used to compute tropism vary by piece type, but they all amount to this: the closer you are to the opposing king, the more pressure you put on it.
Picking the Right Weights

Now that we have identified the features we would like to measure, how do we assign relative weights to each?

That, my friends, is a million-dollar question. People can (and do) spend years fiddling with the linear combination of features in their evaluation functions, sometimes giving a little more importance to mobility, sometimes de-emphasizing safety, etc. I wish there were an absolute solution, but there isn't. This is still very much a matter for trial and error. If your program plays well enough, great. If not, try something else, and play it against the old version; if it wins most of the time, your new function is an improvement.

Three things you may want to keep in mind:

•             It is very difficult to refine an evaluation function enough to gain as much performance as you would from an extra ply of search. When in doubt, keep the evaluator simple and leave as much processing power as possible to alphabeta.
•             Unless you are after the World Championship, your evaluator does not need to be all-powerful!
•             If your game plays really fast, you may want to try evolving an appropriate evaluator using a genetic algorithm. For chess, though, this would likely require thousands of years!

Next Month

Well, there ain't no next month. This is it.

If I wanted to drag this series even longer, I could write about opening books, endgame libraries, specialized chess hardware and a zillion other things. I could, I could. But I won't. Some of these topics I reserve for the book chapter I will be writing on this very topic later this Fall. Others I just don't know enough about to contribute anything useful. And mostly, I'm just too lazy to bother.

Still, I hope you enjoyed reading this stuff, and that you learned a useful thing or two or three. If you did, look me up next year at the GDC or at E3, and praise me to whomever grants freelance design and programming contracts in your company, will ya?


François Dominic Laramée, October 2000

Watch and follow along as the process of writing a chess engine is demonstrated and explained.
There are currently two tutorial series:
Write a simple Java chess engine with GUI in under 1,000 lines of code
Write an advanced bitboard-based Java chess engine using modern techniques.

Subscribe to get email notifications on upcoming chess engine tutorial videos. (A new video is posted every Monday)


Chess programing