Kalah
RULES
1. the players move alternately, assume the opening player is a girl, why not?
2. each player chooses a nonempty pit and empties it by distributing
one pebble into each pit including her kalah moving in an anticlockwise
direction, noting that if the player holds sufficient pebbles pebbles
are placed in the opponent's pits but never into the opponent's kalah
3 if the last pebble lands in her kalah then the player has another turn and this process can continue many times
4 if the last pebble lands in an empty pit on her side opposite a nonempty
pit on the opponent's side then the move comes to an end but all the
pebbles in these two pits are placed in her kalah
5 if either player finds that all their pits are empty then the other player can place all the pebbles in their pit into their kalah
6 the winner is the player with most pebbles in their kalah.
CHOICE OF A PROBLEM
1. The activity must not be deterministic in the practical sense. There
exists no human algorithm to guarantee a win or draw and the complete
game involves a large number of choices.
2. A definite goal must exist and at least one criterion or intermediate
goal must exist which has a bearing on the final goal e.g. to stop the
opponent moving.
3. The rules of the activity must be definite and known. Games satisfy this criterion.
4. There should be a background of knowledge concerning the activity against which the learning process can be tested.
5. The activity should be familiar to a substantial body of people so that the behaviour of the program can be made understandable to them.
Action of a General Computer Program in Game Playing
a. The human inserts an arbitrary position in the game. Goto b or e according to whether it is human's or program's move.
b. The human inserts his move.
c. Generate the next position.
d. If the game is over print the result and stop otherwise goto e.
e. Generate all or some of the successors of the top computer position.
f. Evaluate each position. For some programs each evaluation involves elaborate searching with look-ahead.
g. Move to the successor.
h. Print the Computer's move.
j. If the game is over print the result stop otherwise gotob
LEARNING
One of the simplest ways for a program to learn is to save all the information
it has gained or deduced about its previous games. If the
program can refer to this information at any stage a great deal of
computing time can be saved. Playing at ply n it is possible to achieve
look-ahead at a level considerably greater . The program saves the
board positions that have actually occurred in playing the game with their
associated scores. When a terminal node is reached at look-ahead level
n its actual value is found in a look-up table giving an effective
look-ahead of many times n.
STORAGE
As storage is limited and search times can soon become disastrous systems must be devised to
1. catalogue boards that are saved.
2. delete redundancies,
3. discard board positions of little value.
We
4. standardise all board positions so that black is to move factor=2,
5. can reduce the number of positions if all pieces are kings by symmetry, factor=4,
6. store positions on the basis of number of pieces on the board, the
presence of piece advantage, the side with advantage, the presence of
kings, the first moments of pieces about the diagonals.
To limit the number of records held an interesting feature has been
introduced ageing, and refreshing. All board positions are given an age
of 1
and after every turn the age is increased however when any board position
is consulted its age is halved. Those positions which are too old are
archived. i.e. forgetting.
The important parameters are
1. piece advantage
2. denial of occupancy
3. mobility
4. control of centre.
Played on an IBM 704 which could handle a milliion 36-bit words and up to 100 references per minute.
Checkers
A possible evaluation function for checkers is
6k + 4m + u
where k is the king advantage m is the man advantage and
u
is the undisputed mobility, which means the number of moves that can be
made
without leading to an immediate capture.
If we consider the position shown in the attached diagram then BLACK has a man on 16 and a king on 28 whereas WHITE has kings on 15 and 32. The computer is white and the score is
u = 4 - 2, k = 2 - 1, m = 0 - 1. Thus score is 6k + 4m + u or 6 - 4 + 2 = 4.
WHITE is in play and has to make a move ; there are five choices A,B,C,D or E. Using 1-ply we can score the moves as
A 15-10 score 4
B 15-11 score 5
C 15-18 score 4
D 15-19,16-23 score -6 this initial move leads to an immediate capture 16-23
E 32-27 score 6
The best move is actually 15-11 from a 3-ply analysis but E appears best so far.
Samuel's program
From 1946 Samuel has devoted a considerable amount of computer and human
effort in developing a checkers computer program to beat
human opponents. The work has been of such success that the program
can now beat most opponents and has reduced the excitement in
other programmers attempting this particular problem.
Samuel introduced the topic of machine learning and his program improved
its performance up to a limit against human opponents. Basically it
uses the concept of rote learning. Suppose a tree is terminated at
3-ply then each of the nodes is evaluated statically using the evaluation
function which consists of about 38 parameters. However if the root
node that is encountered is stored with its backed up value re occurs then
it will have a firmer value. Provided enough games are played then
most if not all of the nodes in the future game trees will have been seen
before and correctly valued. Thus the root node will have a virtual
value of 6-ply and as the process continues and the program learns so the
values given to each node will improve. An additional learning technique
is to play two programs against one another and the program that
wins has one set of parameters and the loser another. The new program
must have parameters that emphasise the good points of each in some
way. The storage of previous games was catalogued as follows;
the number of pieces on the board, the presence or absence of piece
advantage, the presence or absence of kings, the side with advantage. Positions
of little value, those that are not referred to very often, are deleted
and symmetry is used to cut down the number of stored positions by various
factors of 2. In 1959 the program used 60,000 board positions and an evaluation
choosing 16 parameters from 38, whilst in 1967 alpha beta pruning had been
introduced and only the most promising nodes were investigated. Book moves
were also incorporated.