Suppose there are two players, $P_1$ and $P_2$, that each have
to make a decision.  Each has a finite set of actions, $U$ and $V$,
respectively.  The set $V$ can be viewed as the ``replacement'' of
$\Theta$ from Formulation 9.3 by a set of actions chosen
by a true opponent.  Each player has a cost function, which is denoted
as $L_i : U \times V \rightarrow \mathbb{R} \cup \{-\infty, \infty\}$
for $i = 1, 2$.  An important constraint for zero-sum games is

$$
L_1(u,v) = -L_2(u,v), \tag{9.41}
$$

which means that a cost for one player is a reward for the other; the
two costs always sum to zero.

In light of (9.41) it is pointless to represent two cost
functions.  Instead, the subscript will be dropped, and $L$ will
refer to the cost, $L_1$, of $P_1$.  The goal of $P_1$ is to minimize
$L$.  Due to (9.41), the goal of $P_2$ is to maximize $L$.
Thus, $L$ can be considered as a reward for $P_2$, but a cost for
$P_1$.
A formulation can now be given:
Formulation 9.7 (A Zero-Sum Game)

1. Two players, $P_1$ and $P_2$.
2. A nonempty, finite set $U$ called the action space for $P_1$.
   For convenience in describing examples, assume that $U$ is a
   set of consecutive integers from $1$ to $|U|$.  Each $u \in U$ is
   referred to as an action of $P_1$.
3. A nonempty, finite set $V$ called the action space for $P_2$.
   Assume that $V$ is a set of consecutive integers from $1$ to
   $|V|$.  Each $v \in V$ is referred to as an action of $P_2$.
4. A function $L : U \times V \rightarrow \mathbb{R} \cup \{-\infty, \infty\}$
   called the cost function for $P_1$.  This also serves as a reward
   function for $P_2$ because of (9.41).
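
The components of the formulation can be sketched in code.  The
following is a minimal Python illustration under stated assumptions,
not part of the original formulation: the class name `ZeroSumGame`,
its fields and methods, and the example cost values are all
hypothetical.  It stores $L$ as a table indexed by 1-based actions and
obtains the cost for $P_2$ by negation, following (9.41).

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class ZeroSumGame:
    """Hypothetical container for a finite two-player zero-sum game.

    cost[u - 1][v - 1] holds L(u, v), the cost for P1, where the actions
    are the consecutive integers 1..|U| and 1..|V|.
    """
    cost: List[List[float]]

    def __post_init__(self) -> None:
        # Both action spaces must be nonempty and the table rectangular.
        if not self.cost or not self.cost[0]:
            raise ValueError("U and V must be nonempty")
        if any(len(row) != len(self.cost[0]) for row in self.cost):
            raise ValueError("every row needs one entry per action in V")

    @property
    def U(self) -> range:
        """Action space of P1: 1, ..., |U|."""
        return range(1, len(self.cost) + 1)

    @property
    def V(self) -> range:
        """Action space of P2: 1, ..., |V|."""
        return range(1, len(self.cost[0]) + 1)

    def L(self, u: int, v: int) -> float:
        """Cost for P1, which is also the reward for P2."""
        if u not in self.U or v not in self.V:
            raise ValueError("action outside the action space")
        return self.cost[u - 1][v - 1]

    def cost_p2(self, u: int, v: int) -> float:
        """Cost for P2, obtained from the zero-sum constraint."""
        return -self.L(u, v)


# Made-up 2 x 2 costs, used only to exercise the class.
game = ZeroSumGame(cost=[[2.0, -1.0], [0.0, 3.0]])
```

With this representation a single table is enough to describe both
players: `game.L(u, v)` is what $P_1$ tries to minimize and $P_2$
tries to maximize.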
Before discussing what it means to solve a zero-sum game, some
additional assumptions are needed.  Assume that the players know each
other's cost functions.  This implies that the motivation of the
opponent is completely understood.  The other assumption is that the
players are rational, which means
that they will try to obtain the best cost whenever possible. 
$P_1$ will not choose an action that leads to higher cost when a lower
cost action is available.  Likewise, $P_2$ will not choose an action
that leads to lower cost.  Finally, it is assumed that both players
make their decisions simultaneously.  There is no information
regarding the decision of $P_1$ that can be exploited by $P_2$, and
vice versa.
Formulation 9.7 is often referred to as a matrix
game because $L$ can be expressed with a cost matrix, as was done in
Section 9.2.  Here the matrix indicates costs for $P_1$ and $P_2$,
instead of the robot and nature.  All of the required
information from Formulation 9.7 is specified by a
single matrix; therefore, it is a convenient form for expressing
zero-sum games.
Suppose that $U$, the action set for $P_1$, contains three actions and
$V$ contains four actions.  There should be $3 \times 4 = 12$ values
in the specification of the cost function, $L$.  This can be expressed
as a cost matrix,

$$
\begin{array}{c|cccc}
      & v=1    & v=2    & v=3    & v=4    \\ \hline
u = 1 & L(1,1) & L(1,2) & L(1,3) & L(1,4) \\
u = 2 & L(2,1) & L(2,2) & L(2,3) & L(2,4) \\
u = 3 & L(3,1) & L(3,2) & L(3,3) & L(3,4)
\end{array}
$$

in which each row corresponds to some $u \in U$, and each column
corresponds to some $v \in V$.  Each entry yields $L(u,v)$, which is
the cost for $P_1$.  This representation is similar to that shown in
Example 9.8, except that the nature action space, $\Theta$, is
replaced by $V$.  The cost for $P_2$ is $-L(u,v)$.
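
The matrix representation can also be illustrated with a short Python
sketch.  The numerical entries below are made up for illustration and
are not taken from the text; the helper names `cost_p1` and `cost_p2`
are likewise hypothetical.  Rows are indexed by the action $u$ of
$P_1$ and columns by the action $v$ of $P_2$, both starting from 1.

```python
# Hypothetical 3 x 4 cost matrix: row u is an action of P1, column v an
# action of P2.  The numbers are invented for this example.
L = [
    [4, -2, 0, 1],
    [3, 5, -1, 2],
    [-3, 0, 2, 6],
]

num_u = len(L)             # |U| = 3
num_v = len(L[0])          # |V| = 4
num_values = num_u * num_v  # number of values needed to specify L


def cost_p1(u: int, v: int) -> int:
    """Cost for P1: the matrix entry in row u, column v (1-based)."""
    return L[u - 1][v - 1]


def cost_p2(u: int, v: int) -> int:
    """Cost for P2: the negated entry, by the zero-sum constraint."""
    return -cost_p1(u, v)


# List every action pair together with both players' costs.
for u in range(1, num_u + 1):
    for v in range(1, num_v + 1):
        print(f"u={u}, v={v}: P1 cost {cost_p1(u, v)}, P2 cost {cost_p2(u, v)}")
```

Because the cost for $P_2$ is always the negated entry, nothing beyond
the single matrix needs to be stored.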
 
 
Steven M LaValle 2020-08-14