9.3.1 Game Formulation

Suppose there are two players, ${{\rm P}_1}$ and ${{\rm P}_2}$ , that each have to make a decision. Each has a finite set of actions,

and

, respectively. The set

can be viewed as the ``replacement'' of $\Theta$ from Formulation 9.3 by a set of actions chosen by a true opponent. Each player has a cost function, which is denoted as $L_i : U \times V \rightarrow {\mathbb{R}}$ for

. An important constraint for zero-sum games is

In light of (9.41) it is pointless to represent two cost functions. Instead, the superscript will be dropped, and

will refer to the cost,

, of ${{\rm P}_1}$ . The goal of ${{\rm P}_1}$ is to minimize

. Due to (9.41), the goal of ${{\rm P}_2}$ is to maximize

. Thus,

can be considered as a reward for ${{\rm P}_2}$ , but a cost for ${{\rm P}_1}$ .

Formulation 9..7 (A Zero-Sum Game)

Two players, ${{\rm P}_1}$ and ${{\rm P}_2}$ .
A nonempty, finite set called the action space for ${{\rm P}_1}$ . For convenience in describing examples, assume that is a set of consecutive integers from to $\vert U\vert$ . Each $u \in U$ is referred to as an action of ${{\rm P}_1}$ .
A nonempty, finite set called the action space for ${{\rm P}_2}$ . Assume that is a set of consecutive integers from to $\vert V\vert$ . Each $v \in V$ is referred to as an action of ${{\rm P}_2}$ .
A function $L: U \times V \rightarrow {\mathbb{R}}\cup \{-\infty,\infty\}$ called the cost function for ${{\rm P}_1}$ . This also serves as a reward function for ${{\rm P}_2}$ because of (9.41).

Before discussing what it means to solve a zero-sum game, some additional assumptions are needed. Assume that the players know each other's cost functions. This implies that the motivation of the opponent is completely understood. The other assumption is that the players are rational, which means that they will try to obtain the best cost whenever possible. ${{\rm P}_1}$ will not choose an action that leads to higher cost when a lower cost action is available. Likewise, ${{\rm P}_2}$ will not choose an action that leads to lower cost. Finally, it is assumed that both players make their decisions simultaneously. There is no information regarding the decision of ${{\rm P}_1}$ that can be exploited by ${{\rm P}_2}$ , and vice versa.

Formulation 9.7 is often referred to as a matrix game because

can be expressed with a cost matrix, as was done in Section 9.2. Here the matrix indicates costs for ${{\rm P}_1}$ and ${{\rm P}_2}$ , instead of the robot and nature. All of the required information from Formulation 9.7 is specified by a single matrix; therefore, it is a convenient form for expressing zero-sum games.

Example 9..13 (Matrix Representation of a Zero-Sum Game) Suppose that

, the action set for ${{\rm P}_1}$ , contains three actions and

contains four actions. There should be $3 \times 4 = 12$ values in the specification of the cost function,

. This can be expressed as a cost matrix,

$\displaystyle \begin{tabular}{cc} & $V$ $U$ & \begin{tabular}{\vert c\vert... ...0 & -1 & 2 & 1 \hline -2 & 2 & 0 & 1 \hline \end{tabular} \end{tabular} ,$

(9.42)

in which each row corresponds to some $u \in U$ , and each column corresponds to some $v \in V$ . Each entry yields

, which is the cost for ${{\rm P}_1}$ . This representation is similar to that shown in Example 9.8, except that the nature action space, $\Theta$ , is replaced by

. The cost for ${{\rm P}_2}$ is

. $\blacksquare$