Introducing nature

Figure 10.19: A single-stage, zero-sum game that involves nature. It is assumed that all three players act at the same time.
\begin{figure}\centerline{\psfig{file=figs/gtreen.eps,width=3.0truein}}\end{figure}

A nature player can easily be introduced into a game. Suppose, for example, that nature is introduced into a zero-sum game. In this case, there are three players: $ {{\rm P}_1}$, $ {{\rm P}_2}$, and nature. Figure 10.19 shows a game tree for a single-stage, zero-sum game that involves nature. It is assumed that all three players act at the same time, which fits the stage-by-stage model; many other information models are possible. Suppose that probabilistic uncertainty is used to model nature, and that it is known that nature chooses the left branch with probability $ 1/3$ and the right branch with probability $ 2/3$. Depending on the branch chosen by nature, $ {{\rm P}_1}$ and $ {{\rm P}_2}$ play one of two specific $ 2 \times 2$ matrix games. With probability $ 1/3$, the cost matrix will be

$\displaystyle \begin{tabular}{cc} & $V$ \\ $U$ & \begin{tabular}{\vert c\vert c\vert}\hline 3 & -2 \\ \hline -6 & 3 \\ \hline \end{tabular} \end{tabular} ,$ (10.116)

and with probability $ 2/3$ it will be

$\displaystyle \begin{tabular}{cc} & $V$ \\ $U$ & \begin{tabular}{\vert c\vert c\vert}\hline 3 & 1 \\ \hline 6 & 0 \\ \hline \end{tabular} \end{tabular} .$ (10.117)

Unfortunately, $ {{\rm P}_1}$ and $ {{\rm P}_2}$ do not know which matrix game they are actually playing. The regret can nevertheless be eliminated in the expected sense if the game is played over many independent trials. Let $ A_1$ and $ A_2$ denote the cost matrices (10.116) and (10.117), respectively. Define a new cost matrix as $ A = (1/3) A_1 + (2/3) A_2$ (a scalar multiplied by a matrix scales every entry of the matrix). The resulting matrix is

$\displaystyle \begin{tabular}{cc} & $V$ \\ $U$ & \begin{tabular}{\vert c\vert c\vert}\hline 3 & 0 \\ \hline 2 & 1 \\ \hline \end{tabular} \end{tabular} .$ (10.118)

This matrix game has a deterministic saddle point in which $ {{\rm P}_1}$ chooses its second action (row 2) and $ {{\rm P}_2}$ chooses its first action (column 1), which yields a cost of $ 2$. This means that they can play deterministic strategies to obtain an expected cost of $ 2$ when the game play is averaged over many independent trials. If this matrix did not admit a deterministic saddle point, then a randomized strategy would be needed. It is interesting to note that randomization is not needed for this example, even though $ {{\rm P}_1}$ and $ {{\rm P}_2}$ each play against both nature and an intelligent adversary.
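The computation can be reproduced with a minimal Python sketch using numpy; the matrix encoding and the convention that $ {{\rm P}_1}$ minimizes over rows while $ {{\rm P}_2}$ maximizes over columns follow the example above:

\begin{verbatim}
import numpy as np

# Cost matrices (10.116) and (10.117); P1 picks a row to minimize
# the cost, and P2 picks a column to maximize it.
A1 = np.array([[3, -2], [-6, 3]])
A2 = np.array([[3, 1], [6, 0]])

# Expected cost matrix (10.118), weighted by nature's probabilities.
A = (1/3) * A1 + (2/3) * A2

upper = A.max(axis=1).min()  # min over rows of the row maxima
lower = A.min(axis=0).max()  # max over columns of the column minima

if upper == lower:
    i = int(A.max(axis=1).argmin())  # row index of the saddle point
    j = int(A.min(axis=0).argmax())  # column index of the saddle point
    print(f"saddle point at row {i+1}, column {j+1}, value {upper}")
else:
    print("no deterministic saddle point; randomization is needed")
\end{verbatim}

Running the sketch reports a saddle point at row 2, column 1 with value $ 2$, in agreement with the analysis above.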

Several other variations are possible. If nature is modeled nondeterministically, then a matrix of worst-case regrets can be formed to determine whether it is possible to eliminate regret. A sequential version of games such as the one in Figure 10.19 can also be considered, in which each stage contains three substages in which nature, $ {{\rm P}_1}$, and $ {{\rm P}_2}$ all act. The bottom-up approach from Section 10.5.1 can be applied to decompose the tree into many single-stage games, and their equilibrium costs can be propagated upward to the root in the same way to obtain an equilibrium solution.
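As a rough illustration of this bottom-up computation, the recursion below propagates values from the leaves to the root. The node encoding is hypothetical (chosen only for this sketch), and returning the upper value assumes that a deterministic saddle point exists at every stage; otherwise, randomized security strategies would be required:

\begin{verbatim}
import numpy as np

def stage_value(node):
    # A node is ("leaf", cost) or ("stage", probs, children), where
    # children[u][v][k] is the subtree reached when P1 plays u, P2
    # plays v, and nature plays its k-th action with probability
    # probs[k].  (Hypothetical encoding, for illustration only.)
    if node[0] == "leaf":
        return node[1]
    _, probs, children = node
    # Expected cost matrix over nature for this stage, as in (10.118).
    M = np.array([[sum(p * stage_value(sub) for p, sub in zip(probs, cell))
                   for cell in row]
                  for row in children])
    # Upper value; equals the game value when a deterministic saddle
    # point exists at this stage.
    return M.max(axis=1).min()

# The single-stage game of Figure 10.19, encoded so that
# children[u][v] lists the leaves under nature's two branches.
leaf = lambda c: ("leaf", c)
fig_10_19 = ("stage", [1/3, 2/3],
             [[[leaf(3), leaf(3)], [leaf(-2), leaf(1)]],
              [[leaf(-6), leaf(6)], [leaf(3), leaf(0)]]])
print(stage_value(fig_10_19))  # prints 2.0
\end{verbatim}

A deeper tree is handled by letting the entries of children be further ("stage", ...) nodes, which yields exactly the upward propagation described above.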

Formulation 10.4 can be easily extended to include nature in games over state spaces. For each $ x$, a nature action set is defined as $ \Theta(x)$. The state transition equation is defined as

$\displaystyle x_{k+1} = f(x_k,u_k,v_k,\theta_k) ,$ (10.119)

which means that the next state depends on all three player actions, in addition to the current state. The value-iteration method can be extended to solve problems of this type by properly considering the effect of nature in the dynamic programming equations. In the probabilistic case, for example, an expectation over nature is needed in every iteration. The resulting sequential game is often referred to as a Markov game [774].
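To sketch how the expectation over nature enters one dynamic programming iteration, consider the following backup. All interface names here are hypothetical placeholders for this sketch; in particular, W stands in for $ {{\rm P}_2}$'s action set to avoid a clash with the cost-to-go V:

\begin{verbatim}
import numpy as np

def backup(V, states, f, cost, U, W, Theta, prob):
    # One value-iteration backup for a zero-sum game with probabilistic
    # nature.  Placeholders: f(x,u,v,th) is the transition (10.119),
    # cost(x,u,v,th) the stage cost, Theta(x) nature's action set,
    # prob(x,th) its probability, and U(x), W(x) the action sets of
    # P1 and P2.  V maps each state to its current cost-to-go.
    V_new = {}
    for x in states:
        # Expectation over nature of stage cost plus cost-to-go.
        M = np.array([[sum(prob(x, th) * (cost(x, u, v, th)
                                          + V[f(x, u, v, th)])
                           for th in Theta(x))
                       for v in W(x)]
                      for u in U(x)])
        # Security (upper) value of the stage matrix game; if no
        # deterministic saddle point exists, the randomized value
        # (obtained by linear programming) would be used instead.
        V_new[x] = M.max(axis=1).min()
    return V_new
\end{verbatim}

Iterating this backup until the values stabilize yields the equilibrium cost-to-go, in the same spirit as the value-iteration methods described earlier in this chapter.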
