Saddle points in a sequential game

A saddle point will be obtained once again by defining security strategies for each player. Each player treats the other as nature, and if the same worst-case value is obtained, then the result is a saddle point for the game. If the values are different, then a randomized plan is needed to close the gap between the upper and lower values.

Upper and lower values now depend on the initial state, $x_1 \in X$ . There was no equivalent for this in Section 10.5.1 because the root of the game tree is the only possible starting point.

If sequences, ${\tilde{u}}_K$ and ${\tilde{v}}_K$ , of actions are applied from , then the state history, ${\tilde{x}}_F$ , can be derived by repeatedly using the state transition function, . The upper value from is defined as

$\displaystyle \overline{L}^*(x_1) = \min_{u_1} \max_{v_1} \min_{u_2} \max_{v_2}... ...n_{u_K} \max_{v_K} \Big\{ L({\tilde{x}}_F,{\tilde{u}}_K,{\tilde{v}}_K) \Big\} ,$

(10.107)

which is identical to (10.33) if ${{\rm P}_2}$ is replaced by nature. Also, (10.108) generalizes (9.44) to multiple stages. The lower value from

, which generalizes (9.46), is

$\displaystyle \underline{L}^*(x_1) = \max_{v_1} \min_{u_1} \max_{v_2} \min_{u_2... ...x_{v_K} \min_{u_K} \Big\{ L({\tilde{x}}_F,{\tilde{u}}_K,{\tilde{v}}_K) \Big\} .$

(10.108)

If $\overline{L}^*(x_1) = \underline{L}^*(x_2)$ , then a deterministic saddle point exists from

. This implies that the order of $\max$ and $\min$ can be swapped inside of every stage.

Steven M LaValle 2020-08-14