A saddle point will be obtained once again by defining security strategies for each player. Each player treats the other as nature, and if the same worst-case value is obtained, then the result is a saddle point for the game. If the values are different, then a randomized plan is needed to close the gap between the upper and lower values.
Upper and lower values now depend on the initial state, $x_1 \in X$. There was no equivalent for this in Section 10.5.1 because the root of the game tree is the only possible starting point.
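If sequences, $\tilde{u}_K$ and $\tilde{v}_K$, of actions are applied from $x_1$, then the state history, $\tilde{x}_F$, can be derived by repeatedly using the state transition function, $f$. The upper value from $x_1$ is defined as

$$\overline{L}^*(x_1) = \min_{u_1} \max_{v_1} \min_{u_2} \max_{v_2} \cdots \min_{u_K} \max_{v_K} \Big\{ L(\tilde{x}_F, \tilde{u}_K, \tilde{v}_K) \Big\} .$$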
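To make the dependence on the initial state concrete, the following is a minimal sketch of how an upper value could be computed by nesting a minimization over the first player's action and a maximization over the second player's action at each stage. It rests on several assumptions that are not stated in the text above: the game has a fixed number of stages `K`, the action sets are finite, and the cost is additive (a stage cost `l` plus a terminal cost `l_F`). All function names and the toy game are hypothetical and chosen only for illustration.

```python
def upper_value(x, k, K, U, V, f, l, l_F):
    """Upper value from state x at stage k (assumes K stages, additive cost).

    U(x), V(x): finite action sets of the two players (assumed).
    f(x, u, v): state transition function.
    l(x, u, v): stage cost; l_F(x): terminal cost (assumed additive form).
    """
    if k > K:
        # No stages remain; only the terminal cost is left.
        return l_F(x)
    best = float("inf")
    for u in U(x):  # the first player minimizes
        worst = float("-inf")
        for v in V(x):  # the second player maximizes, knowing u
            x_next = f(x, u, v)
            cost = l(x, u, v) + upper_value(x_next, k + 1, K, U, V, f, l, l_F)
            worst = max(worst, cost)
        best = min(best, worst)
    return best


# Hypothetical toy game on the integers: the first player pushes the state
# by u, the second player pushes it back by v, and the only cost is the
# distance from zero at the end.
ACTIONS = (-1, 0, 1)


def toy_U(x):
    return ACTIONS


def toy_V(x):
    return ACTIONS


def toy_f(x, u, v):
    return x + u - v


def toy_l(x, u, v):
    return 0


def toy_l_F(x):
    return abs(x)


def toy_upper_value(x1, K=2):
    # Stages are numbered 1..K, so the recursion starts at k = 1.
    return upper_value(x1, 1, K, toy_U, toy_V, toy_f, toy_l, toy_l_F)
```

In this toy game, `toy_upper_value(0)` evaluates to 1 and `toy_upper_value(3)` evaluates to 3, so the worst-case cost that the first player can guarantee changes with the starting state. The recursion enumerates every action sequence and is practical only for small `K`; a value-iteration style computation over states would avoid the repeated work.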
Steven M LaValle 2020-08-14