A saddle point will be obtained once again by defining security strategies for each player. Each player treats the other as nature, and if the same worst-case value is obtained, then the result is a saddle point for the game. If the values are different, then a randomized plan is needed to close the gap between the upper and lower values.
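In the single-stage case with finitely many actions, the security-strategy test reduces to comparing a min-max with a max-min. The sketch below assumes a cost matrix `A` whose rows are P1's actions (P1 minimizes) and whose columns are P2's actions (P2 maximizes). The function names and the two example matrices are hypothetical and are chosen only for illustration.

```python
from typing import List, Tuple


def security_values(A: List[List[int]]) -> Tuple[int, int]:
    """Return (upper, lower) values of a zero-sum matrix game (hypothetical helper).

    Rows index P1's actions (P1 minimizes cost); columns index P2's actions
    (P2 maximizes). Each player treats the other as nature (worst case).
    """
    # P1's security strategy: minimize the worst-case (maximum) cost over P2's replies.
    upper = min(max(row) for row in A)
    # P2's security strategy: maximize the worst-case (minimum) cost over P1's replies.
    lower = max(min(col) for col in zip(*A))
    return upper, lower


def has_deterministic_saddle_point(A: List[List[int]]) -> bool:
    """A saddle point in deterministic plans exists exactly when the values agree."""
    upper, lower = security_values(A)
    return upper == lower


print(security_values([[3, 1], [2, 0]]))    # (2, 2): same worst-case value
print(security_values([[1, -1], [-1, 1]]))  # (1, -1): gap, so randomization is needed
```

The second matrix is the classic matching-pennies pattern, where the upper and lower values differ and only a randomized plan can close the gap.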
Upper and lower values now depend on the initial state, $x_1 \in X$. There was no equivalent for this in Section 10.5.1 because the root of the game tree is the only possible starting point.
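The dependence on the initial state can be seen in a small, entirely hypothetical game. Assume three states $\{0, 1, 2\}$, two actions for each player in every state, $K = 2$ stages, and a made-up transition function $f(x, u, v) = (x + u + v) \bmod 3$. The cost is 1 for every stage where the actions match, plus the final state as a terminal cost. The sketch evaluates the upper value by placing P1's minimization outside P2's maximization at every stage. It evaluates the lower value by swapping that order. In both cases it enumerates every pair of action sequences applied from $x_1$.

```python
from typing import List

# Hypothetical toy game (not from the text): states {0, 1, 2}, two actions per player.
K = 2                 # number of stages
STATES = (0, 1, 2)
U = (0, 1)            # P1's actions (assumed the same in every state)
V = (0, 1)            # P2's actions (assumed the same in every state)


def f(x: int, u: int, v: int) -> int:
    """Hypothetical state transition function x_{k+1} = f(x_k, u_k, v_k)."""
    return (x + u + v) % 3


def cost(xs: List[int], us: List[int], vs: List[int]) -> int:
    """Hypothetical cost L(x_F, u_K, v_K): 1 per stage where the actions match,
    plus the final state x_F as a terminal cost. P1 minimizes, P2 maximizes."""
    return sum(1 for u, v in zip(us, vs) if u == v) + xs[-1]


def game_value(x1: int, upper: bool) -> int:
    """Nested min/max over all action sequences applied from x1.

    upper=True:  min over u_k outside max over v_k at every stage (upper value).
    upper=False: max over v_k outside min over u_k at every stage (lower value).
    """
    def rec(xs: List[int], us: List[int], vs: List[int]) -> int:
        if len(us) == K:              # all K stages played: evaluate L on the histories
            return cost(xs, us, vs)
        x = xs[-1]

        def step(u: int, v: int) -> int:
            # Extend the state and action histories by one stage using f.
            return rec(xs + [f(x, u, v)], us + [u], vs + [v])

        if upper:
            return min(max(step(u, v) for v in V) for u in U)
        return max(min(step(u, v) for u in U) for v in V)

    return rec([x1], [], [])


for x1 in STATES:
    print(x1, game_value(x1, upper=True), game_value(x1, upper=False))
```

With these made-up choices, the upper and lower values coincide at $x_1 = 0$ (both 2) but differ at $x_1 = 1$ (2 versus 0) and at $x_1 = 2$ (3 versus 1). Whether deterministic security strategies yield a saddle point therefore depends on where the game starts. Brute-force enumeration grows as $(|U||V|)^K$. When the cost is additive over stages, the same values can instead be computed by dynamic programming over states.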
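If sequences, $\tilde{u}_K$ and $\tilde{v}_K$, of actions are applied from $x_1$, then the state history, $\tilde{x}_F$, can be derived by repeatedly using the state transition function, $f$. The upper value from $x_1$ is defined as

$$\overline{L}^*(x_1) = \min_{u_1 \in U(x_1)} \max_{v_1 \in V(x_1)} \min_{u_2 \in U(x_2)} \max_{v_2 \in V(x_2)} \cdots \min_{u_K \in U(x_K)} \max_{v_K \in V(x_K)} L(\tilde{x}_F, \tilde{u}_K, \tilde{v}_K)$$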
Steven M LaValle 2020-08-14