Now consider the probabilistic case. A value-iteration method can be obtained by once again shadowing the presentation in Section 2.3.1. For $k$ from $1$ to $F$, let $G^*_k$ denote the expected cost from stage $k$ to $F$ under the execution of the optimal plan (compare to (2.5)):

$$
G^*_k(x_k) = \min_{u_k,\ldots,u_K} \left\{ E_{\theta_k,\ldots,\theta_K}\!\left[ \sum_{i=k}^{K} l(x_i,u_i,\theta_i) + l_F(x_F) \right] \right\}. \tag{10.40}
$$
The optimal cost-to-go for the boundary condition of $k = F$ again reduces to (10.34).

Once again, the algorithm makes $K$ passes over $X$, each time computing $G^*_k$ from $G^*_{k+1}$, as $k$ ranges from $F$ down to $1$. As before, $G^*_F$ is copied from (10.34). In the second iteration, $G^*_K$ is computed for each $x_K \in X$ as (compare to (2.7))

$$
G^*_K(x_K) = \min_{u_K} \Big\{ E_{\theta_K}\big[ l(x_K,u_K,\theta_K) + l_F(x_F) \big] \Big\}, \tag{10.41}
$$
in which $x_F = f(x_K,u_K,\theta_K)$ and the expectation occurs over $\theta_K$. Substitutions are made into (10.41) to obtain (compare to (2.8))

$$
G^*_K(x_K) = \min_{u_K} \Big\{ E_{\theta_K}\big[ l(x_K,u_K,\theta_K) + G^*_F(f(x_K,u_K,\theta_K)) \big] \Big\}. \tag{10.42}
$$
The general iteration is

$$
G^*_k(x_k) = \min_{u_k} \left\{ E_{\theta_k}\!\left[ \min_{u_{k+1},\ldots,u_K} \left\{ E_{\theta_{k+1},\ldots,\theta_K}\!\left[ l(x_k,u_k,\theta_k) + \sum_{i=k+1}^{K} l(x_i,u_i,\theta_i) + l_F(x_F) \right] \right\} \right] \right\}, \tag{10.43}
$$
which is obtained once again by pulling the first cost term out of the sum and by separating the minimization over $u_k$ from the rest, which range from $u_{k+1}$ to $u_K$. The second $\min$ and expectation do not affect the $l(x_k,u_k,\theta_k)$ term, which is pulled outside to obtain (compare to (2.10))

$$
G^*_k(x_k) = \min_{u_k} \left\{ E_{\theta_k}\!\left[ l(x_k,u_k,\theta_k) + \min_{u_{k+1},\ldots,u_K} \left\{ E_{\theta_{k+1},\ldots,\theta_K}\!\left[ \sum_{i=k+1}^{K} l(x_i,u_i,\theta_i) + l_F(x_F) \right] \right\} \right] \right\}. \tag{10.44}
$$
The inner $\min$ and expectation define $G^*_{k+1}(x_{k+1})$, yielding the recurrence (compare to (2.11) and (10.39))

$$
G^*_k(x_k) = \min_{u_k} \Big\{ E_{\theta_k}\big[ l(x_k,u_k,\theta_k) + G^*_{k+1}(x_{k+1}) \big] \Big\}. \tag{10.45}
$$
If the cost term does not depend on $\theta_k$, it can be written as $l(x_k,u_k)$, and (10.45) simplifies to

$$
G^*_k(x_k) = \min_{u_k} \left\{ l(x_k,u_k) + \sum_{x_{k+1} \in X} G^*_{k+1}(x_{k+1}) \, P(x_{k+1} \mid x_k, u_k) \right\}. \tag{10.46}
$$
The dependency of state transitions on $\theta_k$ is implicit through the expression of $P(x_{k+1} \mid x_k, u_k)$, for which the definition uses $P(\theta_k \mid x_k, u_k)$ and the state transition equation $x_{k+1} = f(x_k, u_k, \theta_k)$. The form given in (10.46) may be more convenient than (10.45) in implementations.
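The backward passes in the form of (10.46) can be sketched in a few lines. This is a minimal illustration, not code from the text: it assumes a finite state set `X`, a finite action set `U`, a stage cost `l(x, u)` that does not depend on the nature action, a terminal cost `lF(x)`, and transition probabilities supplied as a dictionary `P[(x, u)]` mapping each next state to its probability; all of these names are hypothetical.

```python
def backward_value_iteration(X, U, l, lF, P, K):
    """Compute the optimal cost-to-go tables G[k][x] for k = 1, ..., F,
    where F = K + 1, by iterating the recurrence (10.46) backward.

    All argument names are illustrative assumptions:
      X  -- iterable of states
      U  -- iterable of actions
      l  -- stage cost l(x, u)
      lF -- terminal cost lF(x)
      P  -- dict: P[(x, u)] is a dict {x_next: probability}
      K  -- number of stages
    """
    F = K + 1
    # Boundary condition (10.34): G*_F(x_F) = lF(x_F).
    G = {F: {x: lF(x) for x in X}}
    # k ranges from K down to 1, each pass computing G*_k from G*_{k+1}.
    for k in range(K, 0, -1):
        G[k] = {}
        for x in X:
            # (10.46): min over u of the stage cost plus the expected
            # cost-to-go, with the expectation written as a weighted
            # sum over next states.
            G[k][x] = min(
                l(x, u) + sum(G[k + 1][x2] * p
                              for x2, p in P[(x, u)].items())
                for u in U
            )
    return G
```

Storing `P[(x, u)]` sparsely (only next states with nonzero probability) is what makes this form cheaper in practice than evaluating the expectation over $\theta_k$ in (10.45) directly.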

Steven M LaValle 2020-08-14