Now consider the probabilistic case. A value iteration method can be
obtained by once again shadowing the presentation in Section
2.3.1. For $k$ from $1$ to $K+1$, let $G^*_k$ denote the
expected cost from stage $k$ to $F$ under the execution of the optimal
plan (compare to (2.5)):

$$G^*_k(x_k) = \min_{u_k,\ldots,u_K} \left\{ E_{\theta_k,\ldots,\theta_K} \left[ \sum_{i=k}^{K} l(x_i,u_i,\theta_i) + l_F(x_F) \right] \right\} \qquad (10.40)$$

The optimal cost-to-go for the boundary condition of $G^*_F$ again
reduces to (10.34).
Once again, the algorithm makes $K+1$ passes over $X$, each time
computing $G^*_k$ from $G^*_{k+1}$, as $k$ ranges from $K$ down
to $1$. As before, $G^*_F$ is copied from $l_F$. In the second
iteration, $G^*_K$ is computed for each $x_K \in X$ as (compare to
(2.7))

$$G^*_K(x_K) = \min_{u_K} \left\{ E_{\theta_K} \Big[ l(x_K,u_K,\theta_K) + l_F(x_F) \Big] \right\} \qquad (10.41)$$

in which $x_F = f(x_K,u_K,\theta_K)$
and the expectation occurs over $\theta_K$.
Substitutions are made into (10.41) to obtain (compare to
(2.8))

$$G^*_K(x_K) = \min_{u_K} \left\{ E_{\theta_K} \Big[ l(x_K,u_K,\theta_K) + l_F(f(x_K,u_K,\theta_K)) \Big] \right\} \qquad (10.42)$$
The general iteration is

$$G^*_k(x_k) = \min_{u_k} \left\{ E_{\theta_k} \left[ \min_{u_{k+1},\ldots,u_K} \left\{ E_{\theta_{k+1},\ldots,\theta_K} \left[ l(x_k,u_k,\theta_k) + \sum_{i=k+1}^{K} l(x_i,u_i,\theta_i) + l_F(x_F) \right] \right\} \right] \right\} \qquad (10.43)$$
which is obtained once again by pulling the first cost term out of the
sum and by separating the minimization over $u_k$ from the rest. The
second $\min$ and expectation do not affect the $l(x_k,u_k,\theta_k)$
term, which is pulled outside to obtain (compare to
(2.10))

$$G^*_k(x_k) = \min_{u_k} \left\{ E_{\theta_k} \left[ l(x_k,u_k,\theta_k) + \min_{u_{k+1},\ldots,u_K} \left\{ E_{\theta_{k+1},\ldots,\theta_K} \left[ \sum_{i=k+1}^{K} l(x_i,u_i,\theta_i) + l_F(x_F) \right] \right\} \right] \right\} \qquad (10.44)$$
The inner $\min$ and expectation define $G^*_{k+1}(x_{k+1})$, yielding the
recurrence (compare to (2.11) and (10.39))

$$G^*_k(x_k) = \min_{u_k} \left\{ E_{\theta_k} \Big[ l(x_k,u_k,\theta_k) + G^*_{k+1}(x_{k+1}) \Big] \right\} \qquad (10.45)$$
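As a minimal sketch of one backward pass of the recurrence in (10.45), assuming finite sets of states, actions, and nature actions, the following may help; the names `X`, `U`, `Theta`, `f`, `l`, and `P_theta` are illustrative assumptions, not notation fixed by the text:

```python
def backward_step(G_next, X, U, Theta, f, l, P_theta):
    """One pass of (10.45): G_k(x) = min_u E_theta[ l(x,u,th) + G_{k+1}(f(x,u,th)) ].

    G_next maps each state to G*_{k+1}; P_theta(th, x, u) gives the
    probability of nature action th given state x and action u.
    """
    G = {}
    for x in X:
        best = float('inf')
        for u in U:
            # Expectation over the nature action theta (finite sum).
            exp_cost = sum(P_theta(th, x, u) * (l(x, u, th) + G_next[f(x, u, th)])
                           for th in Theta)
            best = min(best, exp_cost)
        G[x] = best
    return G
```

Iterating this function $K$ times, starting from a dictionary that stores $l_F$, yields $G^*_1$.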
If the cost term $l(x_k,u_k,\theta_k)$ does not depend on
$\theta_k$, it can be written as
$l(x_k,u_k)$, and (10.45) simplifies to

$$G^*_k(x_k) = \min_{u_k} \left\{ l(x_k,u_k) + \sum_{x_{k+1} \in X} G^*_{k+1}(x_{k+1}) P(x_{k+1} \mid x_k, u_k) \right\} \qquad (10.46)$$
The dependency of state transitions on $\theta_k$ is implicit through
the expression of $P(x_{k+1} \mid x_k, u_k)$, for which the definition uses
$P(\theta_k \mid x_k, u_k)$
and the state transition equation $x_{k+1} = f(x_k,u_k,\theta_k)$. The
form given in (10.46) may be more convenient than
(10.45) in implementations.
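A sketch of the full value iteration in the form of (10.46), under assumed illustrative names (`states`, `actions`, `P`, `cost`, `l_F` are all hypothetical), might look as follows:

```python
def value_iteration(states, actions, P, cost, l_F, K):
    """Backward value iteration via (10.46) for K stages with terminal cost l_F.

    P[(x, u)] is a dict mapping next states x' to P(x' | x, u);
    cost(x, u) is the stage cost l(x_k, u_k), assumed independent of theta.
    """
    G = {x: l_F(x) for x in states}       # boundary condition: G* at stage F is l_F
    for _ in range(K):                    # passes for k = K down to 1
        G = {x: min(cost(x, u) +
                    sum(p * G[xn] for xn, p in P[(x, u)].items())
                    for u in actions)
             for x in states}
    return G
```

Note that this form only needs the derived transition probabilities $P(x_{k+1} \mid x_k, u_k)$, never the nature actions themselves, which is why it tends to be the more convenient form in practice.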
Steven M LaValle
2020-08-14