If the maximum number of stages is fixed in the problem definition,
then convergence is assured. Suppose, however, that there is no limit
on the number of stages. Recall from Section 2.3.2 that
each value iteration increases the total path length by one. The
actual stage indices were not important in backward dynamic
programming because arbitrary shifting of indices does not affect the
values. Eventually, the algorithm terminated because optimal
cost-to-go values had been computed for all states reachable from the
goal. This resulted in a stationary cost-to-go function because
the values no longer changed. States that are reachable from the goal
converged to finite values, and the rest remained at infinity. The
only problem that prevents the existence of a stationary cost-to-go
function, as mentioned in Section 2.3.2, is negative cycles
in the graph. In this case, the best plan would be to loop around the
cycle forever, which would reduce the cost to $-\infty$.
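To make this deterministic baseline concrete, the following is a minimal sketch of backward value iteration on a finite transition graph. The dictionary-based graph encoding is an illustrative assumption, not notation from the text; note that a negative cycle would keep lowering values, so the loop below would never reach a stationary function, matching the discussion above.

```python
import math

def backward_value_iteration(states, edges, goal):
    """Backward value iteration on a deterministic transition graph.

    edges[x] lists (next_state, cost) pairs. Iteration stops as soon
    as the cost-to-go function is stationary; states that cannot
    reach the goal keep the value infinity.
    """
    G = {x: (0.0 if x == goal else math.inf) for x in states}
    while True:
        G_new = {x: 0.0 if x == goal else
                 min((c + G[xn] for xn, c in edges.get(x, [])),
                     default=math.inf)
                 for x in states}
        if G_new == G:  # stationary cost-to-go function reached
            return G
        G = G_new
```

For example, on a two-edge chain with unit costs leading to the goal, the stationary values are $2$, $1$, and $0$, reached after two sweeps.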
In the current setting, a stationary cost-to-go function once again
arises, but cycles again cause difficulty in convergence. The
situation is, however, more complicated due to the influence of
nature. It is helpful to consider a plan-based state transition
graph, $\mathcal{G}_\pi$. First consider the nondeterministic case. If
there exists a plan $\pi$ from some state $x$ for which all
possible actions of nature cause the traversal of cycles that
accumulate negative cost, then the optimal cost-to-go at $x$
converges to $-\infty$, which prevents the value iterations from
terminating. These cases can be detected in advance, and each such
initial state can be avoided (some may even be in a different
connected component of the state space).
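The update driving these iterations is the min-max recurrence $G^*(x) = \min_{u} \max_{\theta} \big\{ l(x,u,\theta) + G^*(f(x,u,\theta)) \big\}$. Below is a sketch of one sweep under this worst-case model; it uses the chapter's $U(x)$, $\Theta(x,u)$, $f$, and $l$, but the dictionary encoding of the cost-to-go function is an assumption.

```python
def nondeterministic_sweep(G, states, U, Theta, f, l, goal):
    """One value iteration sweep under worst-case (nondeterministic)
    uncertainty: nature maximizes the cost, the plan minimizes it.
    """
    G_new = {}
    for x in states:
        if x in goal:
            G_new[x] = 0.0
        else:
            G_new[x] = min(
                max(l(x, u, th) + G[f(x, u, th)] for th in Theta(x, u))
                for u in U(x)
            )
    return G_new
```

If every action of nature drives the state around a negative-cost cycle, repeated sweeps lower $G$ at the affected states without bound, which is exactly the non-terminating case described above.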
[Figure 10.2: Illustration of a goal set that is possibly reachable versus one that is guaranteed reachable.]
It is also possible that there are unavoidable positive cycles. In
Section 2.3.2, the cost-to-go function behaved differently
depending on whether the goal set was reachable. Due to nature, the
goal set may be possibly reachable or guaranteed reachable, as
illustrated in Figure 10.2. To be possibly reachable
from some initial state, there must exist a plan, $\pi$, for which
there exists a sequence of nature actions that will lead the state
into the goal set. To be guaranteed reachable, the goal must be
reached in spite of all possible sequences of nature actions.

If the goal is possibly reachable, but not guaranteed reachable, from
some state $x$, and all edges have positive cost, then the
cost-to-go value of $x$ tends to infinity as the value iterations
are repeated. This occurs because every plan-based state transition
graph contains a cycle of positive cost, and in the worst case, nature
may cause the state to cycle indefinitely. If convergence of the
value iterations is only evaluated at states from which the goal set
is guaranteed to be reachable, and if there are no negative cycles,
then the algorithm should terminate when all cost-to-go values remain
unchanged.
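One plausible way to identify in advance the states from which the goal is guaranteed reachable is a backward fixed-point computation: a state is added when some action sends it into the current set for every possible nature action. This is a sketch under the same assumed encoding as before.

```python
def guaranteed_reachable(states, U, Theta, f, goal):
    """Set of states from which the goal set is guaranteed reachable,
    regardless of the sequence of nature actions.
    """
    S = set(goal)
    changed = True
    while changed:
        changed = False
        for x in set(states) - S:
            # Some action must land in S for *every* nature action.
            if any(all(f(x, u, th) in S for th in Theta(x, u))
                   for u in U(x)):
                S.add(x)
                changed = True
    return S
```

Restricting the convergence test to this set, as suggested above, avoids waiting on states whose cost-to-go values tend to infinity.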
For the probabilistic case, there are three situations:

1. The value iterations converge to a stationary cost-to-go function
after a finite number of iterations.
2. The value iterations diverge because cycles cause the cost-to-go
values at some states to tend to $-\infty$ or $\infty$, as in the
nondeterministic case.
3. The value iterations converge to a stationary cost-to-go function,
but only in the limit as the number of iterations tends to infinity.
The third situation is unique to the probabilistic setting. It is
caused by positive or negative cycles in $\mathcal{G}_\pi$ for which the
edges have probabilities in $(0,1)$. The optimal plan may even contain
such cycles. As the value iterations consider longer and longer
paths, a cycle may be traversed more times. However, each time the
cycle is traversed, the probability diminishes. The probabilities
diminish exponentially in terms of the number of stages, whereas the
costs accumulate only linearly. The changes in the cost-to-go
function therefore gradually decrease, converging to stationary values
only as the number of iterations tends to infinity. If some
approximation error is acceptable, then the iterations can be
terminated once the maximum change over all of $X$ is within some
threshold.
The number of value iterations required to obtain a solution of the
desired quality depends on the probabilities of following the cycles
and on their costs. If those probabilities are lower, then the
algorithm converges sooner.
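Under the probabilistic model, the sweep replaces the worst case over nature with an expectation, and termination uses the threshold test just described. A minimal sketch, assuming $P(x,u)$ enumerates (next state, probability, cost) triples; this encoding and the initialization at zero are illustrative choices, not from the text.

```python
def probabilistic_value_iteration(states, U, P, goal, epsilon=1e-9,
                                  max_iters=100_000):
    """Value iteration under probabilistic uncertainty.

    P(x, u) yields (next_state, probability, cost) triples. Values
    start at zero and rise toward the stationary cost-to-go function;
    the loop stops once the maximum change over all states falls
    below epsilon, since cycles with probabilities in (0, 1) make
    exact convergence possible only in the limit.
    """
    G = {x: 0.0 for x in states}
    for _ in range(max_iters):
        G_new = {x: 0.0 if x in goal else
                 min(sum(p * (c + G[xn]) for xn, p, c in P(x, u))
                     for u in U(x))
                 for x in states}
        if max(abs(G_new[x] - G[x]) for x in states) < epsilon:
            return G_new
        G = G_new
    return G
```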
[Figure 10.3: A plan-based state transition graph that contains a cycle.]
The expected cost from $x_1$ is straightforward to compute. With
probability $1/2$, the cost to reach $x_G$ is $2$. With
probability $1/4$, the cost is $4$. With probability $1/8$, the cost
is $6$. Each time another cycle is taken, the cost increases by $2$,
but the probability is cut in half. This leads to the infinite series

$$G^*(x_1) = \sum_{i=1}^{\infty} \frac{2i}{2^i} = 4 .$$
Even though the cost converges to a finite value, this only occurs in
the limit. An infinite number of value iterations would theoretically
be required to obtain this result. For most applications, an
approximate solution suffices, and very high precision can be obtained
with a small number of iterations (e.g., after $30$ iterations, the
probability of continuing around the cycle has dropped to $2^{-30}$,
which is on the order of one-billionth). Thus, in general, it is
sensible to terminate the value iterations after the maximum
cost-to-go change is less than some threshold based on the desired
precision.
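The rate at which the changes shrink is easy to check numerically. The few lines below, using the cost and probability values from the example above, accumulate the partial sums of the series:

```python
# Partial sums of sum_{i>=1} 2*i / 2**i, which converges to 4.
total = 0.0
for i in range(1, 31):
    total += 2 * i / 2 ** i

print(total)  # ~3.99999994; within about 6e-8 of the limit after 30 terms
```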
Note that if nondeterministic uncertainty is used, then the value
iterations do not converge because, in the worst case, nature will
cause the state to cycle forever. Even though the goal is not
guaranteed reachable, the probabilistic uncertainty model allows
reasonable solutions.