The value-iteration
method for fixed-length plans can be generalized nicely to the case in
which plans of different lengths are allowed. There will be no bound
on the maximal length of a plan; therefore, the current case is truly
a generalization of Formulation 2.1 because arbitrarily long
plans may be attempted in efforts to reach $X_G$. The model for
the general case does not require the specification of $K$ but
instead introduces a special action, $u_T$, called the termination
action. If $u_T$ is applied at stage $k$ from state $x_k$, then the
action is repeatedly applied forever, the state remains unchanged,
and no more cost accumulates. Thus, for all $i \geq k$, $u_i = u_T$,
$x_i = x_k$, and
$$l(x_i, u_T) = 0. \qquad (2.17)$$
The termination action is the key to allowing plans of different
lengths. It will appear throughout this book. Suppose that value
iterations are performed up to $K = 5$, and for the problem there exists
a two-step solution plan, $(u_1, u_2)$, that arrives in $X_G$
from $x_1$. This plan is equivalent to the five-step plan
$(u_1, u_2, u_T, u_T, u_T)$ because the termination action does not change
the state, nor does it accumulate cost. The resulting five-step plan
reaches $X_G$ and costs the same as $(u_1, u_2)$. With this simple
extension, the forward and backward value iteration methods of Section
2.3.1 may be applied for any fixed $K$ to optimize over all
plans of length $K$ or less (instead of fixing $K$).
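To make the extension concrete, here is a minimal sketch of how a discrete planning model might be augmented with the termination action. The dict-and-callable encoding of $U(x)$, $f$, and $l$ is an illustrative assumption, not notation from the book.

```python
U_T = "u_T"  # the special termination action

def augment_with_termination(U, f, l):
    """Add u_T to every action set U(x).

    Applying u_T leaves the state unchanged and accumulates zero
    cost, matching (2.17): l(x, u_T) = 0. Running fixed-K value
    iteration on the augmented model then optimizes over all plans
    of length K or less.
    """
    U2 = {x: list(us) + [U_T] for x, us in U.items()}
    f2 = lambda x, u: x if u == U_T else f(x, u)
    l2 = lambda x, u: 0.0 if u == U_T else l(x, u)
    return U2, f2, l2
```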
The next step is to remove the dependency on $K$. Consider running
backward value iterations indefinitely. At some point, $G^*_1$ will
be computed, but there is no reason why the process cannot be
continued onward to $G^*_0$, $G^*_{-1}$, and so on. Recall that
$x_I$ is not utilized in the backward value-iteration method;
therefore, there is no concern regarding the starting initial state of
the plans. Suppose that backward value iteration was applied for
$K = 16$ and was executed down to $G^*_{-8}$. This considers all plans
of length $25$ or less. Note that it is harmless to add $9$ to all
stage indices to shift all of the cost-to-go functions. Instead of
running from $G^*_{-8}$ to $G^*_{17}$, they can run from $G^*_{1}$
to $G^*_{26}$ without affecting their values. The index shifting is
allowed because none of the costs depend on the particular index that
is given to the stage. The only important aspect of the value
iterations is that they proceed backward and consecutively from stage
to stage.
Eventually, enough iterations will have been executed so that an
optimal plan is known from every state that can reach $X_G$. From
that stage, say $k$, onward, the cost-to-go values from one value
iteration to the next will be stationary, meaning that for all
$i \leq k$, $G^*_i(x) = G^*_{i+1}(x)$ for all $x \in X$. Once the
stationary condition is reached, the cost-to-go function no longer
depends on a particular stage $k$. In this case, the stage index may
be dropped, and the recurrence becomes

$$G^*(x) = \min_{u \in U(x)} \Big\{ l(x,u) + G^*(f(x,u)) \Big\}. \qquad (2.18)$$
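The following sketch runs backward value iteration until the cost-to-go values become stationary, at which point (2.18) holds and the function can be returned as $G^*$. The dict-based graph encoding is an assumption for illustration; the book states the method abstractly in terms of $U(x)$, $f$, and $l$.

```python
import math

def backward_value_iteration(X, U, f, l, XG):
    # G*_0: zero at goal states, infinity elsewhere (the final cost l_F)
    G = {x: (0.0 if x in XG else math.inf) for x in X}
    while True:
        G_new = {}
        for x in X:
            best = G[x]  # termination action u_T: stay put, zero cost
            for u in U[x]:
                best = min(best, l(x, u) + G[f(x, u)])
            G_new[x] = best
        if G_new == G:    # stationary condition reached
            return G_new  # the optimal cost-to-go G*
        G = G_new
```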
Are there any conditions under which backward value iterations could
be executed forever, with each iteration producing a cost-to-go
function for which some values are different from the previous
iteration? If $l(x,u)$ is nonnegative for all $x \in X$ and
$u \in U(x)$, then this could never happen. It could certainly be true that,
for any fixed $K$, longer plans will exist, but this cannot be said of
optimal plans. From every $x \in X$, there either exists a plan
that reaches $X_G$ with finite cost or there is no solution. For
each state from which there exists a plan that reaches $X_G$,
consider the number of stages in the optimal plan. Consider the
maximum number of stages taken over all states that can reach
$X_G$. This serves as an upper bound on the number of value
iterations before the cost-to-go becomes stationary. Any further
iterations will just consider solutions that are worse than the ones
already considered (some may be equivalent due to the termination
action and shifting of stages). Some trouble might occur if $l(x,u)$
contains negative values. If the state transition graph contains a
cycle for which the total cost is negative, then it is preferable to
execute a plan that travels around the cycle forever, thereby reducing
the total cost to $-\infty$. Therefore, we will assume that the cost
functional is defined in a sensible way so that negative cycles do not
exist. Otherwise, the optimization model itself appears flawed. Some
negative values for $l(x,u)$, however, are allowed as long as there
are no negative cycles. (It is straightforward to detect and report
negative cycles before running the value iterations.)
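Since detection is straightforward, here is a hedged sketch based on the standard Bellman-Ford relaxation argument; the edge-list encoding is an illustrative assumption.

```python
def has_negative_cycle(X, edges):
    """edges: (x, x_next, cost) triples, one per state-action pair."""
    # Starting every state at cost 0 acts like a virtual source with
    # zero-cost edges to all states, so any negative cycle is exposed.
    d = {x: 0.0 for x in X}
    for _ in range(len(X) - 1):
        for x, x2, c in edges:
            if d[x] + c < d[x2]:
                d[x2] = d[x] + c
    # If an edge can still be relaxed, a negative cycle must exist.
    return any(d[x] + c < d[x2] for x, x2, c in edges)
```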
Since the particular stage index is unimportant, let $k = 0$ be the
index of the final stage, which is the stage at which the backward
value iterations begin. Hence, $G^*_0$ is the final stage cost,
which is obtained directly from $l_F$. Let $-K$ denote the stage
index at which the cost-to-go values all become stationary. At this
stage, the optimal cost-to-go function, $G^* : X \rightarrow \mathbb{R} \cup \{\infty\}$,
is expressed by assigning $G^* = G^*_{-K}$. In
other words, the particular stage index no longer matters. The value
$G^*(x)$ gives the optimal cost to go from state $x \in X$ to the
specific goal state $x_G$.
If the optimal actions are not stored during the value iterations, the
optimal cost-to-go, $G^*$, can be used to efficiently recover them.
Consider starting from some $x \in X$. What is the optimal next
action? This is given by

$$u^* = \operatorname*{argmin}_{u \in U(x)} \Big\{ l(x,u) + G^*(f(x,u)) \Big\}. \qquad (2.19)$$
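As a small illustration, (2.19) can be evaluated greedily at each visited state; the function name is hypothetical and the encoding matches the earlier sketches.

```python
def optimal_action(x, U, f, l, G):
    # argmin over u in U(x) of l(x, u) + G*(f(x, u)), as in (2.19)
    return min(U[x], key=lambda u: l(x, u) + G[f(x, u)])
```

Repeating this from each successor state reconstructs an entire optimal plan, so there is no need to store actions during the iterations.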
As in the case of fixed-length plans, the direction of the value
iterations can be reversed to obtain a forward value-iteration method
for the variable-length planning problem. In this case, the backward
state transition equation, $f^{-1}$, is used once again. Also, the
initial cost term $l_I$ is used instead of $l_F$, as in
(2.14). The forward value-iteration method starts at $k = 1$,
and then iterates until the cost-to-come becomes stationary. Once
again, the termination action, $u_T$, preserves the cost of plans that
arrived at a state in earlier iterations. Note that it is not
required to specify $X_G$. A counterpart to (2.19) may be
obtained, from which optimal actions can be recovered. When the
cost-to-come values become stationary, an optimal cost-to-come
function, $C^* : X \rightarrow \mathbb{R} \cup \{\infty\}$, may be
expressed by assigning $C^* = C^*_F$, in which $F$ is the final
stage reached when the algorithm terminates. The value $C^*(x)$
gives the cost of an optimal plan that starts from $x_I$ and
reaches $x$. The optimal action sequence for any specified goal
$x_G$ can be obtained using
$$\operatorname*{argmin}_{u^{-1} \in U^{-1}(x)} \Big\{ C^*\big(f^{-1}(x,u^{-1})\big) + l\big(f^{-1}(x,u^{-1}),\, u'\big) \Big\}, \qquad (2.20)$$

in which $u'$ is the action in $U(f^{-1}(x,u^{-1}))$ for which $f$ maps $f^{-1}(x,u^{-1})$ back to $x$.
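A hedged sketch of the forward method follows. Rather than literally computing with $f^{-1}$ and $U^{-1}$, it relaxes costs forward over successors, which yields the same stationary cost-to-come; the encoding is again an illustrative assumption.

```python
import math

def forward_value_iteration(X, U, f, l, xI):
    # C*_1: zero at the initial state x_I, infinity elsewhere (from l_I)
    C = {x: (0.0 if x == xI else math.inf) for x in X}
    while True:
        C_new = dict(C)  # u_T preserves costs achieved at earlier stages
        for x in X:
            for u in U[x]:
                x2 = f(x, u)
                C_new[x2] = min(C_new[x2], C[x] + l(x, u))
        if C_new == C:
            return C_new  # the optimal cost-to-come C*
        C = C_new
```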
See Figure 2.14. After a few backward value iterations,
the cost-to-go values become stationary. After this point, the
termination action is being applied from all reachable states and no
further cost accumulates. The final cost-to-go function is defined to
be $G^*$. Since $d$ is not reachable from $e$, $G^*(e) = \infty$.
As an example of using (2.19) to recover optimal actions,
consider starting from state $a$. The action that leads to $b$ is
chosen next because the total cost $2 + G^*(b) = 4$ is better than
$2 + G^*(a) = 6$ (the $2$ comes from the action cost). From state $b$,
the optimal action leads to $c$, which produces total cost
$1 + G^*(c) = 2$. Similarly, the next action leads to $d \in X_G$, which
terminates the plan.
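The following runnable sketch reproduces these numbers. The edge costs are an assumption chosen to be consistent with the values quoted above (the authoritative graph is Figure 2.13), so treat it as a study aid rather than the book's exact example.

```python
import math

succ = {  # state -> list of (next_state, action_cost); assumed edges
    "a": [("a", 2.0), ("b", 2.0)],
    "b": [("c", 1.0), ("d", 4.0)],
    "c": [("a", 1.0), ("d", 1.0)],
    "d": [],  # the goal state; only u_T applies here
    "e": [],  # d is not reachable from e
}

G = {x: (0.0 if x == "d" else math.inf) for x in succ}  # G*_0
while True:
    G2 = {x: min([G[x]] + [c + G[y] for y, c in succ[x]]) for x in succ}
    if G2 == G:
        break  # stationary
    G = G2
print(G)  # {'a': 4.0, 'b': 2.0, 'c': 1.0, 'd': 0.0, 'e': inf}
```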
Using forward value iteration, suppose that $x_I = b$. The
cost-to-come functions shown in Figure 2.15 are
obtained. For any finite value that remains constant from one
iteration to the next, the termination action was applied. Note that
the last value iteration is useless in this example. Once $C^*_3$
is computed, the optimal cost-to-come to every possible state from
$x_I$ is determined, and future cost-to-come functions are
identical. Therefore, the final cost-to-come is renamed $C^*$.
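A companion sketch for the forward case, under the same assumed edge costs, exhibits the behavior described above: the values stop changing once $C^*_3$ has been computed.

```python
import math

succ = {  # same assumed edges as in the backward sketch
    "a": [("a", 2.0), ("b", 2.0)],
    "b": [("c", 1.0), ("d", 4.0)],
    "c": [("a", 1.0), ("d", 1.0)],
    "d": [],
    "e": [],
}

C = {x: (0.0 if x == "b" else math.inf) for x in succ}  # C*_1, x_I = b
stage = 1
while True:
    C2 = dict(C)  # u_T preserves the costs already achieved
    for x, edges in succ.items():
        for y, cost in edges:
            C2[y] = min(C2[y], C[x] + cost)
    stage += 1
    if C2 == C:
        break
    C = C2
print(stage, C)  # 4 {'a': 2.0, 'b': 0.0, 'c': 1.0, 'd': 2.0, 'e': inf}
```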