As for the search methods, there are both forward and backward
versions of the approach.  The backward case will be covered first.
Even though it may appear superficially to be easier to progress from
 , it turns out that progressing backward from
, it turns out that progressing backward from  is
notationally simpler.  The forward case will then be covered once some
additional notation is introduced.
 is
notationally simpler.  The forward case will then be covered once some
additional notation is introduced.
The key to deriving long optimal plans from shorter ones lies in the
construction of optimal cost-to-go functions over  .  For
.  For  from
 from
 to
 to  , let
, let  denote the cost that accumulates from stage
 denote the cost that accumulates from stage
 to
 to  under the execution of the optimal plan:
 under the execution of the optimal plan:
 of (2.5) are the last
 of (2.5) are the last  terms of
the cost functional, (2.4).  The optimal cost-to-go for
the boundary condition of
 terms of
the cost functional, (2.4).  The optimal cost-to-go for
the boundary condition of  reduces to
 reduces to
Now consider an algorithm that makes  passes over
 passes over  , each time
computing
, each time
computing  from
 from  , as
, as  ranges from
 ranges from  down to
 down to
 .  In the first iteration,
.  In the first iteration,  is copied from
 is copied from  without
significant effort.  In the second iteration,
 without
significant effort.  In the second iteration,  is computed
for each
 is computed
for each  as
 as
 and
 and 
 , substitutions can be
made into (2.7) to obtain
, substitutions can be
made into (2.7) to obtain
 .  This
computes the costs of all optimal one-step plans from stage
.  This
computes the costs of all optimal one-step plans from stage  to
stage
 to
stage  .
.
It will be shown next that  can be computed similarly once
 can be computed similarly once
 is given.  Carefully study (2.5) and note that
it can be written as
 is given.  Carefully study (2.5) and note that
it can be written as
 from the rest, which range from
 from the rest, which range from  to
 to
 .  The second
.  The second  does not affect the
 does not affect the 
 term; thus,
 term; thus,
 can be pulled outside to obtain
 can be pulled outside to obtain
 is exactly the definition of the optimal cost-to-go
function
 is exactly the definition of the optimal cost-to-go
function  .  Upon substitution, this yields the recurrence
.  Upon substitution, this yields the recurrence
 .  Now that the right side of
(2.11) depends only on
.  Now that the right side of
(2.11) depends only on  ,
,  , and
, and  , the
computation of
, the
computation of  easily proceeds in
 easily proceeds in 
 time. This computation is
called a value iteration.  Note that in each value iteration,
some states receive an infinite value only because they are not
reachable; a
time. This computation is
called a value iteration.  Note that in each value iteration,
some states receive an infinite value only because they are not
reachable; a  -step plan from
-step plan from  to
 to  does not exist.
This means that there are no actions,
 does not exist.
This means that there are no actions, 
 , that bring
, that bring
 to a state
 to a state 
 from which a
 from which a  -step plan exists
that terminates in
-step plan exists
that terminates in  .
.
Summarizing, the value iterations proceed as follows:
|  | (2.12) | 
 is determined after
 is determined after 
 time.  The
resulting
 time.  The
resulting  may be applied to yield
 may be applied to yield 
 , the
optimal cost to go to the goal from
, the
optimal cost to go to the goal from  .  It also conveniently
gives the optimal cost-to-go from any other initial state.  This cost
is infinity for states from which
.  It also conveniently
gives the optimal cost-to-go from any other initial state.  This cost
is infinity for states from which  cannot be reached in
 cannot be reached in  stages.
stages.  
It seems convenient that the cost of the optimal plan can be computed
so easily, but how is the actual plan extracted?  One possibility is
to store the action that satisfied the  in (2.11)
from every state, and at every stage.  Unfortunately, this requires
 in (2.11)
from every state, and at every stage.  Unfortunately, this requires
 storage, but it can be reduced to
 storage, but it can be reduced to  using the tricks
to come in Section 2.3.2 for the more general case of
variable-length plans.
 using the tricks
to come in Section 2.3.2 for the more general case of
variable-length plans.
 .  Suppose that
.  Suppose that  ,
, 
 ,
and
,
and 
 .  There will hence be four value iterations, which
construct
.  There will hence be four value iterations, which
construct  ,
,  ,
,  , and
, and  , once the
final-stage cost-to-go,
, once the
final-stage cost-to-go,  , is given.
, is given.  
|  | 
|  | 
|  | 
The cost-to-go functions are shown in Figure 2.9.
Figures 2.10 and 2.11 illustrate the
computations.  For computing  , only
, only  and
 and  receive
finite values because only they can reach
 receive
finite values because only they can reach  in one stage.  For
computing
 in one stage.  For
computing  , only the values
, only the values 
 and
 and 
 are important.  Only paths that reach
 are important.  Only paths that reach  or
 or  can possibly
lead to
 can possibly
lead to  in stage
 in stage  .  Note that the minimization in
.  Note that the minimization in
 always chooses the action that produces the lowest
total cost when arriving at a vertex in the next stage.
 always chooses the action that produces the lowest
total cost when arriving at a vertex in the next stage. 
 
 
Steven M LaValle 2020-08-14