2.3.1.1 Backward value iteration

As for the search methods, there are both forward and backward versions of the approach. The backward case will be covered first. Even though it may appear superficially to be easier to progress from $x_I$, it turns out that progressing backward from $X_G$ is notationally simpler. The forward case will then be covered once some additional notation is introduced.

The key to deriving long optimal plans from shorter ones lies in the construction of optimal cost-to-go functions over $X$. For $k$ from $1$ to $F$, let $G^*_k$ denote the cost that accumulates from stage $k$ to $F$ under the execution of the optimal plan:

$$
G^*_k(x_k) = \min_{u_k,\ldots,u_K} \left\{ \sum_{i=k}^{K} l(x_i,u_i) + l_F(x_F) \right\}. \tag{2.5}
$$

Inside of the $\min$ of (2.5) are the last $F - k$ terms of the cost functional, (2.4). The optimal cost-to-go for the boundary condition of $k = F$ reduces to

$$
G^*_F(x_F) = l_F(x_F). \tag{2.6}
$$

This makes intuitive sense: Since there are no stages in which an action can be applied, the final stage cost is immediately received.

Now consider an algorithm that makes $K+1$ passes over $X$, each time computing $G^*_k$ from $G^*_{k+1}$, as $k$ ranges from $F$ down to $1$. In the first iteration, $G^*_F$ is copied from $l_F$ without significant effort. In the second iteration, $G^*_K$ is computed for each $x_K \in X$ as

$$
G^*_K(x_K) = \min_{u_K} \left\{ l(x_K,u_K) + l_F(x_F) \right\}. \tag{2.7}
$$

Since $l_F = G^*_F$ and $x_F = f(x_K,u_K)$, substitutions can be made into (2.7) to obtain

$$
G^*_K(x_K) = \min_{u_K} \left\{ l(x_K,u_K) + G^*_F(f(x_K,u_K)) \right\}, \tag{2.8}
$$

which is straightforward to compute for each $x_K \in X$. This computes the costs of all optimal one-step plans from stage $K$ to stage $F = K+1$.
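As a concrete illustration, (2.8) can be evaluated directly from tables for $f$, $l$, and $l_F$. The sketch below uses a tiny hypothetical three-state problem; the states, actions, and costs are invented here for illustration and do not come from the text.

```python
# A minimal sketch of the one-step computation (2.8); all problem data
# (states, actions, transitions, costs) are hypothetical.
import math

X = ["a", "b", "c"]                          # state space
U = {"a": ["r"], "b": ["r", "s"], "c": []}   # U(x): actions available at x
f = {("a", "r"): "b", ("b", "r"): "c", ("b", "s"): "a"}  # x_{k+1} = f(x_k, u_k)
l = {("a", "r"): 2, ("b", "r"): 1, ("b", "s"): 4}        # stage cost l(x_k, u_k)
lF = {"a": math.inf, "b": math.inf, "c": 0}  # final cost: finite only at the goal

G_F = dict(lF)                               # G*_F = lF, equation (2.6)
# One backward step, equation (2.8); states that cannot reach the goal in
# one stage keep an infinite cost-to-go.
G_K = {x: min((l[(x, u)] + G_F[f[(x, u)]] for u in U[x]), default=math.inf)
       for x in X}
```

Under these invented costs, only the state with a one-step transition into the zero-final-cost state receives a finite value, matching the discussion above.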

It will be shown next that $G^*_k$ can be computed similarly once $G^*_{k+1}$ is given. Carefully study (2.5) and note that it can be written as

$$
G^*_k(x_k) = \min_{u_k} \left\{ \min_{u_{k+1},\ldots,u_K} \left\{ l(x_k,u_k) + \sum_{i=k+1}^{K} l(x_i,u_i) + l_F(x_F) \right\} \right\} \tag{2.9}
$$

by pulling the first term out of the sum and by separating the minimization over $u_k$ from the rest, which range from $u_{k+1}$ to $u_K$. The second $\min$ does not affect the $l(x_k,u_k)$ term; thus, $l(x_k,u_k)$ can be pulled outside to obtain

$$
G^*_k(x_k) = \min_{u_k} \left\{ l(x_k,u_k) + \min_{u_{k+1},\ldots,u_K} \left\{ \sum_{i=k+1}^{K} l(x_i,u_i) + l_F(x_F) \right\} \right\}. \tag{2.10}
$$

The inner $\min$ is exactly the definition of the optimal cost-to-go function $G^*_{k+1}$. Upon substitution, this yields the recurrence

$$
G^*_k(x_k) = \min_{u_k} \left\{ l(x_k,u_k) + G^*_{k+1}(x_{k+1}) \right\}, \tag{2.11}
$$

in which $x_{k+1} = f(x_k,u_k)$. Now that the right side of (2.11) depends only on $x_k$, $u_k$, and $G^*_{k+1}$, the computation of $G^*_k$ easily proceeds in $O(|X||U|)$ time. This computation is called a value iteration.
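The recurrence (2.11) translates directly into a single backward pass over $X$. The sketch below packages one such pass as a function and exercises it on a hypothetical three-state problem; all names and costs are invented for illustration.

```python
import math

def backward_step(G_next, X, U, f, l):
    """One value iteration, equation (2.11):
    G*_k(x) = min over u in U(x) of { l(x, u) + G*_{k+1}(f(x, u)) }.
    Each state scans its own actions once, so a pass costs O(|X||U|) time."""
    return {x: min((l[(x, u)] + G_next[f[(x, u)]] for u in U[x]),
                   default=math.inf)          # no action: cost stays infinite
            for x in X}

# Hypothetical problem data, invented for illustration:
X = ["a", "b", "c"]
U = {"a": ["r"], "b": ["r", "s"], "c": []}
f = {("a", "r"): "b", ("b", "r"): "c", ("b", "s"): "a"}
l = {("a", "r"): 2, ("b", "r"): 1, ("b", "s"): 4}
lF = {"a": math.inf, "b": math.inf, "c": 0}

G_K = backward_step(lF, X, U, f, l)           # G*_K computed from G*_F = lF
```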

Summarizing, the value iterations proceed as follows:

$$
G^*_F \;\to\; G^*_K \;\to\; G^*_{K-1} \;\cdots\; G^*_k \;\to\; G^*_{k-1} \;\cdots\; G^*_2 \;\to\; G^*_1 \tag{2.12}
$$

until finally $G^*_1$ is determined after $O(K|X||U|)$ time. The resulting $G^*_1$ may be applied to yield $G^*_1(x_I)$, the optimal cost to go to the goal from $x_I$. It also conveniently gives the optimal cost-to-go from any other initial state. This cost is infinity for states from which $X_G$ cannot be reached in $K$ stages.
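The full sweep (2.12) is then just repeated application of the recurrence, starting from $G^*_F = l_F$. A minimal sketch on a hypothetical three-state, three-stage problem (all names and costs invented here):

```python
# The complete backward value iteration: K passes in O(K|X||U|) total time.
import math

X = ["a", "b", "c"]
U = {"a": ["r"], "b": ["r", "s"], "c": []}
f = {("a", "r"): "b", ("b", "r"): "c", ("b", "s"): "a"}
l = {("a", "r"): 2, ("b", "r"): 1, ("b", "s"): 4}
lF = {"a": math.inf, "b": math.inf, "c": 0}

K = 3                        # number of decision stages, so F = K + 1
G = dict(lF)                 # start from G*_F = lF, equation (2.6)
for _ in range(K):           # K passes: G*_K, G*_{K-1}, ..., G*_1
    G = {x: min((l[(x, u)] + G[f[(x, u)]] for u in U[x]), default=math.inf)
         for x in X}
# G now holds G*_1; G[x] is infinite when the goal is unreachable in K stages.
```

Under these invented costs, only one state ends up with a finite $G^*_1$ value, because it alone can reach the zero-final-cost state in exactly $K = 3$ steps.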

It seems convenient that the cost of the optimal plan can be computed so easily, but how is the actual plan extracted? One possibility is to store the action that satisfied the $\min$ in (2.11) from every state, and at every stage. Unfortunately, this requires $O(K|X|)$ storage, but it can be reduced to $O(|X|)$ using the tricks to come in Section 2.3.2 for the more general case of variable-length plans.
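The bookkeeping just described can be sketched as follows: during the backward sweep, record the minimizing action at every state and stage ($O(K|X|)$ storage), then walk forward from the chosen initial state to recover the plan. The problem data below are again hypothetical, invented for illustration.

```python
import math

X = ["a", "b", "c"]
U = {"a": ["r"], "b": ["r", "s"], "c": []}
f = {("a", "r"): "b", ("b", "r"): "c", ("b", "s"): "a"}
l = {("a", "r"): 2, ("b", "r"): 1, ("b", "s"): 4}
lF = {"a": math.inf, "b": math.inf, "c": 0}

K = 3
G = dict(lF)
best = {}                                  # best[(k, x)] -> minimizing action u_k
for k in range(K, 0, -1):                  # k = K, K-1, ..., 1
    G_new = {}
    for x in X:
        choices = [(l[(x, u)] + G[f[(x, u)]], u) for u in U[x]]
        cost, u = min(choices, default=(math.inf, None))
        G_new[x], best[(k, x)] = cost, u
    G = G_new                              # G is G*_k at the end of pass k

# Follow the stored actions forward from an initial state to recover the plan.
x, plan = "b", []
for k in range(1, K + 1):
    u = best[(k, x)]
    plan.append(u)
    x = f[(x, u)]
```

Following `best` forward recovers the optimal action sequence, and the terminal state lands in the zero-final-cost state whenever $G^*_1$ was finite at the start.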

The cost-to-go functions are shown in Figure 2.9. Figures 2.10 and 2.11 illustrate the computations. For computing $G^*_4$, only $b$ and $c$ receive finite values because only they can reach $d$ in one stage. For computing $G^*_3$, only the values $G^*_4(b)$ and $G^*_4(c)$ are important. Only paths that reach $b$ or $c$ can possibly lead to $d$ in stage $k = 5$. Note that the minimization in (2.11) always chooses the action that produces the lowest total cost when arriving at a vertex in the next stage.

Steven M LaValle 2020-08-14