2.3.1.1 Backward value iteration

As for the search methods, there are both forward and backward versions of the approach. The backward case will be covered first. Even though it may appear superficially to be easier to progress from ${x_{I}}$ , it turns out that progressing backward from ${X_{G}}$ is notationally simpler. The forward case will then be covered once some additional notation is introduced.

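The key to deriving long optimal plans from shorter ones lies in the construction of optimal cost-to-go functions over $X$. For $k$ from $1$ to $F$, let $G^*_k$ denote the cost that accumulates from stage $k$ to $F$ under the execution of the optimal plan:

$$G^*_k(x_k) = \min_{u_k,\ldots,u_K} \left\{ \sum_{i=k}^{K} l(x_i,u_i) + l_F(x_F) \right\}. \tag{2.5}$$

For the boundary condition $k = F$, no actions remain to be applied, so the final-stage cost is received immediately:

$$G^*_F(x_F) = l_F(x_F). \tag{2.6}$$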
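Evaluated literally, (2.5) means trying every sequence of remaining actions, and the number of such sequences grows exponentially with the number of stages. The sketch below does exactly that for a small problem. It assumes a hypothetical representation in which `graph[x]` lists `(next_state, cost)` pairs, one per action $u \in U(x)$, so that $f(x,u)$ is `next_state` and $l(x,u)$ is `cost`; $l_F$ is taken to be $0$ on $X_G$ and $\infty$ elsewhere. The names `all_plans` and `brute_force_cost_to_go` are illustrative only.

```python
import math
from typing import Iterator

# Hypothetical representation (an assumption): graph[x] lists (next_state, cost)
# pairs, one per action u in U(x), with f(x, u) = next_state and l(x, u) = cost.
Graph = dict[str, list[tuple[str, float]]]


def all_plans(graph: Graph, x: str, steps: int) -> Iterator[tuple[str, float]]:
    """Yield (final_state, sum of l) for every sequence of `steps` actions from x."""
    if steps == 0:
        yield x, 0.0
        return
    for nxt, cost in graph.get(x, []):
        for final, rest in all_plans(graph, nxt, steps - 1):
            yield final, cost + rest


def brute_force_cost_to_go(graph: Graph, goal: set[str], x: str, steps: int) -> float:
    """G*_k(x) taken directly from (2.5), where steps = F - k actions remain."""

    def l_F(final: str) -> float:
        # Assumed final-stage cost: 0 in the goal set, infinity elsewhere.
        return 0.0 if final in goal else math.inf

    # Minimize the whole sum over every complete action sequence at once.
    return min(
        (total + l_F(final) for final, total in all_plans(graph, x, steps)),
        default=math.inf,
    )
```

Because every action sequence is enumerated, the running time is exponential in `steps`; this is only a baseline for checking results on tiny problems.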

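Now consider an algorithm that makes $K$ passes over $X$, each time computing $G^*_k$ from $G^*_{k+1}$, as $k$ ranges from $F$ down to $1$. In the first iteration, $G^*_F$ is copied from $l_F$ without significant effort. In the second iteration, $G^*_K$ is computed for each $x_K \in X$ as

$$G^*_K(x_K) = \min_{u_K} \left\{ l(x_K,u_K) + l_F(x_F) \right\}. \tag{2.7}$$

Since $l_F = G^*_F$ and $x_F = f(x_K,u_K)$, substitutions can be made into (2.7) to obtain

$$G^*_K(x_K) = \min_{u_K} \left\{ l(x_K,u_K) + G^*_F(f(x_K,u_K)) \right\}, \tag{2.8}$$

which is straightforward to compute for each $x_K \in X$. This computes the costs of all optimal one-step plans from stage $K$ to stage $F = K+1$.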

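It will be shown next that $G^*_k$ can be computed similarly once $G^*_{k+1}$ is given. Carefully study (2.5) and note that it can be written as

$$G^*_k(x_k) = \min_{u_k} \left\{ \min_{u_{k+1},\ldots,u_K} \left\{ l(x_k,u_k) + \sum_{i=k+1}^{K} l(x_i,u_i) + l_F(x_F) \right\} \right\} \tag{2.9}$$

by pulling the first term out of the sum and separating the minimization over $u_k$ from the minimization over $u_{k+1},\ldots,u_K$. The inner $\min$ does not affect the $l(x_k,u_k)$ term, so that term can be pulled outside:

$$G^*_k(x_k) = \min_{u_k} \left\{ l(x_k,u_k) + \min_{u_{k+1},\ldots,u_K} \left\{ \sum_{i=k+1}^{K} l(x_i,u_i) + l_F(x_F) \right\} \right\}. \tag{2.10}$$

The inner $\min$ is exactly the definition of the optimal cost-to-go function $G^*_{k+1}$. Substituting it yields the recurrence

$$G^*_k(x_k) = \min_{u_k} \left\{ l(x_k,u_k) + G^*_{k+1}(x_{k+1}) \right\}, \tag{2.11}$$

in which $x_{k+1} = f(x_k,u_k)$. The right side of (2.11) depends only on $x_k$, $u_k$, and $G^*_{k+1}$, so each such value iteration takes $O(\vert X\vert\,\vert U\vert)$ time, and all of them together take $O(K\vert X\vert\,\vert U\vert)$ time. The resulting $G^*_1(x_I)$ is the cost of the optimal plan from $x_I$; moreover, $G^*_1$ gives the optimal cost-to-go from every other initial state as well, and it is $\infty$ for states from which $X_G$ cannot be reached in $K$ stages.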
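The recurrence (2.11), which computes $G^*_k$ from $G^*_{k+1}$, translates almost line for line into code. The sketch below is not taken from the text: it assumes a hypothetical representation in which `graph[x]` lists `(next_state, cost)` pairs, one per action $u \in U(x)$, so that $f(x,u)$ is `next_state` and $l(x,u)$ is `cost`, and it takes $l_F$ to be $0$ on $X_G$ and $\infty$ elsewhere.

```python
import math

# Hypothetical representation (an assumption): graph[x] lists (next_state, cost)
# pairs, one per action u in U(x). Every state must appear as a key, even if its
# action list is empty.
Graph = dict[str, list[tuple[str, float]]]


def backward_value_iteration(graph: Graph, goal: set[str], K: int) -> dict[int, dict[str, float]]:
    """Return G with G[k][x] = G*_k(x) for stages k = 1, ..., F, where F = K + 1."""
    F = K + 1
    states = list(graph)
    # First iteration: G*_F is copied from l_F (0 in the goal, infinity elsewhere).
    G = {F: {x: (0.0 if x in goal else math.inf) for x in states}}
    # Remaining iterations apply G*_k(x) = min_u { l(x, u) + G*_{k+1}(f(x, u)) }
    # for k = K down to 1; each pass costs O(|X| |U|).
    for k in range(K, 0, -1):
        nxt = G[k + 1]
        G[k] = {
            x: min((cost + nxt[y] for y, cost in graph[x]), default=math.inf)
            for x in states
        }
    return G
```

A state with no actions, or whose successors all have infinite cost-to-go, receives $\infty$ through the `default` argument of `min`.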

It seems convenient that the cost of the optimal plan can be computed so easily, but how is the actual plan extracted? One possibility is to store the action that satisfied the $\min$ in (2.11) from every state, and at every stage. Unfortunately, this requires $O(K \vert X\vert)$ storage, but it can be reduced to $O(\vert X\vert)$ using the tricks to come in Section 2.3.2 for the more general case of variable-length plans.
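The first option above, storing the minimizing action at every stage and state, can be sketched as follows. The representation is the same hypothetical one as before (`graph[x]` lists `(next_state, cost)` pairs, one per action), and an action is identified here by its index in that list; `value_iteration_with_actions` and `extract_plan` are illustrative names, not functions from the text.

```python
import math
from typing import Optional

# Hypothetical representation (an assumption): graph[x] lists (next_state, cost)
# pairs; an action is identified by its index in that list.
Graph = dict[str, list[tuple[str, float]]]


def value_iteration_with_actions(graph: Graph, goal: set[str], K: int):
    """Backward value iteration that also stores, for every stage k and state x,
    the index of an action attaining the min in (2.11): O(K|X|) extra storage."""
    F = K + 1
    G = {F: {x: (0.0 if x in goal else math.inf) for x in graph}}
    best_action: dict[int, dict[str, Optional[int]]] = {}
    for k in range(K, 0, -1):
        G[k], best_action[k] = {}, {}
        for x, edges in graph.items():
            value, arg = math.inf, None  # arg stays None if no action gives a finite cost
            for i, (y, cost) in enumerate(edges):
                candidate = cost + G[k + 1][y]
                if candidate < value:
                    value, arg = candidate, i
            G[k][x], best_action[k][x] = value, arg
    return G, best_action


def extract_plan(graph: Graph, best_action, x_init: str, K: int) -> Optional[list[int]]:
    """Replay the stored minimizing actions from x_init at stages 1..K.
    Returns the action indices, or None if X_G is unreachable in K stages."""
    plan, x = [], x_init
    for k in range(1, K + 1):
        i = best_action[k][x]
        if i is None:
            return None
        plan.append(i)
        x = graph[x][i][0]  # x_{k+1} = f(x_k, u_k)
    return plan
```

Since a finite $G^*_k(x)$ always comes from an action leading to a finite $G^*_{k+1}$, replaying the stored actions from a state with finite $G^*_1$ never hits a missing entry.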

Example 2.3 (A Five-State Optimal Planning Problem) Figure 2.8 shows a graph representation of a planning problem in which $X = \{a,c,b,d,e\}$. Suppose that $K = 4$, ${x_{I}}= a$, and ${X_{G}}= \{d\}$. There will hence be four value iterations, which construct $G^*_4$, $G^*_3$, $G^*_2$, and $G^*_1$, once the final-stage cost-to-go, $G^*_5$, is given.

**Figure 2.8:** A five-state example. Each vertex represents a state, and each edge represents an input that can be applied to the state transition equation to change the state. The weights on the edges represent $l(x,u)$ ($x$ is the originating vertex of the edge).

**Figure 2.9:** The optimal cost-to-go functions computed by backward value iteration.

| | $a$ | $b$ | $c$ | $d$ | $e$ |
|---|---|---|---|---|---|
| … | … | … | … | … | … |
| $G^*_1$ | 6 | 4 | 5 | 4 | $\infty$ |

**Figure 2.10:** The possibilities for advancing forward one stage. This is obtained by making two copies of the states from Figure 2.8, one copy for the current state and one for the potential next state.

**Figure 2.11:** By turning Figure 2.10 sideways and copying it $K$ times, a graph can be drawn that shows all of the ways to arrive at a final state from an initial state by flowing from left to right. The computations automatically select the optimal route.

The cost-to-go functions are shown in Figure 2.9. Figures 2.10 and 2.11 illustrate the computations. For computing $G^*_4$, only the two states that can reach $d$ in one stage receive finite values. For computing $G^*_3$, only those finite values of $G^*_4$ are important, because only paths that reach one of those two states in stage $k = 4$ can possibly lead to $d$ in stage $k = 5$. Note that the minimization in (2.11) always chooses the action that produces the lowest total cost when arriving at a vertex in the next stage. $\blacksquare$
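The example can be reproduced with a few lines of code. The edge weights of Figure 2.8 are not listed in the text, so the edge list below is an assumption for illustration, chosen so that the computed $G^*_1$ agrees with the last row of Figure 2.9; the representation (each state mapped to `(next_state, cost)` pairs, one per action) and the name `cost_to_go_table` are likewise hypothetical.

```python
import math

# Assumed edges for Figure 2.8 (illustrative, not read from the figure):
# each state maps to (next_state, cost) pairs, one per action.
EDGES = {
    "a": [("a", 2.0), ("b", 2.0)],
    "b": [("c", 1.0), ("d", 4.0)],
    "c": [("a", 1.0), ("d", 1.0)],
    "d": [("c", 1.0), ("e", 1.0)],
    "e": [],
}
K = 4
GOAL = {"d"}


def cost_to_go_table(edges, goal, K):
    """Backward value iteration; returns G[k][x] for k = F, K, ..., 1 with F = K + 1."""
    F = K + 1
    # Final-stage cost: 0 at the goal, infinity elsewhere.
    G = {F: {x: (0.0 if x in goal else math.inf) for x in edges}}
    for k in range(K, 0, -1):
        G[k] = {
            x: min((c + G[k + 1][y] for y, c in edges[x]), default=math.inf)
            for x in edges
        }
    return G


table = cost_to_go_table(EDGES, GOAL, K)
for k in range(K + 1, 0, -1):
    # Rows are printed in the order a, b, c, d, e (dict insertion order).
    row = ["inf" if math.isinf(v) else f"{v:g}" for v in table[k].values()]
    print(f"G*_{k}:", *row)
```

With these assumed weights, the printed $G^*_1$ row is 6, 4, 5, 4, inf, so $G^*_1(x_I) = G^*_1(a) = 6$ is the cost of an optimal four-stage plan from $a$, and $G^*_4$ is finite only at $b$ and $c$.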