2.3.1.2 Forward value iteration

The ideas from Section 2.3.1 may be recycled to yield a symmetrically equivalent method that computes optimal cost-to-come functions from the initial stage. Whereas backward value iterations were able to find optimal plans from all initial states simultaneously, forward value iterations can be used to find optimal plans to all states in $ X$. In the backward case, $ {X_{G}}$ must be fixed, and in the forward case, $ {x_{I}}$ must be fixed.

The issue of maintaining feasible solutions appears again. In the forward direction, the role of $ l_F$ is not important. It may be applied in the last iteration, or it can be dropped altogether for problems that do not have a predetermined $ {X_{G}}$. However, one must force all plans considered by forward value iteration to originate from $ {x_{I}}$. We again have the choice of either making notation that imposes constraints on the action spaces or simply adding a term that forces infeasible plans to have infinite cost. Once again, the latter will be chosen here.

Let $ C^*_k$ denote the optimal cost-to-come from stage $ 1$ to stage $ k$, optimized over all $ (k-1)$-step plans. To preclude plans that do not start at $ {x_{I}}$, the definition of $ C^*_1$ is given by

$\displaystyle C^*_1(x_1) = l_I(x_1) ,$ (2.13)

in which $ l_I$ is a new function that yields $ l_I({x_{I}}) = 0$, and $ l_I(x) = \infty$ for all $ x \not = {x_{I}}$. Thus, any plans that try to start from a state other than $ {x_{I}}$ will immediately receive infinite cost.

For an intermediate stage, $ k \in \{ 2,\ldots,K \}$, the following represents the optimal cost-to-come:

$\displaystyle C^*_k(x_k) = \min_{u_1,\ldots,u_{k-1}} \left\{ l_I(x_1) + \sum_{i=1}^{k-1} l(x_i,u_i) \right\} .$ (2.14)

Note that the sum refers to a sequence of states, $ x_1, \ldots,
x_{k-1}$, which is the result of applying the action sequence $ (u_1,
\ldots, u_{k-2})$. The last state, $ x_k$, is not included because its cost term, $ l(x_k,u_k)$, requires the application of an action, $ u_k$, which has not been chosen. If it is possible to write the cost additively, as $ l(x_k,u_k) = l_1(x_k) + l_2(u_k)$, then the $ l_1(x_k)$ part could be included in the cost-to-come definition, if desired. This detail will not be considered further.

As in (2.5), it is assumed in (2.14) that $ u_i \in
U(x_i)$ for every $ i \in \{ 1,\ldots,k-1\}$. The resulting $ x_k$, obtained after applying $ u_{k-1}$, must be the same $ x_k$ that is named in the argument on the left side of (2.14). It might appear odd that $ x_1$ appears inside of the $ \min$ above; however, this is not a problem. The state $ x_1$ can be completely determined once $ u_1,\ldots,u_{k-1}$ and $ x_k$ are given.

The final forward value iteration is the arrival at the final stage, $ F$. The cost-to-come in this case is

$\displaystyle C^*_F(x_F) = \min_{u_1,\ldots,u_K} \left\{ l_I(x_1) + \sum_{i=1}^K l(x_i,u_i) \right\} .$ (2.15)

This equation looks the same as (2.5) after substituting $ k=1$; however, $ l_I$ is used here instead of $ l_F$. This has the effect of filtering the plans that are considered to include only those that start at $ {x_{I}}$. The forward value iterations find optimal plans to any reachable final state from $ {x_{I}}$. This behavior is complementary to that of backward value iteration. In that case, $ {X_{G}}$ was fixed, and optimal plans from any initial state were found. For forward value iteration, this is reversed.

To express the dynamic-programming recurrence, one further issue remains. Suppose that $ C^*_{k-1}$ is known by induction, and we want to compute $ C^*_k(x_k)$ for a particular $ x_k$. This means that we must start at some state $ x_{k-1}$ and arrive in state $ x_k$ by applying some action. Once again, the backward state transition equation from Section 2.2.3 is useful. Using the stage indices, it is written here as $ x_{k-1} = {f^{-1}}(x_k,u^{-1}_k)$.

The recurrence is

$\displaystyle C^*_k(x_k) = \min_{u^{-1}_k \in {U^{-1}}(x_k)} \Big\{ C^*_{k-1}(x_{k-1}) + l(x_{k-1},u_{k-1}) \Big\} ,$ (2.16)

in which $ x_{k-1} = {f^{-1}}(x_k,u^{-1}_k)$ and $ u_{k-1} \in U(x_{k-1})$ is the input to which $ u^{-1}_k \in {U^{-1}}(x_k)$ corresponds. Using (2.16), the final cost-to-come is iteratively computed in $ O(K \vert X\vert \vert U\vert)$ time, as in the case of computing the first-stage cost-to-go in the backward value-iteration method.

Figure 2.12: The optimal cost-to-come functions computed by forward value iteration.
\begin{figure}\begin{center}
\begin{tabular}{\vert c\vert c\vert c\vert c\vert c...
...ne
$C^*_5$ & 6 & 6 & 5 & 6 & 5  \hline
\end{tabular}\end{center}
\end{figure}

Example 2..4 (Forward Value Iteration)   Example 2.3 is revisited for the case of forward value iterations with a fixed plan length of $ K=4$. The cost-to-come functions shown in Figure 2.12 are obtained by direct application of (2.16). It will be helpful to refer to Figures 2.10 and 2.11 once again. The first row corresponds to the immediate application of $ l_I$. In the second row, finite values are obtained for $ a$ and $ b$, which are reachable in one stage from $ {x_{I}}= a$. The iterations continue until $ k=5$, at which point that optimal cost-to-come is determined for every state. $ \blacksquare$

Steven M LaValle 2020-08-14