14.6.3.4 Computing optimal solutions via dynamic programming

Dynamic programming with interpolation, as covered in Section 14.5, can be applied to solve the problem once it is formulated in terms of the path-constrained phase space $S \subset {\mathbb{R}}^2$. The domain of $\tau$ provides the constraint $0 \leq s \leq 1$. Assume that only forward progress along the path is needed; moving in the reverse direction should not be necessary. This implies that $\dot s > 0$. To make $S$ bounded, an upper bound, $\dot s_{max}$, is usually assumed, beyond which it is known that the speed is too high to follow the path.
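
As a small illustration, the constraints that bound $S$ can be collected into a membership test. The following sketch is only illustrative; the function name and the numerical value chosen for $\dot s_{max}$ are assumptions, not part of the formulation.

\begin{verbatim}
# Minimal sketch of the bounded path-constrained phase space S
# (hypothetical name and numerical value).
S_DOT_MAX = 3.0   # assumed upper bound on the path speed s_dot

def in_phase_space(s, s_dot):
    """True if (s, s_dot) lies in S: 0 <= s <= 1 and 0 < s_dot <= S_DOT_MAX."""
    return 0.0 <= s <= 1.0 and 0.0 < s_dot <= S_DOT_MAX
\end{verbatim}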

Figure 14.27: (a) Planning occurs in the path-constrained phase space. (b) Due to the forward-progress assumption, value iteration can be reduced to a quick wavefront propagation across regularly spaced vertical lines in $ {S}$.

This results in the planning problem shown in Figure 14.27a. The system is expressed as $\ddot s = u'$, in which $u' \in U'(s,\dot s)$. The initial phase in $S$ is $(0,\dot s_i)$ and the goal phase is $(1,\dot s_g)$. Typically, $\dot s_i = \dot s_g = 0$. The region shown in Figure 14.27 is contained in the first quadrant of the phase space because only positive values of $s$ and $\dot s$ are allowed (in Figure 14.13, $q$ and $\dot q$ could be positive or negative). This implies that all motions are to the right. The actions determine whether accelerations or decelerations will occur.
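
To make the effect of an action concrete, the following sketch propagates a phase under $\ddot s = u'$ with a simple Euler step; the function name and step size are assumptions made only for this illustration. Since $\dot s > 0$, the $s$-coordinate can only increase, which is why all motions are to the right.

\begin{verbatim}
def propagate(s, s_dot, u_prime, dt=1e-2):
    """One Euler step of the double integrator s'' = u' in the phase space (s, s_dot).

    Because s_dot > 0, the phase always moves to the right; the action u'
    only determines whether the path speed s_dot increases or decreases.
    """
    return s + s_dot * dt, s_dot + u_prime * dt
\end{verbatim}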

Backward value iteration with interpolation can be easily applied by discretizing $S$ and $U'(s,\dot s)$. Due to the constraint $\dot s > 0$, making a Dijkstra-like version of the algorithm is straightforward. A simple wavefront propagation can even be performed, starting at $s = 1$ and progressing backward in vertical waves until $s = 0$ is reached. See Figure 14.27b. The backprojection (14.29) can be greatly simplified. Suppose that the $s$-axis is discretized into $m+1$ regularly spaced values $s_0$, $\ldots$, $s_m$, spaced $\Delta s$ apart for some fixed $\Delta s > 0$. Thus, $s_k = k \Delta s$, with $\Delta s = 1/m$ so that $s_m = 1$. The index $k$ can be interpreted as the stage. Starting at $k = m$, the final cost-to-go $G^*_m(s_m,\dot s_m)$ is defined as $0$ if the corresponding phase represents the goal, and $\infty$ otherwise. At each $s_k$, the $\dot s$ values are sampled, and the cost-to-go function is represented using one-dimensional linear interpolation along the vertical axis. At each stage, the dynamic programming computation

$\displaystyle G^*_k(s_k,{\dot s}_k) = \min_{u' \in U'(s_k,{\dot s}_k)} \Big\{ l'_d(s_k,{\dot s}_k,u') + G^*_{k+1}(s_{k+1},{\dot s}_{k+1}) \Big\}$ (14.46)

is performed at each $ {\dot s}$ sample. This represents a special form of (14.27). Linear interpolation over discretized $ {\dot s}$ values is used to evaluate $ G^*_{k+1}(s_{k+1},{\dot s}_{k+1})$. The cost term $ l'_d$ is obtained from $ l_d$ by computing the original state $ x \in X$ from $ s$ and $ {\dot s}$; however, if the trajectory segment enters $ {S}_{obs}$, it receives infinite cost. The computations proceed until stage $ k=1$, at which time the optimal cost-to-go $ G^*_1(s_1,{\dot s}_1)$ is computed. The optimal trajectory is obtained by using the cost-to-go function at each stage as a navigation function.
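
The stage-by-stage sweep can be sketched as follows. This is only an illustration under several assumptions: the action set, the obstacle test, the goal tolerance, and all numerical values are placeholders; a constant acceleration is assumed over each $\Delta s$ step, so that $\dot s_{k+1}^2 = \dot s_k^2 + 2 u' \Delta s$; and the elapsed time over the segment is used as the cost $l'_d$, which corresponds to time optimality. The interpolation of $G^*_{k+1}$ is one-dimensional over the sampled $\dot s$ values, as in (14.46).

\begin{verbatim}
import numpy as np

# --- Hypothetical problem data, assumed only for this sketch ---
M = 100                                    # number of stages, so Delta s = 1/M
DELTA_S = 1.0 / M
S_DOT_MAX = 3.0                            # assumed bound on the path speed
S_DOT_SAMPLES = np.linspace(1e-3, S_DOT_MAX, 60)   # sampled s_dot values per vertical line
INF = np.inf

def actions(s, s_dot):
    """Placeholder for U'(s, s_dot): admissible path accelerations at this phase."""
    return np.linspace(-2.0, 2.0, 21)

def in_S_obs(s, s_dot, u_prime):
    """Placeholder: does the segment starting at (s, s_dot) under u' enter the obstacle region?"""
    return False

def step(s_dot, u_prime):
    """Apply a constant acceleration u' over the distance DELTA_S.

    Returns (next s_dot, elapsed time) or None if the phase leaves the allowed region.
    The elapsed time serves as the cost l'_d (time-optimal traversal assumed).
    """
    v_sq = s_dot ** 2 + 2.0 * u_prime * DELTA_S
    if v_sq <= 0.0:
        return None                        # forward progress would stop before the next stage
    s_dot_next = np.sqrt(v_sq)
    if s_dot_next > S_DOT_MAX:
        return None
    return s_dot_next, 2.0 * DELTA_S / (s_dot + s_dot_next)

def interp_cost(s_dot_query, G_next):
    """One-dimensional linear interpolation of the next-stage cost-to-go over S_DOT_SAMPLES."""
    j = np.searchsorted(S_DOT_SAMPLES, s_dot_query)
    if j == 0 or j == len(S_DOT_SAMPLES):
        return INF                         # outside the sampled range of s_dot
    g_lo, g_hi = G_next[j - 1], G_next[j]
    if not (np.isfinite(g_lo) and np.isfinite(g_hi)):
        return INF                         # a neighboring sample is unreachable
    t = (s_dot_query - S_DOT_SAMPLES[j - 1]) / (S_DOT_SAMPLES[j] - S_DOT_SAMPLES[j - 1])
    return (1.0 - t) * g_lo + t * g_hi

def backward_value_iteration(s_dot_goal=0.0, goal_tol=0.05):
    """Wavefront sweep of (14.46): G[k][i] approximates G*_k(s_k, S_DOT_SAMPLES[i])."""
    G = np.full((M + 1, len(S_DOT_SAMPLES)), INF)
    # Final cost-to-go: 0 at phases representing the goal, infinity elsewhere.
    G[M] = np.where(np.abs(S_DOT_SAMPLES - s_dot_goal) <= goal_tol, 0.0, INF)
    for k in range(M - 1, -1, -1):         # sweep vertical lines from s = 1 back toward s = 0
        s_k = k * DELTA_S
        for i, s_dot in enumerate(S_DOT_SAMPLES):
            best = INF
            for u in actions(s_k, s_dot):
                result = step(s_dot, u)
                if result is None or in_S_obs(s_k, s_dot, u):
                    continue
                s_dot_next, cost = result
                best = min(best, cost + interp_cost(s_dot_next, G[k + 1]))
            G[k, i] = best
    return G
\end{verbatim}

The resulting stagewise cost-to-go values can then serve as a navigation function: at each stage, the action that minimizes the right-hand side of (14.46) is selected and applied.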

The dynamic programming approach is so general that it can even be extended to path-constrained trajectory planning in the presence of higher order constraints [880]. For example, if a system is specified as $ q^{(3)} = h(q,{\dot q},{\ddot q},u)$, then a 3D path-constrained phase space results, in which each element is expressed as $ (s,{\dot s},{\ddot s})$. The actions in this space are jerks, yielding $ s^{(3)} = u'$ for $ u' \in U'(s,{\dot s},{\ddot s})$.
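
Continuing the same kind of hedged sketch, the higher-order case only enlarges the phase; one Euler integration step in the three-dimensional phase space $(s,\dot s,\ddot s)$ with a jerk action might look as follows (the name and step size are again hypothetical).

\begin{verbatim}
def propagate_jerk(s, s_dot, s_ddot, u_prime, dt=1e-2):
    """One Euler step in the 3D path-constrained phase space (s, s_dot, s_ddot),
    in which the action u' is a path jerk: s''' = u'."""
    return s + s_dot * dt, s_dot + s_ddot * dt, s_ddot + u_prime * dt
\end{verbatim}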
