10.6.1 Extending the value-iteration method

The presentation proceeds in the same way as in Section 8.5.2, beginning with the discrete problem and making various components continuous. Begin with Formulation 10.1 and let $ X$ be a bounded, open subset of $ {\mathbb{R}}^n$. Assume that $ U(x)$ and $ \Theta(x,u)$ are finite. The value-iteration methods of Section 10.2.1 can be directly applied by using the interpolation concepts from Section 8.5.2 to compute the cost-to-go values over $ X$. In the nondeterministic case, the recurrence is (10.39), in which $ G^*_{k+1}$ is represented on a finite sample set $ S \subset X$ and is evaluated at all other points in $ R(S)$ by interpolation (recall from Section 8.5.2 that $ R(S)$ is the interpolation region of $ S$). In the probabilistic case, (10.45) or (10.46) may once again be used, but $ G^*_{k+1}$ is evaluated by interpolation.
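To make this concrete, the following is a minimal sketch of one nondeterministic backward value iteration over a regular grid of samples, with $ G^*_{k+1}$ evaluated by linear interpolation. The grid resolution, the transition function f, the stage cost l, and the sampled sets U_samples and Theta_samples are hypothetical placeholders, not part of the text.

```python
import numpy as np
from scipy.interpolate import RegularGridInterpolator

# Hypothetical sample set S: a regular grid over X = (0, 1)^2.
xs = np.linspace(0.0, 1.0, 51)

def value_iteration_step(G_next, U_samples, Theta_samples, f, l):
    """One backward step of (10.39): minimize over sampled actions, take the
    worst case over sampled nature actions, and evaluate G*_{k+1} at the
    resulting next state by linear interpolation over the grid samples."""
    interp = RegularGridInterpolator((xs, xs), G_next,
                                     bounds_error=False, fill_value=np.inf)
    G_k = np.empty_like(G_next)
    for i, xi in enumerate(xs):
        for j, xj in enumerate(xs):
            x = np.array([xi, xj])
            best = np.inf
            for u in U_samples:
                worst = -np.inf
                for th in Theta_samples:
                    x_next = f(x, u, th)          # next state, ideally in R(S)
                    cost = l(x, u, th) + interp(x_next).item()
                    worst = max(worst, cost)
                best = min(best, worst)
            G_k[i, j] = best
    return G_k
```

The probabilistic case is obtained by replacing the inner worst-case maximization with a weighted sum over the sampled nature actions.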

If $ U(x)$ is continuous, then it can be sampled to evaluate the $ \min$ in each recurrence, as suggested in Section 8.5.2. Now suppose $ \Theta(x,u)$ is continuous. In the nondeterministic case, $ \Theta(x,u)$ can be sampled to evaluate the $ \max$ in (10.39) or it may be possible to employ a general optimization technique directly over $ \Theta(x,u)$. In the probabilistic case, the expectation must be taken over a continuous probability space. A probability density function, $ p(\theta\vert x,u)$, characterizes nature's action. A probabilistic state transition density function can be derived from this as $ p(x_{k+1}\vert x_k,u_k)$. Using these densities, the continuous versions of (10.45) and (10.46) become

$\displaystyle G^*_k({x_k}) = \min_{{u_k}\in U({x_k})} \left\{ \int_{\Theta(x_k,u_k)} \Big( l({x_k},{u_k},\theta_k) + G^*_{k+1}(f(x_k,u_k,\theta_k)) \Big) p({\theta_k}\vert x_k,u_k) d\theta_k \right\}$ (10.121)

and

$\displaystyle G^*_k({x_k}) = \min_{{u_k}\in U({x_k})} \left\{ l({x_k},{u_k}) + \int_{X} G^*_{k+1}(x_{k+1}) p(x_{k+1}\vert x_k,u_k) dx_{k+1}\right\} ,$ (10.122)

respectively. Sampling can be used to evaluate the integrals. One straightforward method is to approximate $ p(\theta\vert x,u)$ by a discrete distribution. In one dimension, for example, this can be achieved by partitioning $ \Theta(x,u)$ into intervals and declaring each interval to be a discrete nature action. The probability associated with each discrete nature action is simply the integral of $ p(\theta\vert x,u)$ over the corresponding interval.
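The interval-based discretization can be sketched as follows, assuming a one-dimensional nature set $ \Theta(x,u) = [a,b]$ and a density supplied as a callable p(theta, x, u); all of the names here are hypothetical placeholders.

```python
import numpy as np
from scipy.integrate import quad

def discretize_nature(p, a, b, num_intervals):
    """Partition [a, b] into intervals; each interval becomes a discrete nature
    action (represented here by its midpoint) whose probability is the integral
    of the density p over that interval."""
    edges = np.linspace(a, b, num_intervals + 1)
    midpoints = 0.5 * (edges[:-1] + edges[1:])
    probs = np.array([quad(p, lo, hi)[0]
                      for lo, hi in zip(edges[:-1], edges[1:])])
    return midpoints, probs / probs.sum()   # renormalize against quadrature error

def expected_cost(x, u, G_next, f, l, p, a, b, num_intervals=20):
    """Approximate the integral in (10.121) for fixed x and u by the discrete
    sum induced by the interval partition; G_next is an interpolated G*_{k+1}."""
    thetas, probs = discretize_nature(lambda th: p(th, x, u), a, b, num_intervals)
    return sum(w * (l(x, u, th) + G_next(f(x, u, th)))
               for th, w in zip(thetas, probs))
```

Minimizing expected_cost over a sampled $ U(x_k)$ then yields one evaluation of the right-hand side of (10.121).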

Section 8.5.2 concluded by describing Dijkstra-like algorithms for continuous spaces. These were derived mainly by using backprojections, (8.66), to conclude that some samples cannot change their values because they are too far from the active set. The same principle can be applied in the current setting; however, the weak backprojection, (10.20), must be used instead. Using the weak backprojection, the usual value iterations can be applied while removing all samples that are not in the active set. For many problems, however, the size of the active set may quickly become unmanageable because the weak backprojection often causes much faster propagation than the original backprojection. Continuous-state generalizations of the Dijkstra-like algorithms in Section 10.2.3 can be made; however, this requires the additional condition that in every iteration, it must be possible to extend $ D$ by forcing the next state to lie in $ R(D)$, in spite of nature.
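The pruning idea can be sketched as follows, under the assumption that a test in_weak_backprojection(x, D), based on (10.20), is available for the sampled points and that G is a dictionary from samples to cost-to-go values; the names and loop structure are illustrative only, not the algorithms of Section 10.2.3.

```python
def prune_and_iterate(samples, G, update, in_weak_backprojection):
    """Apply value iterations only to samples whose values may still change:
    those lying in the weak backprojection of the samples that changed last."""
    active = set(samples)                    # initially, every sample may change
    while active:
        changed = set()
        for x in active:
            new_value = update(x, G)         # one application of the recurrence at x
            if new_value != G[x]:            # in practice, compare with a tolerance
                G[x] = new_value
                changed.add(x)
        if not changed:
            break
        # Only samples in the weak backprojection of the changed set can be
        # affected in the next sweep; all others are removed from consideration.
        active = {x for x in samples if in_weak_backprojection(x, changed)}
    return G
```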
