Approximate value iteration

The continuous-space methods from Section 10.6 can be directly applied to produce an approximate solution by interpolating over ${\vec{X}}$ to determine cost-to-go values. The initial cost-to-go value $G^*_F$ over the collection of samples is obtained by (12.6). Following (10.46), the dynamic programming recurrence is

$\displaystyle G^*_k({\vec{x}}_k) = \min_{{\vec{u}}_k \in {\vec{U}}} \Big\{ {\vec{l}}({\vec{x}}_k,{\vec{u}}_k) + \sum_{{\vec{x}}_{k+1}} G^*_{k+1}({\vec{x}}_{k+1}) \, P({\vec{x}}_{k+1}\vert{\vec{x}}_k,{\vec{u}}_k) \Big\} .$ (12.10)
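As a rough illustration, the following minimal Python sketch performs one stage of the recurrence (12.10) over the samples. All names here (`backup_stage`, `samples`, `U`, `Theta`, `f`, `l`, `P`, `G_next`) are hypothetical placeholders for the problem data, not part of the original text; `G_next` stands for the interpolated cost-to-go of the next stage.

```python
import numpy as np

def backup_stage(samples, U, Theta, f, l, P, G_next):
    """One stage of the recurrence (12.10) over a finite sample set.

    Hypothetical problem data (a sketch, not a definitive implementation):
      samples : sequence of sample states on which G*_k is represented
      U       : finite set of actions
      Theta   : Theta(x, u) -> finite set of nature actions
      f       : transition function, x_{k+1} = f(x, u, theta)
      l       : stage cost l(x, u)
      P       : P(theta, x, u), probability mass of each nature action
      G_next  : interpolated evaluation of G*_{k+1} at arbitrary states
    """
    G_k = np.empty(len(samples))
    for i, x in enumerate(samples):
        best = np.inf
        for u in U:
            # The expectation is a finite sum because Theta(x, u) is finite;
            # each theta induces one successor state, realizing the sum over
            # x_{k+1} weighted by P(x_{k+1} | x_k, u_k) in (12.10).
            expected = sum(P(theta, x, u) * G_next(f(x, u, theta))
                           for theta in Theta(x, u))
            best = min(best, l(x, u) + expected)
        G_k[i] = best
    return G_k
```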

If ${\vec{\Theta}}({\vec{x}},{\vec{u}})$ is finite, the probability mass is distributed over a finite set of points, $y = {\vec{\theta}} \in {\vec{\Theta}}({\vec{x}},{\vec{u}})$. This in turn implies that $P({\vec{x}}_{k+1}\vert{\vec{x}}_k,{\vec{u}}_k)$ is also distributed over a finite subset of ${\vec{X}}$. This is somewhat unusual because ${\vec{X}}$ is a continuous space, which ordinarily requires the specification of a probability density function. Since the set of possible future states is finite, a sum can be used in (12.10) instead of an integral over a density. Technically, this still yields a probability density over ${\vec{X}}$, but the density must be expressed using Dirac delta functions. An approximation is nevertheless needed because the ${\vec{x}}_{k+1}$ points may not coincide exactly with the sample points on which the cost-to-go function $G^*_{k+1}$ is represented.
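The interpolation itself can be as simple as a piecewise-linear fit over the samples. A minimal one-dimensional sketch, with a hypothetical grid and placeholder cost-to-go values, uses `numpy.interp` to evaluate $G^*_{k+1}$ at successor points that fall between samples:

```python
import numpy as np

# Hypothetical 1D sample set for X, with placeholder values of G*_{k+1}
# stored only at the sample points.
grid = np.linspace(0.0, 1.0, 11)
G_values = grid ** 2

def G_next(x):
    # Piecewise-linear interpolation; a successor state x_{k+1} computed
    # in (12.10) generally falls between the samples.
    return np.interp(x, grid, G_values)

print(G_next(0.37))  # evaluate at a non-sample point
```

Higher-dimensional versions of this step would replace the linear fit with, for example, barycentric interpolation over a simplicial decomposition, in the spirit of the interpolation methods of Section 10.6.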
