Approximate value iteration

The continuous-space methods from Section 10.6 can be directly applied to produce an approximate solution by interpolating over ${\vec{X}}$ to determine cost-to-go values. The initial cost-to-go value $G^*_F$ over the collection of samples is obtained by (12.6). Following (10.46), the dynamic programming recurrence is

$\displaystyle G^*_k({\vec{x}}_k) = \min_{{\vec{u}}_k \in {\vec{U}}} \Big\{ {\vec{l}}({\vec{x}}_k,{\vec{u}}_k) + \sum_{{\vec{x}}_{k+1}} G^*_{k+1}({\vec{x}}_{k+1}) \, P({\vec{x}}_{k+1}\vert{\vec{x}}_k,{\vec{u}}_k) \Big\} .$ (12.10)
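As a rough illustration, the following minimal Python sketch performs one stage of the recurrence (12.10) over the samples. All names here (`backup_stage`, `samples`, `U`, `Theta`, `f`, `l`, `P`, `G_next`) are hypothetical placeholders for the problem data, not part of the original text; `G_next` stands for the interpolated cost-to-go of the next stage.

```python
import numpy as np

def backup_stage(samples, U, Theta, f, l, P, G_next):
    """One stage of the recurrence (12.10) over a finite sample set.

    Hypothetical problem data (a sketch, not a definitive implementation):
      samples : sequence of sample states on which G*_k is represented
      U       : finite set of actions
      Theta   : Theta(x, u) -> finite set of nature actions
      f       : transition function, x_{k+1} = f(x, u, theta)
      l       : stage cost l(x, u)
      P       : P(theta, x, u), probability mass of each nature action
      G_next  : interpolated evaluation of G*_{k+1} at arbitrary states
    """
    G_k = np.empty(len(samples))
    for i, x in enumerate(samples):
        best = np.inf
        for u in U:
            # The expectation is a finite sum because Theta(x, u) is finite;
            # each theta induces one successor state, realizing the sum over
            # x_{k+1} weighted by P(x_{k+1} | x_k, u_k) in (12.10).
            expected = sum(P(theta, x, u) * G_next(f(x, u, theta))
                           for theta in Theta(x, u))
            best = min(best, l(x, u) + expected)
        G_k[i] = best
    return G_k
```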

If ${\vec{\Theta}}({\vec{x}},{\vec{u}})$ is finite, the probability mass is distributed over a finite set of points, $y = {\vec{\theta}} \in {\vec{\Theta}}({\vec{x}},{\vec{u}})$. This in turn implies that $P({\vec{x}}_{k+1}\vert{\vec{x}}_k,{\vec{u}}_k)$ is also distributed over a finite subset of ${\vec{X}}$. This is somewhat unusual because ${\vec{X}}$ is a continuous space, which ordinarily requires the specification of a probability density function. Since the set of possible future states is finite, a sum can be used in (12.10) instead of an integral over a density. Technically, this still yields a probability density over ${\vec{X}}$, but the density must be expressed using Dirac delta functions. An approximation is nevertheless needed because the ${\vec{x}}_{k+1}$ points may not coincide exactly with the sample points on which the cost-to-go function $G^*_{k+1}$ is represented.
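The interpolation itself can be as simple as a piecewise-linear fit over the samples. A minimal one-dimensional sketch, with a hypothetical grid and placeholder cost-to-go values, uses `numpy.interp` to evaluate $G^*_{k+1}$ at successor points that fall between samples:

```python
import numpy as np

# Hypothetical 1D sample set for X, with placeholder values of G*_{k+1}
# stored only at the sample points.
grid = np.linspace(0.0, 1.0, 11)
G_values = grid ** 2

def G_next(x):
    # Piecewise-linear interpolation; a successor state x_{k+1} computed
    # in (12.10) generally falls between the samples.
    return np.interp(x, grid, G_values)

print(G_next(0.37))  # evaluate at a non-sample point
```

Higher-dimensional versions of this step would replace the linear fit with, for example, barycentric interpolation over a simplicial decomposition, in the spirit of the interpolation methods of Section 10.6.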
