Value iteration

The value-iteration method from Section 10.2.1 can be applied without modification. In the first step, initialize $ G^*_F$ using (12.6). Using the notation for the new problem, the dynamic programming recurrence, (10.39), becomes

$\displaystyle G^*_k({\vec{x}}_k) = \min_{{\vec{u}}_k \in U} \Big\{ \max_{{\vec{\theta}}_k \in {\vec{\Theta}}({\vec{x}}_k,{\vec{u}}_k)} \Big\{ {\vec{l}}({\vec{x}}_k,{\vec{u}}_k) + G^*_{k+1}({\vec{x}}_{k+1}) \Big\} \Big\} ,$ (12.7)

in which $ {\vec{x}}_{k+1} = {\vec{f}}({\vec{x}}_k,{\vec{u}}_k,{\vec{\theta}}_k)$.
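
For concreteness, a minimal Python sketch of one backward step of (12.7) follows, assuming the derived I-space has been restricted to a finite collection of I-states (subsets of $ X$). The helper names possible_observations, transition, and cost are hypothetical stand-ins for $ {\vec{\Theta}}$, $ {\vec{f}}$, and $ {\vec{l}}$; they are not part of any established library.

# One backward step of the recurrence (12.7), sketched for a finite set of
# nondeterministic I-states.  Nature for the derived problem is the next
# observation y_{k+1}, so the inner maximization ranges over observations.
def value_iteration_step(G_next, istates, U, possible_observations, transition, cost):
    # G_next               : dict mapping I-state -> G*_{k+1}
    # istates              : iterable of I-states (e.g., frozensets of states)
    # U                    : iterable of actions
    # possible_observations: (istate, u) -> nonempty set of observations (Theta)
    # transition           : (istate, u, y) -> next I-state (f)
    # cost                 : (istate, u) -> stage cost (l)
    G_k = {}
    for xk in istates:
        best = float('inf')
        for u in U:
            # Worst case over nature, i.e., over the possible next observations.
            worst = max(cost(xk, u) + G_next[transition(xk, u, y)]
                        for y in possible_observations(xk, u))
            best = min(best, worst)
        G_k[xk] = best
    return G_k

This sketch assumes that every I-state reachable through transition already appears as a key of G_next, which in practice requires enumerating or sampling the reachable part of the I-space in advance.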

The main difficulty in evaluating (12.7) is to determine the set $ {\vec{\Theta}}({\vec{x}}_k,{\vec{u}}_k)$, over which the maximization occurs. Suppose that a state-nature sensor mapping is used, as defined in Section 11.1.1. From the I-state $ {\vec{x}}_k = X_k({\eta}_k)$, the action $ {\vec{u}}_k = u_k$ is applied. This yields a forward projection $ X_{k+1}({\eta}_k,u_k)$. The set of all possible observations is

\begin{displaymath}\begin{split}
{\vec{\Theta}}({\vec{x}}_k,{\vec{u}}_k) = \{ y_{k+1} \in Y \;\vert\;\; & \exists x_{k+1} \in X_{k+1}({\eta}_k,u_k) \text{ and } \exists \psi_{k+1} \in \Psi(x_{k+1}) \\
& \text{ such that } y_{k+1} = h(x_{k+1},\psi_{k+1}) \} .
\end{split}\end{displaymath} (12.8)
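
In a discrete model, (12.8) can be evaluated by direct enumeration over the forward projection. The following sketch assumes hypothetical helpers forward_projection(Xk, u), returning $ X_{k+1}({\eta}_k,u_k)$ as a set of states, Psi(x), returning the nature sensing actions available at a state, and h(x, psi), the state-nature sensor mapping of Section 11.1.1.

# Theta(x_k, u_k) via the forward projection, following (12.8): collect every
# observation that some state in X_{k+1}(eta_k, u_k) could produce.
def possible_observations(Xk, u, forward_projection, Psi, h):
    return {h(x_next, psi)
            for x_next in forward_projection(Xk, u)
            for psi in Psi(x_next)}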

Without using forward projections, a longer, equivalent expression is obtained:

\begin{displaymath}\begin{split}
{\vec{\Theta}}({\vec{x}}_k,{\vec{u}}_k) = \{ y_{k+1} \in Y \;\vert\;\; & \exists x_k \in X_k({\eta}_k) , \; \exists \theta_k \in \Theta(x_k,u_k) , \text{ and } \exists \psi_{k+1} \in \Psi(f(x_k,u_k,\theta_k)) \\
& \text{ such that } y_{k+1} = h(f(x_k,u_k,\theta_k),\psi_{k+1}) \} .
\end{split}\end{displaymath} (12.9)
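
The same set can be computed directly from (12.9), without first forming the forward projection. In the sketch below, f(x, u, theta) and Theta_state(x, u) are hypothetical stand-ins for the state transition function and nature action sets of the original problem; on any discrete instance the result should coincide with the forward-projection version above.

# Theta(x_k, u_k) computed directly from (12.9): iterate over the current
# I-state, the original nature actions, and the sensing nature actions.
def possible_observations_direct(Xk, u, f, Theta_state, Psi, h):
    return {h(f(x, u, theta), psi)
            for x in Xk
            for theta in Theta_state(x, u)
            for psi in Psi(f(x, u, theta))}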

Other variants can be formulated for different sensing models.
