Value iteration

The value-iteration method from Section 10.2.1 can be applied without modification. In the first step, initialize $ G^*_F$ using (12.6). Using the notation for the new problem, the dynamic programming recurrence, (10.39), becomes

$\displaystyle G^*_k({\vec{x}}_k) = \min_{{\vec{u}}_k \in U} \Big\{ \max_{{\vec{\theta}}_k \in {\vec{\Theta}}({\vec{x}}_k,{\vec{u}}_k)} \Big\{ {\vec{l}}({\vec{x}}_k,{\vec{u}}_k) + G^*_{k+1}({\vec{x}}_{k+1}) \Big\} \Big\} ,$ (12.7)

in which $ {\vec{x}}_{k+1} = {\vec{f}}({\vec{x}}_k,{\vec{u}}_k,{\vec{\theta}}_k)$.
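
For concreteness, a minimal Python sketch of one backward step of (12.7) follows, assuming the derived I-space has been restricted to a finite collection of I-states (subsets of $ X$). The helper names possible_observations, transition, and cost are hypothetical stand-ins for $ {\vec{\Theta}}$, $ {\vec{f}}$, and $ {\vec{l}}$; they are not part of any established library.

# One backward step of the recurrence (12.7), sketched for a finite set of
# nondeterministic I-states.  Nature for the derived problem is the next
# observation y_{k+1}, so the inner maximization ranges over observations.
def value_iteration_step(G_next, istates, U, possible_observations, transition, cost):
    # G_next               : dict mapping I-state -> G*_{k+1}
    # istates              : iterable of I-states (e.g., frozensets of states)
    # U                    : iterable of actions
    # possible_observations: (istate, u) -> nonempty set of observations (Theta)
    # transition           : (istate, u, y) -> next I-state (f)
    # cost                 : (istate, u) -> stage cost (l)
    G_k = {}
    for xk in istates:
        best = float('inf')
        for u in U:
            # Worst case over nature, i.e., over the possible next observations.
            worst = max(cost(xk, u) + G_next[transition(xk, u, y)]
                        for y in possible_observations(xk, u))
            best = min(best, worst)
        G_k[xk] = best
    return G_k

This sketch assumes that every I-state reachable through transition already appears as a key of G_next, which in practice requires enumerating or sampling the reachable part of the I-space in advance.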

The main difficulty in evaluating (12.7) is to determine the set $ {\vec{\Theta}}({\vec{x}}_k,{\vec{u}}_k)$, over which the maximization occurs. Suppose that a state-nature sensor mapping is used, as defined in Section 11.1.1. From the I-state $ {\vec{x}}_k = X_k({\eta}_k)$, the action $ {\vec{u}}_k = u_k$ is applied. This yields a forward projection $ X_{k+1}({\eta}_k,u_k)$. The set of all possible observations is

\begin{displaymath}\begin{split}
{\vec{\Theta}}({\vec{x}}_k,{\vec{u}}_k) = \{ y_{k+1} \in Y \;\vert\;\; & \exists x_{k+1} \in X_{k+1}({\eta}_k,u_k) \text{ and } \exists \psi_{k+1} \in \Psi(x_{k+1}) \\
& \text{ such that } y_{k+1} = h(x_{k+1},\psi_{k+1}) \} .
\end{split}\end{displaymath} (12.8)
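
In a discrete model, (12.8) can be evaluated by direct enumeration over the forward projection. The following sketch assumes hypothetical helpers forward_projection(Xk, u), returning $ X_{k+1}({\eta}_k,u_k)$ as a set of states, Psi(x), returning the nature sensing actions available at a state, and h(x, psi), the state-nature sensor mapping of Section 11.1.1.

# Theta(x_k, u_k) via the forward projection, following (12.8): collect every
# observation that some state in X_{k+1}(eta_k, u_k) could produce.
def possible_observations(Xk, u, forward_projection, Psi, h):
    return {h(x_next, psi)
            for x_next in forward_projection(Xk, u)
            for psi in Psi(x_next)}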

Without using forward projections, a longer, equivalent expression is obtained:

\begin{displaymath}\begin{split}
{\vec{\Theta}}({\vec{x}}_k,{\vec{u}}_k) = \{ y_{k+1} \in Y \;\vert\;\; & \exists x_k \in X_k({\eta}_k) , \; \exists \theta_k \in \Theta(x_k,u_k) , \text{ and } \exists \psi_{k+1} \in \Psi(f(x_k,u_k,\theta_k)) \\
& \text{ such that } y_{k+1} = h(f(x_k,u_k,\theta_k),\psi_{k+1}) \} .
\end{split}\end{displaymath} (12.9)
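
The same set can be computed directly from (12.9), without first forming the forward projection. In the sketch below, f(x, u, theta) and Theta_state(x, u) are hypothetical stand-ins for the state transition function and nature action sets of the original problem; on any discrete instance the result should coincide with the forward-projection version above.

# Theta(x_k, u_k) computed directly from (12.9): iterate over the current
# I-state, the original nature actions, and the sensing nature actions.
def possible_observations_direct(Xk, u, f, Theta_state, Psi, h):
    return {h(f(x, u, theta), psi)
            for x in Xk
            for theta in Theta_state(x, u)
            for psi in Psi(f(x, u, theta))}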

Other variants can be formulated for different sensing models.
