Now consider the probabilistic case. A value iteration method can be
obtained by once again shadowing the presentation in Section
2.3.1. For $k$ from $1$ to $K+1$, let $G^*_k$ denote the
expected cost from stage $k$ to $F$ under the execution of the optimal
plan (compare to (2.5)):

$$G^*_k(x_k) = \min_{u_k,\ldots,u_K} \left\{ E_{\theta_k,\ldots,\theta_K} \left[ \sum_{i=k}^{K} l(x_i,u_i,\theta_i) + l_F(x_F) \right] \right\} \qquad (10.40)$$

The optimal cost-to-go for the boundary condition of $G^*_F$ again
reduces to (10.34).
Once again, the algorithm makes $K+1$ passes over $X$, each time
computing $G^*_k$ from $G^*_{k+1}$, as $k$ ranges from $K$ down
to $1$. As before, $G^*_F$ is copied from $l_F$. In the second
iteration, $G^*_K$ is computed for each $x_K \in X$ as (compare to
(2.7))

$$G^*_K(x_K) = \min_{u_K} \left\{ E_{\theta_K} \Big[ l(x_K,u_K,\theta_K) + l_F(x_F) \Big] \right\} \qquad (10.41)$$

in which $x_F = f(x_K,u_K,\theta_K)$
and the expectation occurs over $\theta_K$.
Substitutions are made into (10.41) to obtain (compare to
(2.8))

$$G^*_K(x_K) = \min_{u_K} \left\{ E_{\theta_K} \Big[ l(x_K,u_K,\theta_K) + l_F(f(x_K,u_K,\theta_K)) \Big] \right\} \qquad (10.42)$$
The general iteration is

$$G^*_k(x_k) = \min_{u_k} \left\{ E_{\theta_k} \left[ \min_{u_{k+1},\ldots,u_K} \left\{ E_{\theta_{k+1},\ldots,\theta_K} \left[ l(x_k,u_k,\theta_k) + \sum_{i=k+1}^{K} l(x_i,u_i,\theta_i) + l_F(x_F) \right] \right\} \right] \right\} \qquad (10.43)$$
which is obtained once again by pulling the first cost term out of the
sum and by separating the minimization over $u_k$ from the rest. The
second $\min$ and expectation do not affect the $l(x_k,u_k,\theta_k)$
term, which is pulled outside to obtain (compare to
(2.10))

$$G^*_k(x_k) = \min_{u_k} \left\{ E_{\theta_k} \left[ l(x_k,u_k,\theta_k) + \min_{u_{k+1},\ldots,u_K} \left\{ E_{\theta_{k+1},\ldots,\theta_K} \left[ \sum_{i=k+1}^{K} l(x_i,u_i,\theta_i) + l_F(x_F) \right] \right\} \right] \right\} \qquad (10.44)$$
The inner $\min$ and expectation define $G^*_{k+1}(x_{k+1})$, yielding the
recurrence (compare to (2.11) and (10.39))

$$G^*_k(x_k) = \min_{u_k} \left\{ E_{\theta_k} \Big[ l(x_k,u_k,\theta_k) + G^*_{k+1}(x_{k+1}) \Big] \right\} \qquad (10.45)$$
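As a minimal sketch of one backward pass of the recurrence in (10.45), assuming finite sets of states, actions, and nature actions, the following may help; the names `X`, `U`, `Theta`, `f`, `l`, and `P_theta` are illustrative assumptions, not notation fixed by the text:

```python
def backward_step(G_next, X, U, Theta, f, l, P_theta):
    """One pass of (10.45): G_k(x) = min_u E_theta[ l(x,u,th) + G_{k+1}(f(x,u,th)) ].

    G_next maps each state to G*_{k+1}; P_theta(th, x, u) gives the
    probability of nature action th given state x and action u.
    """
    G = {}
    for x in X:
        best = float('inf')
        for u in U:
            # Expectation over the nature action theta (finite sum).
            exp_cost = sum(P_theta(th, x, u) * (l(x, u, th) + G_next[f(x, u, th)])
                           for th in Theta)
            best = min(best, exp_cost)
        G[x] = best
    return G
```

Iterating this function $K$ times, starting from a dictionary that stores $l_F$, yields $G^*_1$.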
If the cost term $l(x_k,u_k,\theta_k)$ does not depend on
$\theta_k$, it can be written as
$l(x_k,u_k)$, and (10.45) simplifies to

$$G^*_k(x_k) = \min_{u_k} \left\{ l(x_k,u_k) + \sum_{x_{k+1} \in X} G^*_{k+1}(x_{k+1}) P(x_{k+1} \mid x_k, u_k) \right\} \qquad (10.46)$$
The dependency of state transitions on $\theta_k$ is implicit through
the expression of $P(x_{k+1} \mid x_k, u_k)$, for which the definition uses
$P(\theta_k \mid x_k, u_k)$
and the state transition equation $x_{k+1} = f(x_k,u_k,\theta_k)$. The
form given in (10.46) may be more convenient than
(10.45) in implementations.
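A sketch of the full value iteration in the form of (10.46), under assumed illustrative names (`states`, `actions`, `P`, `cost`, `l_F` are all hypothetical), might look as follows:

```python
def value_iteration(states, actions, P, cost, l_F, K):
    """Backward value iteration via (10.46) for K stages with terminal cost l_F.

    P[(x, u)] is a dict mapping next states x' to P(x' | x, u);
    cost(x, u) is the stage cost l(x_k, u_k), assumed independent of theta.
    """
    G = {x: l_F(x) for x in states}       # boundary condition: G* at stage F is l_F
    for _ in range(K):                    # passes for k = K down to 1
        G = {x: min(cost(x, u) +
                    sum(p * G[xn] for xn, p in P[(x, u)].items())
                    for u in actions)
             for x in states}
    return G
```

Note that this form only needs the derived transition probabilities $P(x_{k+1} \mid x_k, u_k)$, never the nature actions themselves, which is why it tends to be the more convenient form in practice.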
Steven M LaValle
2020-08-14