Probabilistic case

Now consider the probabilistic case. A value iteration method can be obtained by once again shadowing the presentation in Section 2.3.1. For $ k$ from $ 1$ to $ F$, let $ G^*_k$ denote the expected cost-to-go from stage $ k$ to $ F$ under the execution of the optimal plan (compare to (2.5)):

$\displaystyle G^*_k(x_k) = \min_{u_k,\ldots,u_K} \left\{ E_{\theta_k,\ldots,\theta_K} \left[ \sum_{i=k}^{K} l(x_i,u_i,\theta_i) + l_F(x_F) \right] \right\} .$ (10.40)

The optimal cost-to-go for the boundary condition of $ k = F$ again reduces to (10.34).
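To keep the upcoming computations concrete, the following Python sketch sets up a toy finite instance. All names ($ X$, $ U$, $ \Theta$, $ P_\theta$, $ f$, $ l$, $ l_F$) and the particular numbers are illustrative assumptions chosen only to mirror the notation above; the text prescribes no particular representation.

\begin{verbatim}
# Hypothetical finite problem instance, mirroring the notation above.
# None of these names or numbers come from the text; they are placeholders.
X = ['a', 'b', 'c']                        # state space X

def U(x):                                  # action space U(x)
    return ['stay', 'move']

def Theta(x, u):                           # nature action space Theta(x, u)
    return [0, 1]

def P_theta(theta, x, u):                  # P(theta | x, u)
    return 0.5

def f(x, u, theta):                        # state transition equation f
    if u == 'stay':
        return x
    step = {'a': 'b', 'b': 'c', 'c': 'a'}
    return step[x] if theta == 0 else x

def l(x, u, theta):                        # stage cost l(x, u, theta)
    return 1.0 if u == 'move' else 2.0

def lF(x):                                 # terminal cost l_F(x)
    return 0.0 if x == 'c' else 10.0
\end{verbatim}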

Once again, the algorithm makes $ K$ passes over $ X$, each time computing $ G^*_k$ from $ G^*_{k+1}$, as $ k$ ranges from $ F$ down to $ 1$. As before, $ G^*_F$ is copied from $ l_F$. In the second iteration, $ G^*_K$ is computed for each $ x_K \in X$ as (compare to (2.7))

$\displaystyle G^*_K(x_K) = \min_{u_K} \Big\{ E_{\theta_K} \Big[ l(x_K,u_K,\theta_K) + l_F(x_F) \Big] \Big\},$ (10.41)

in which $ u_K \in U(x_K)$ and the expectation occurs over $ \theta_K$. Substitutions are made into (10.41) to obtain (compare to (2.8))

$\displaystyle G^*_K(x_K) = \min_{u_K} \Big\{ E_{\theta_K} \Big[ l(x_K,u_K,\theta_K) + G^*_F(f(x_K,u_K,\theta_K)) \Big] \Big\} .$ (10.42)
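As a sketch of this first nontrivial pass, (10.42) can be evaluated directly by writing the expectation as a sum weighted by $ P(\theta_K \vert x_K,u_K)$. The code below uses the toy instance given earlier; it is one assumed implementation, not a prescribed one.

\begin{verbatim}
# One backward step, as in (10.42):
#   G*_K(x) = min_u E_theta[ l(x, u, theta) + G*_F(f(x, u, theta)) ],
# with the expectation written as a sum weighted by P(theta | x, u).
G_F = {x: lF(x) for x in X}        # boundary condition: G*_F copied from l_F

G_K = {}
for x in X:
    G_K[x] = min(
        sum(P_theta(th, x, u) * (l(x, u, th) + G_F[f(x, u, th)])
            for th in Theta(x, u))
        for u in U(x)
    )
\end{verbatim}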

The general iteration is

\begin{displaymath}\begin{split}G^*_k(x_k) = & \min_{u_k} \Bigg\{ E_{\theta_k} \Bigg[ \min_{u_{k+1},\ldots,u_K} \Bigg\{ E_{\theta_{k+1},\ldots,\theta_K} \Bigg[ l(x_k,u_k,\theta_k) \\ & + \sum_{i=k+1}^{K} l(x_i,u_i,\theta_i) + l_F(x_F) \Bigg] \Bigg\} \Bigg] \Bigg\} , \end{split}\end{displaymath} (10.43)

which is obtained once again by pulling the first cost term out of the sum and by separating the minimization over $ {u_k}$ from the rest. The second $ \min$ and expectation do not affect the $ l({x_k},{u_k},{\theta_k})$ term, which is pulled outside to obtain (compare to (2.10))

\begin{displaymath}\begin{split}G^*_k(x_k) = & \min_{u_k} \Bigg\{ E_{\theta_k} \Bigg[ l(x_k,u_k,\theta_k) \\ & + \min_{u_{k+1},\ldots,u_K} \Bigg\{ E_{\theta_{k+1},\ldots,\theta_K} \Bigg[ \sum_{i=k+1}^{K} l(x_i,u_i,\theta_i) + l_F(x_F) \Bigg] \Bigg\} \Bigg] \Bigg\} . \end{split}\end{displaymath} (10.44)

The inner $ \min$ and expectation define $ G^*_{k+1}$, yielding the recurrence (compare to (2.11) and (10.39))

\begin{displaymath}\begin{split}G^*_k(x_k) & = \min_{u_k\in U(x_k)} \Big\{ E_{\theta_k} \Big[ l(x_k,u_k,\theta_k) + G^*_{k+1}(f(x_k,u_k,\theta_k)) \Big] \Big\} \\ & = \min_{u_k\in U(x_k)} \Big\{ \sum_{\theta_k \in \Theta(x_k,u_k)} \Big( l(x_k,u_k,\theta_k) + G^*_{k+1}(f(x_k,u_k,\theta_k)) \Big) P(\theta_k\vert x_k,u_k) \Big\} . \end{split}\end{displaymath} (10.45)
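A full backward pass over all stages follows directly from the recurrence (10.45). The sketch below assumes the toy instance given earlier and a hypothetical horizon $ K$; it is one straightforward way to implement the recurrence, not the only one.

\begin{verbatim}
# Backward value iteration using the recurrence (10.45).
def backward_value_iteration(K):
    G_next = {x: lF(x) for x in X}          # G*_F = l_F (boundary condition)
    for k in range(K, 0, -1):               # k = K, K-1, ..., 1
        G_k = {}
        for x in X:
            G_k[x] = min(
                sum(P_theta(th, x, u) * (l(x, u, th) + G_next[f(x, u, th)])
                    for th in Theta(x, u))  # expectation over theta_k
                for u in U(x)
            )
        G_next = G_k
    return G_next                           # G*_1

G_1 = backward_value_iteration(K=5)         # e.g., a five-stage problem
\end{verbatim}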

If the cost term does not depend on $ {\theta_k}$, it can be written as $ l({x_k},{u_k})$, and (10.45) simplifies to

$\displaystyle G^*_k({x_k}) = \min_{{u_k}\in U({x_k})} \bigg\{ l({x_k},{u_k}) + \sum_{x_{k+1} \in X} G^*_{k+1}(x_{k+1}) P(x_{k+1}\vert x_k,u_k) \bigg\} .$ (10.46)

The dependency of the state transitions on $ \theta_k$ is implicit in $ P(x_{k+1}\vert x_k,u_k)$, whose definition uses $ P(\theta_k \vert x_k,u_k)$ and the state transition equation $ f$. The form given in (10.46) may be more convenient than (10.45) in implementations.
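For instance, one might first convert $ P(\theta_k \vert x_k,u_k)$ and $ f$ into $ P(x_{k+1}\vert x_k,u_k)$ and then iterate with (10.46). The helper names below (transition_probabilities, l_xu) are illustrative assumptions built on the toy instance given earlier.

\begin{verbatim}
# Derive P(x_{k+1} | x_k, u_k) from P(theta_k | x_k, u_k) and f.
def transition_probabilities(x, u):
    P = {}
    for th in Theta(x, u):
        xp = f(x, u, th)
        P[xp] = P.get(xp, 0.0) + P_theta(th, x, u)
    return P                                 # maps x_{k+1} to P(x_{k+1} | x, u)

# One backward step in the form (10.46), with a theta-independent cost l_xu(x, u).
def backward_step(G_next, l_xu):
    return {
        x: min(
            l_xu(x, u) + sum(p * G_next[xp]
                             for xp, p in transition_probabilities(x, u).items())
            for u in U(x)
        )
        for x in X
    }

# Example: one step back from the boundary condition with unit action costs.
G_K = backward_step({x: lF(x) for x in X}, lambda x, u: 1.0)
\end{verbatim}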
