Policy iteration for discounted cost

The policy iteration method can also be applied to the probabilistic discounted-cost problem. Recall the method given in Figure 10.4. The general approach remains the same: a search is conducted over the space of plans, and a linear system of equations is solved in each iteration to evaluate the current plan. In Step 2, (10.53) is replaced by

$\displaystyle J_\pi (x) = l(x,u) + \alpha \sum_{x^\prime \in X} J_\pi (x^\prime) P(x^\prime \vert x,u),$

(10.76)

in which $u = \pi(x)$. This is a special form of (10.76) for evaluating a fixed plan. In Step 3, (10.54) is replaced by

$\displaystyle \pi ^\prime(x) = \operatornamewithlimits{argmin}_{u \in U(x)} \Big\{ l(x,u) + \alpha \sum_{x^\prime \in X} J_\pi (x^\prime) P(x^\prime \vert x,u) \Big\} .$

(10.77)

Using these alterations, the policy iteration algorithm proceeds in the same way as in Section 10.2.2.
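
The two replaced steps can be illustrated with a minimal Python sketch. It is not taken from the text and rests on several assumptions: states and actions are indexed by integers, every action is available in every state (so $U(x)$ is the same for all $x$), and $0 < \alpha < 1$ so that the linear system has a unique solution. The array layout `P[u, x, x']` and the function names `evaluate_plan`, `improve_plan`, and `policy_iteration` are hypothetical, chosen only for this example. Plan evaluation solves (10.76) in matrix form, $(I - \alpha P_\pi) J_\pi = l_\pi$, and plan improvement applies (10.77).

```python
import numpy as np

def evaluate_plan(P, l, pi, alpha):
    """Solve (I - alpha * P_pi) J = l_pi, the matrix form of (10.76)."""
    n = P.shape[1]
    idx = np.arange(n)
    P_pi = P[pi, idx, :]   # row x holds P(x' | x, pi(x))
    l_pi = l[idx, pi]      # entry x holds l(x, pi(x))
    return np.linalg.solve(np.eye(n) - alpha * P_pi, l_pi)

def improve_plan(P, l, J, alpha):
    """Apply (10.77): choose the action with the least one-step lookahead cost."""
    # Q[x, u] = l(x, u) + alpha * sum over x' of J(x') P(x' | x, u)
    Q = l + alpha * np.einsum("uxy,y->xu", P, J)
    return np.argmin(Q, axis=1)

def policy_iteration(P, l, alpha, max_iters=1000):
    """Alternate evaluation and improvement until the plan stops changing."""
    n = P.shape[1]
    pi = np.zeros(n, dtype=int)  # arbitrary initial plan (assumption)
    for _ in range(max_iters):
        J = evaluate_plan(P, l, pi, alpha)
        new_pi = improve_plan(P, l, J, alpha)
        if np.array_equal(new_pi, pi):
            return pi, J
        pi = new_pi
    return pi, evaluate_plan(P, l, pi, alpha)

# Toy problem (made up for illustration): two states, two actions.
# Action 0 stays in the current state; action 1 switches to the other state.
P = np.array([
    [[1.0, 0.0], [0.0, 1.0]],  # P[0, x, x']: stay
    [[0.0, 1.0], [1.0, 0.0]],  # P[1, x, x']: switch
])
# l[x, u]: staying in state 0 costs 2, leaving it costs 1,
# staying in state 1 is free, leaving it costs 3.
l = np.array([[2.0, 1.0],
              [0.0, 3.0]])

pi, J = policy_iteration(P, l, alpha=0.9)
print(pi, J)
```

In this toy problem the transitions are deterministic, which is a special case of the probabilistic model. Starting from the plan that always stays, the first evaluation gives a discounted cost of $2/(1-0.9) = 20$ in state 0, and one improvement step switches state 0 to the second action; the next iteration leaves the plan unchanged, so the loop stops. Solving the linear system directly is only one way to carry out Step 2; for large state spaces an iterative solver might be preferable.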

Steven M LaValle 2020-08-14