Exact methods

If the total number of stages is small, it is possible in practice to compute exact representations. Some methods are based on an observation that the cost-to-come is piecewise linear and convex [494]. A linear-programming problem results, which can be solved using the techniques that were described for finding randomized saddle points of zero-sum games in Section 9.3. Due to the numerous constraints, methods have been proposed that dramatically reduce the number that need to be considered in some circumstances (see the suggested reading on POMDPs at the end of the chapter).

An exact, discrete representation can be computed as follows. Suppose that the initial condition space ${{\cal I}_0}$ consists of one initial condition, ${\eta_0}$ (or a finite number of initial conditions), and that there are no more than stages at which decisions are made. Since $\Theta(x,u)$ and $\Psi(x)$ are assumed to be finite, there is a finite number of possible final I-states, ${\eta}_F = ({\eta_0}, {\tilde{u}}_K, {\tilde{y}}_F)$ . For each of these, the distribution $P(x_F \vert {\eta}_F)$ can be computed, which is alternatively represented as ${\vec{x}}_F$ . Following this, (12.4) is used to compute $G^*({\vec{x}}_F)$ for each possible ${\vec{x}}_F$ . The number of these states is unfortunately exponential in the total number of stages, but at least there are finitely many. The dynamic programming recurrence (12.10) can be applied for to roll back one stage. It is known that each possible ${\vec{x}}_{k+1}$ will be a point in ${\vec{X}}$ at which a value was computed because values were computed for possible all I-states. Therefore, interpolation is not necessary. Equation 12.10 can be applied repeatedly until the first stage is reached. In each iteration, no interpolation is needed because the cost-to-go $G^*_{k+1}$ was computed for each possible next I-state. Given the enormous size of ${\cal I}$ , this method is practical only for very small problems.

Steven M LaValle 2020-08-14