For motion planning problems, it is expected that the next state $x_{k+1}$, as computed from (8.62), is always close to $x_k$ relative to the size of $X$. This suggests the use of a Dijkstra-like algorithm to compute optimal feedback plans more efficiently. As discussed for the finite case in Section 2.3.3, many values remain unchanged during the value iterations, as indicated in Example 2.5. Dijkstra's algorithm maintains a data structure that focuses the computation on the part of the state space where values are changing. The same can be done for the continuous case by carefully considering the sample points [607].
During the value iterations, there are three kinds of sample points, just as in the discrete case (recall from Section 2.3.3):

1. Dead: Samples for which the optimal cost-to-go value has been computed; their values can no longer change.
2. Alive: Samples that have been reached and have finite cost-to-go values, but whose values may still change.
3. Unvisited: Samples that have not yet been reached; their cost-to-go values remain at infinity.
Imagine the first value iteration. Initially, all points in $X_G$ are set to zero values, and all others are set to infinity. Among the collection of samples $S$, how many can reach $X_G$ in a single stage? We expect that samples very far from $X_G$ will not be able to reach it; this keeps their values at infinity. The samples that are close to $X_G$ should reach it. It would be convenient to prune away from consideration all samples that are too far from $X_G$ to lower their value. In every iteration, we eliminate iterating over samples that are too far away from those already reached. It is also unnecessary to iterate over the dead samples because their values no longer change.
To keep track of reachable samples, it will be convenient to introduce the notion of a backprojection, which will be studied further in Section 10.1. For a single state, $x' \in X$, its backprojection is defined as

$$B(x') = \{ x \in X \mid \exists u \in U(x) \text{ for which } x' = f(x,u) \},$$

the set of states from which $x'$ can be reached in one stage. The backprojection of a set, $X' \subseteq X$, of states is the union of the single-state backprojections: $B(X') = \bigcup_{x' \in X'} B(x')$.
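The backprojection can be illustrated with a minimal sketch, assuming a toy discrete model in which a state is an $(i, j)$ grid cell and each action moves one cell; the names `actions` and `backprojection` are illustrative, not from the text.

```python
# Toy model: states are grid cells, x' = f(x, u) shifts by one cell.
actions = [(1, 0), (-1, 0), (0, 1), (0, -1)]  # U(x), identical for every x here

def f(x, u):
    """State transition: apply action u to state x."""
    return (x[0] + u[0], x[1] + u[1])

def backprojection(x_prime, states):
    """B(x') = {x in X | exists u in U(x) with x' = f(x, u)}."""
    return {x for x in states for u in actions if f(x, u) == x_prime}

# Usage: on a 3x3 grid, the backprojection of the center cell is its
# four axis-aligned neighbors.
X = {(i, j) for i in range(3) for j in range(3)}
print(sorted(backprojection((1, 1), X)))  # [(0, 1), (1, 0), (1, 2), (2, 1)]
```

In this exact-grid setting the backprojection is symmetric to the forward neighborhood; for a general motion model the two can differ.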
Now consider a version of value iteration that uses backprojections to eliminate some states from consideration because it is known that their values cannot change. Let $i$ refer to the number of stages considered by the current value iteration. During the first iteration, $i = 1$, which means that all one-stage trajectories are considered. Let $S$ be the set of samples (assuming already that none lie in $X_{obs}$). Let $D_i$ and $A_i$ refer to the dead and alive samples, respectively. Initially, $D_1 = S \cap X_G$, the set of samples in the goal set. The first set, $A_1$, of alive samples is assigned by using the concept of a frontier. The frontier of a set $S' \subseteq S$ of sample points is

$$\operatorname{Front}(S') = \big( S \cap B(S') \big) \setminus S',$$

the set of samples not in $S'$ from which some point of $S'$ can be reached in one stage. Using this, $A_1 = \operatorname{Front}(D_1)$.
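A hedged sketch of the frontier computation, again assuming the toy grid model (on an exact grid, reaching a sample and reaching its interpolation region coincide, so interpolation is not needed here); the name `frontier` is illustrative.

```python
actions = [(1, 0), (-1, 0), (0, 1), (0, -1)]

def f(x, u):
    """Toy transition: shift the grid cell x by the action u."""
    return (x[0] + u[0], x[1] + u[1])

def frontier(S_prime, samples):
    """Samples outside S' that can reach S' in one stage."""
    B = {x for x in samples for u in actions if f(x, u) in S_prime}
    return B - S_prime

# Usage: starting from the goal sample D1 = {(0, 0)} on a 3x3 grid, the
# first alive set A1 = Front(D1) consists of the goal's two neighbors.
X = {(i, j) for i in range(3) for j in range(3)}
print(sorted(frontier({(0, 0)}, X)))  # [(0, 1), (1, 0)]
```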
Now the approach is described for iteration $i$. The cost-to-go update (8.56) is computed at all points in $A_i$. If the value at some $s \in A_i$ does not change, then $s$ is declared dead and moved to $D_{i+1}$. Samples are never removed from the dead set; therefore, all points in $D_i$ are also added to $D_{i+1}$. The next active set, $A_{i+1}$, includes all samples in $A_i$, excluding those that were moved to the dead set. Furthermore, all samples in $\operatorname{Front}(A_i)$ are added to $A_{i+1}$ because these will produce a finite cost-to-go value in the next iteration. The iterations continue as usual until some stage, $m$, is reached for which $A_m$ is empty, and $D_m$ includes all samples from which the goal can be reached (under the approximation assumptions made in this section).
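The pruned iteration can be sketched as follows, under the same toy-grid assumptions with a unit stage cost and a single goal sample; the dead and alive sets follow the text, but all names are illustrative and the exact grid stands in for the interpolated continuous case.

```python
import math

actions = [(1, 0), (-1, 0), (0, 1), (0, -1)]
X = {(i, j) for i in range(4) for j in range(4)}
goal = {(0, 0)}

def f(x, u):
    """Toy transition: shift the grid cell x by the action u."""
    return (x[0] + u[0], x[1] + u[1])

def front(Sp):
    """Frontier: samples outside Sp that reach Sp in one stage."""
    return {x for x in X for u in actions if f(x, u) in Sp} - Sp

G = {x: (0.0 if x in goal else math.inf) for x in X}
D = set(goal)            # dead samples: values can no longer change
A = front(D)             # alive samples: values may still change
while A:
    dead = set()
    for s in A:
        # One-stage cost-to-go update with unit stage cost.
        new = min(1.0 + G[f(s, u)] for u in actions if f(s, u) in X)
        if new == G[s]:  # value stabilized, so s is declared dead
            dead.add(s)
        G[s] = new
    D |= dead
    A = (A - dead) | front(D)

print(G[(3, 3)])  # 6.0: six stages from the far corner to the goal
```

Only alive samples are updated in each pass, yet the final values match a full value iteration, because the excluded samples are exactly those whose values could not have changed.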
For efficiency purposes, an approximation to $\operatorname{Front}$ may be used, provided that the true frontier is a proper subset of the approximate frontier. For example, the approximate frontier might add all new samples within a specified radius of points in $S'$. In this case, the updated cost-to-go value for some $s \in A_i$ may remain infinite. If this occurs, $s$ is of course not added to $D_{i+1}$. Furthermore, it is deleted from $A_i$ in the computation of the next frontier (the frontier should only be computed for samples that have finite cost-to-go values).
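A conservative radius-based approximation can be sketched as below, assuming samples are points in the plane and the radius $r$ upper bounds the one-stage travel distance, so that the true frontier is guaranteed to be contained in the approximation; `approx_frontier` is a hypothetical name.

```python
import math

def approx_frontier(S_prime, samples, r):
    """All samples within distance r of S', excluding S' itself."""
    near = {x for x in samples
            if any(math.dist(x, y) <= r for y in S_prime)}
    return near - set(S_prime)

# Usage: unit-spaced samples with r = 1 recovers the axis-aligned
# neighbors; r = 1.5 also admits the diagonal neighbors, some of which
# may fail to produce a finite value and must be pruned as in the text.
S = [(i, j) for i in range(3) for j in range(3)]
print(sorted(approx_frontier({(1, 1)}, S, 1.0)))
print(sorted(approx_frontier({(1, 1)}, S, 1.5)))
```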
The approach considered so far can be expected to reduce the amount of computation in each value iteration by eliminating the evaluation of (8.56) on unnecessary samples. The same cost-to-go values are obtained in each iteration because only samples for which the value cannot change are eliminated. The resulting algorithm still does not, however, resemble Dijkstra's algorithm because value iterations are performed over all of $A_i$ in each stage.
To make a version that behaves like Dijkstra's algorithm, a priority queue, $Q$, will be introduced. The algorithm removes the smallest element of $Q$ in each iteration. The interpolation version first assigns $G^*(s) = 0$ for each $s \in D_1$. It also maintains a set, $D$, of dead samples, which is initialized to $D_1$. For each $s \in \operatorname{Front}(D)$, the cost-to-go update (8.56) is computed. The queue $Q$ is initialized to $\operatorname{Front}(D)$, and its elements are sorted by their current cost-to-go values (which may not be optimal). The algorithm iteratively removes the smallest element from $Q$ (because its optimal cost-to-go is known to be the current value) and terminates when $Q$ is empty. Each time the smallest element, $s$, is removed, it is inserted into $D$. Two procedures are then performed: 1) Some elements in the queue need to have their cost-to-go values recomputed using (8.56) because the value of $s$ is known to be optimal, and their values may depend on it. These are the samples in $Q$ that lie in $\operatorname{Front}(D)$ (in which $D$ has just been extended to include $s$). 2) Any samples in $\operatorname{Front}(D)$ that are not in $Q$ are inserted into $Q$ after computing their cost-to-go values using (8.56). This enables the active set of samples to grow as the set of dead samples grows. Dijkstra's algorithm with interpolation does not compute values that are identical to those produced by the value iterations because the stage-indexed values $G_i$ are not explicitly stored when $G_{i+1}$ is computed; each update uses the most recent values available. Each computed value is some cost-to-go, but it is only known to be optimal when the sample is placed into $D$. It can be shown, however, that the method converges because the computed values are no higher than what would have been computed in a value iteration. This is also the basis of dynamic programming using Gauss-Seidel iterations [96].
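The queue-driven version can be sketched as follows, again assuming the toy grid model with unit stage cost (so no interpolation is needed); `heapq` entries may be stale, so already-dead samples are skipped on removal, and all names are illustrative.

```python
import heapq
import math

actions = [(1, 0), (-1, 0), (0, 1), (0, -1)]
X = {(i, j) for i in range(4) for j in range(4)}
goal = {(0, 0)}

def f(x, u):
    """Toy transition: shift the grid cell x by the action u."""
    return (x[0] + u[0], x[1] + u[1])

G = {x: (0.0 if x in goal else math.inf) for x in X}
D = set()                                  # dead samples
Q = [(0.0, s) for s in goal]               # queue keyed by cost-to-go
heapq.heapify(Q)
while Q:
    g, s = heapq.heappop(Q)                # smallest current value
    if s in D:
        continue                           # stale entry: s is already dead
    D.add(s)                               # the value of s is now optimal
    # Recompute the values of samples that may depend on s: those whose
    # one-stage successors include s (its backprojection).
    for x in X - D:
        if any(f(x, u) == s for u in actions):
            new = 1.0 + g                  # unit stage cost
            if new < G[x]:
                G[x] = new
                heapq.heappush(Q, (new, x))

print(G[(3, 3)])  # 6.0
```

Each sample is finalized exactly once, so the total work is governed by the number of samples and their backprojection sizes rather than by repeated sweeps over the whole alive set.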
A specialized, wavefront-propagation version of the algorithm can be made for the special case of finding solutions that reach the goal in the smallest number of stages. The algorithm is similar to the one shown in Figure 8.4. It starts with an initial wavefront $W_0$ in which $G^*(s) = 0$ for each $s \in W_0$ (the samples in the goal set). In each iteration, the optimal cost-to-go value is increased by one, and the wavefront, $W_{i+1}$, is computed from $W_i$ as $W_{i+1} = \operatorname{Front}(W_i)$. The algorithm terminates at the first iteration in which the wavefront is empty.
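Under the same toy-grid assumptions, the wavefront version reduces to a breadth-first sweep outward from the goal, since every stage costs one; this is a sketch, not the book's Figure 8.4 algorithm verbatim.

```python
import math

actions = [(1, 0), (-1, 0), (0, 1), (0, -1)]
X = {(i, j) for i in range(4) for j in range(4)}

def f(x, u):
    """Toy transition: shift the grid cell x by the action u."""
    return (x[0] + u[0], x[1] + u[1])

G = {x: math.inf for x in X}
W = {(0, 0)}                     # initial wavefront: the goal samples
for s in W:
    G[s] = 0.0
i = 0
while W:
    i += 1
    # Next wavefront: unvisited samples that can reach the current one
    # in a single stage; their optimal cost-to-go is exactly i stages.
    nxt = {x for x in X if math.isinf(G[x])
           and any(f(x, u) in W for u in actions)}
    for s in nxt:
        G[s] = float(i)
    W = nxt

print(G[(3, 3)])  # 6.0
```

No priority queue is needed because all samples in a wavefront share the same cost-to-go value.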
Steven M LaValle 2020-08-14