Imagine exploring an unknown planet using a robotic vehicle. The robot moves along the rugged terrain while using a range scanner to make precise measurements of the ground in its vicinity. As the robot moves, it may discover that some parts were easier to traverse than it originally thought. In other cases, it might realize that some direction it was intending to go is impassable due to a large bolder or a ravine. If the goal is to arrive at some specified coordinates, this problem can be viewed as a navigation problem in an unknown environment. The resulting solution is a lazy approach, as discussed in Section 12.2.1.
This section presents Stentz's algorithm [913], which has been used in many outdoor vehicle navigation applications, such as the vehicle shown in Figure 12.17. The algorithm can be considered as a dynamic version of the backward variant of Dijkstra's algorithm. Thus, it maintains cost-to-go values, and the search grows outward from the goal, as opposed to cost-to-come values from in the version of Dijkstra's algorithm in Section 2.3.3. The method applies to any optimal planning problem. In terms of the state transition graph, it is assumed that the costs of edge transitions are unknown (equivalently, each cost is unknown). In the navigation problem, a positive cost indicates the difficulty of traveling from state to state .
To work with a concrete problem, imagine that a planet surface is partitioned into a high-resolution grid. The state space is simply a bounded set of grid tiles; hence, . Each grid tile is assigned a positive, real value, , that indicates the difficulty of its traversal. The actions at each grid point can be chosen using standard grid neighbors (e.g., four-neighbors or eight-neighbors). This now defines a state transition graph over . From any and such that , the cost term is assigned using as . This model is a generalization of the grid in Section 12.3.1, in which the tiles were either empty or occupied; here any positive real value is allowed. In the coming explanation, the costs may be more general than what is permitted by starting from , and the state transition graph does not need to be derived from a grid. Some initial values are assigned arbitrarily for all . For example, in the planetary exploration application, the cost of traversing a level, unobstructed surface may be uniformly assumed.
The task is to navigate to some goal state, . The method works by initially constructing a feedback plan, , on a subset of that includes both and . The plan, , is computed by iteratively applying the procedure in Figure 12.18 until the optimal cost-to-go is known at . A priority queue, , is maintained as in Dijkstra's algorithm; however, Stentz's algorithm allows the costs of elements in to be modified due to information sensed during execution. Let denote the lowest cost-to-go associated with during the time it spends in . Assume that is sorted according to . Let denote its current cost-to-go value, which may actually be more than if some cost updates caused it to increase. Suppose that some can be applied to reach a state . Let denote the cost-to-go from by traveling via ,
(12.24) |
After the iterations of Figure 12.18 finish, the robot executes , which generates a sequence of visited states. Let denote the current state during execution. If it is discovered that if would be applied, the received cost would not match the cost in the current model, then the costs need to be updated. More generally, the robot may have to be able to update costs within a region around that corresponds to the sensor field of view. For the description below, assume that an update, , is obtained for only (the more general case is handled similarly). First, is updated to the newly measured value. If happened to be dead (visited, but no longer in ), then it is inserted again into , with cost . The steps in Figure 12.18 are performed until for all . Following this, the plan execution continues until either the goal is reached or another cost mismatch is discovered. At any time during execution, the robot motions are optimal given the current information about the costs [913].
Figure 12.19 illustrates the execution of the algorithm. Figure 12.19a shows a synthetic terrain that was generated by a stochastic fractal. Darker gray values indicate higher cost. In the center, very costly terrain acts as a barrier, for which an escape route exists in the downward direction. The initial state is the middle of the left edge of the environment, and the goal state is the right edge. The robot initially plans a straight-line path and then incrementally updates the path in each step as it moves. In Figure 12.19b, the robot has encountered the costly center and begins to search for a way around. Finally, the goal is reached, as shown in Figure 12.19c. The executed path is actually the result of executing a series of optimal paths, each of which is based on the known information at the time a single action is applied.