Up to this point, there has been no reason to prefer one action over any other in the search. Section 2.3 will formalize optimal discrete planning and will present several algorithms that find optimal plans. Before going into that, we present a systematic search algorithm that finds optimal plans because it is also useful for finding feasible plans. The result is the well-known Dijkstra's algorithm for finding single-source shortest paths in a graph [273], which is a special form of dynamic programming. More general dynamic programming computations appear in Section 2.3 and throughout the book.
Suppose that every edge, $e \in E$, in the graph representation of a discrete planning problem has an associated nonnegative cost $l(e)$, which is the cost to apply the action. The cost could be written using the state-space representation as $l(x,u)$, indicating that it costs $l(x,u)$ to apply action $u$ from state $x$. The total cost of a plan is just the sum of the edge costs over the path from the initial state to a goal state.
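To make the cost model concrete, the following is a minimal sketch (not from the text) of how the edge costs $l(x,u)$ and the total cost of a plan might be encoded; the dictionary-based representation and the particular state, action, and function names are illustrative assumptions.

\begin{verbatim}
# Minimal sketch: edge costs l(x, u) stored in a dictionary keyed by
# (state, action); the cost of a plan is the sum of the edge costs
# along the path from the initial state.
cost = {
    ('a', 'right'): 1.0,   # l(x = a, u = right) = 1
    ('b', 'right'): 2.0,
    ('b', 'up'):    5.0,
}

def plan_cost(x_init, plan, f, cost):
    """Sum l(x, u) over the plan, where f(x, u) is the state transition."""
    total, x = 0.0, x_init
    for u in plan:
        total += cost[(x, u)]
        x = f(x, u)
    return total
\end{verbatim}

For instance, with a transition function for which f('a', 'right') = 'b', the plan ['right', 'right'] starting from 'a' would cost $1.0 + 2.0 = 3.0$.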
The priority queue, $Q$, will be sorted according to a function $C : X \rightarrow [0,\infty]$, called the cost-to-come. For each state $x$, the value $C^*(x)$ is called the optimal cost-to-come from the initial state $x_I$. This optimal cost is obtained by summing edge costs, $l(e)$, over all possible paths from $x_I$ to $x$ and using the path that produces the least cumulative cost. If the cost is not known to be optimal, then it is written as $C(x)$.
The cost-to-come is computed incrementally during the execution of the search algorithm in Figure 2.4. Initially, $C^*(x_I) = 0$. Each time the state $x'$ is generated, a cost is computed as $C(x') = C^*(x) + l(e)$, in which $e$ is the edge from $x$ to $x'$ (equivalently, we may write $C(x') = C^*(x) + l(x,u)$). Here, $C(x')$ represents the best cost-to-come that is known so far, but we do not write $C^*(x')$ because it is not yet known whether $x'$ was reached optimally. Due to this, some work is required in line 12. If $x'$ already exists in $Q$, then it is possible that the newly discovered path to $x'$ is more efficient. If so, then the cost-to-come value $C(x')$ must be lowered for $x'$, and $Q$ must be reordered accordingly.
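The bookkeeping just described can be illustrated with a small sketch. The code below is not the pseudocode of Figure 2.4; it assumes Python, a successor function succ(x) that yields $(u, x', l)$ triples, and a plain dictionary standing in for the priority queue, with the minimum taken at each iteration in place of an explicitly sorted $Q$.

\begin{verbatim}
import math

def dijkstra_cost_to_come(x_init, succ):
    """Compute C*(x) for every state reachable from x_init.

    succ(x) yields (u, x_next, l) triples: applying action u from x
    reaches x_next with nonnegative cost l = l(x, u).
    """
    C_star = {}                 # dead states: optimal cost-to-come C*(x)
    Q = {x_init: 0.0}           # alive states with best known cost C(x)

    while Q:
        # GetFirst: remove the alive state with the smallest C(x);
        # taking the minimum plays the role of keeping Q sorted.
        x = min(Q, key=Q.get)
        C_star[x] = Q.pop(x)    # x becomes dead; its cost is now optimal

        for u, x_next, l in succ(x):
            if x_next in C_star:
                continue        # already dead; cannot be improved
            c = C_star[x] + l   # candidate cost C(x') = C*(x) + l(x, u)
            # "Line 12": if x' is already in Q with a worse cost,
            # lower C(x'); the dictionary update is the reordering step.
            if c < Q.get(x_next, math.inf):
                Q[x_next] = c
    return C_star
\end{verbatim}

Selecting the minimum of a dictionary takes time linear in the number of alive states, which is why a serious implementation uses a heap, as discussed at the end of this section.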
When does $C(x)$ finally become $C^*(x)$ for some state $x$? Once $x$ is removed from $Q$ using $Q.GetFirst()$, the state $x$ becomes dead, and it is known that $x$ cannot be reached with a lower cost. This can be argued by induction. For the initial state, $C^*(x_I)$ is known, and this serves as the base case. Now assume that every dead state has its optimal cost-to-come correctly determined. This means that their cost-to-come values can no longer change. For the first element, $x$, of $Q$, the value must be optimal because any path that has a lower total cost would have to travel through another state in $Q$, but these states already have higher costs. All paths that pass only through dead states were already considered in producing $C(x)$. Once all edges leaving $x$ are explored, then $x$ can be declared as dead, and the induction continues. This is not enough detail to constitute a proof of optimality; more arguments appear in Section 2.3.3 and in [243]. The running time is $O(|V| \lg |V| + |E|)$, in which $|E|$ and $|V|$ are the numbers of edges and vertices, respectively, in the graph representation of the discrete planning problem. This assumes that the priority queue is implemented with a Fibonacci heap, and that all other operations, such as determining whether a state has been visited, are performed in constant time. If other data structures are used to implement the priority queue, then higher running times may be obtained.
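As a point of comparison, here is a hedged sketch of the same computation using Python's heapq module, which provides a binary heap rather than a Fibonacci heap. Since heapq has no decrease-key operation, a common workaround is to push a duplicate entry when a cheaper path is found and discard stale entries when they are popped; with a binary heap the running time becomes $O((|V| + |E|) \lg |V|)$ rather than the Fibonacci-heap bound above. The function and parameter names are illustrative.

\begin{verbatim}
import heapq

def dijkstra_heap(x_init, succ):
    """Same computation as before, but with a binary heap (heapq).

    Instead of reordering the queue when a cheaper path to x' is found,
    a new (cost, state) entry is pushed and stale entries are skipped
    when popped -- a standard substitute for decrease-key.
    """
    C_star = {}                      # dead states: optimal cost-to-come
    C = {x_init: 0.0}                # best known cost-to-come so far
    Q = [(0.0, x_init)]              # heap of (C(x), x) entries

    while Q:
        c, x = heapq.heappop(Q)
        if x in C_star:
            continue                 # stale entry; x is already dead
        C_star[x] = c                # x becomes dead
        for u, x_next, l in succ(x):
            c_new = c + l            # C(x') = C*(x) + l(x, u)
            if x_next not in C_star and c_new < C.get(x_next, float('inf')):
                C[x_next] = c_new
                heapq.heappush(Q, (c_new, x_next))
    return C_star
\end{verbatim}

For example, on a three-state chain, dijkstra_heap('a', lambda x: {'a': [('u1', 'b', 1.0)], 'b': [('u2', 'c', 2.0)], 'c': []}[x]) returns {'a': 0.0, 'b': 1.0, 'c': 3.0}.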