Consider computing an optimal plan under Formulation 2.2.
One could naively generate all length-K sequences of actions and
select the sequence that produces the best cost, but this would
require O(|U|^K) running time (imagine K nested loops, one for
each stage), which is clearly prohibitive. Luckily, the dynamic
programming principle helps. We first say in words what will appear
later in equations. The main observation is that portions of optimal
plans are themselves optimal. It would be absurd to be able to
replace a portion of an optimal plan with a portion that produces
lower total cost; this contradicts the optimality of the original
plan.
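As a concrete illustration of why the naive approach is prohibitive, here is a brute-force enumeration sketch on a tiny hypothetical instance (the states, actions, transition function, and costs below are invented for illustration; they are not from the text). The inner product loop visits all |U|^K action sequences, which is exponential in the number of stages K:

```python
from itertools import product

# Toy problem data (illustrative assumptions only).
X = {"a", "b", "c"}       # state space
U = {"left", "right"}     # action space
K = 4                     # number of stages

def f(x, u):
    """Deterministic state transition (toy example)."""
    nxt = {("a", "left"): "a", ("a", "right"): "b",
           ("b", "left"): "a", ("b", "right"): "c",
           ("c", "left"): "b", ("c", "right"): "c"}
    return nxt[(x, u)]

def l(x, u):
    """Stage cost of applying u in x (toy example)."""
    return 1 if u == "right" else 2

def terminal(x):
    """Final-state cost (toy example): penalize ending outside 'c'."""
    return 0 if x == "c" else 10

def brute_force(x_init):
    """Enumerate all |U|^K action sequences and keep the cheapest."""
    best_cost, best_plan = float("inf"), None
    for plan in product(sorted(U), repeat=K):   # |U|^K sequences
        x, cost = x_init, 0
        for u in plan:
            cost += l(x, u)
            x = f(x, u)
        cost += terminal(x)
        if cost < best_cost:
            best_cost, best_plan = cost, plan
    return best_cost, best_plan

print(brute_force("a"))
```

Even for this tiny example, doubling K squares the number of sequences examined, which is what motivates dynamic programming.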
The principle of optimality leads directly to an iterative algorithm, called value iteration, that can solve a vast collection of optimal planning problems, including those that involve variable-length plans, stochastic uncertainties, imperfect state measurements, and many other complications. The idea is to iteratively compute optimal cost-to-go (or cost-to-come) functions over the state space. In some cases, the approach can be reduced to Dijkstra's algorithm; however, this only occurs under some special conditions. The value-iteration algorithm will be presented next, and Section 2.3.3 discusses its connection to Dijkstra's algorithm.