8.2.1 Defining a Feedback Plan

Consider a discrete planning problem similar to the ones defined in
Formulations 2.1 and 2.3, except that the initial
state is not given. Due to this, the cost functional cannot be
expressed only as a function of a plan. It is instead defined in
terms of the *state history* and *action history*. At stage
$k$, these are defined as

$$\tilde{x}_k = (x_1, x_2, \ldots, x_k) \qquad (8.1)$$

and

$$\tilde{u}_k = (u_1, u_2, \ldots, u_k), \qquad (8.2)$$

respectively. Sometimes, it will be convenient to alternatively refer to $\tilde{x}_k$ as the *path* taken through the state space.

The resulting formulation is as follows.

Formulation 8.1 (Discrete Optimal Feedback Planning)

- A finite, nonempty *state space* $X$.
- For each state, $x \in X$, a finite *action space* $U(x)$.
- A *state transition function* $f$ that produces a state, $f(x, u) \in X$, for every $x \in X$ and $u \in U(x)$. Let $U$ denote the union of $U(x)$ for all $x \in X$.
- A set of *stages*, each denoted by $k$, that begins at $k = 1$ and continues indefinitely.
- A *goal set*, $X_G \subset X$.
- Let $L$ denote a stage-additive *cost functional*,
  $$L(\tilde{x}_F, \tilde{u}_K) = \sum_{k=1}^{K} l(x_k, u_k) + l_F(x_F), \qquad (8.3)$$
  in which $F = K + 1$.
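As a concrete illustration, the components of such a formulation can be sketched in code. The four-state chain, the action names, and the unit stage cost below are hypothetical choices for the sketch, not part of the formulation itself:

```python
# Hypothetical instance of the formulation: a state space X, per-state
# action spaces U(x), a transition function f, a goal set X_G, and a
# stage-additive cost functional L.

X = {"a", "b", "c", "d"}                       # state space X
X_G = {"d"}                                    # goal set

def U(x):
    """Finite action space U(x) for each state (toy example)."""
    return {"a": ["right"], "b": ["right", "back"],
            "c": ["right"], "d": []}[x]

def f(x, u):
    """State transition function: f(x, u) in X."""
    return {("a", "right"): "b", ("b", "right"): "c",
            ("b", "back"): "a", ("c", "right"): "d"}[(x, u)]

def L(x_hist, u_hist, l=lambda x, u: 1, l_F=lambda x: 0):
    """Stage-additive cost: sum of l(x_k, u_k), plus final term l_F(x_F).

    x_hist has K + 1 states (F = K + 1), u_hist has K actions.
    """
    K = len(u_hist)
    return sum(l(x_hist[k], u_hist[k]) for k in range(K)) + l_F(x_hist[K])
```

With unit stage cost and zero final cost, the cost of a history is simply the number of actions applied.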

Consider defining a plan that solves Formulation 8.1. If the initial condition is given, then a sequence of actions could be specified, as in Chapter 2. Without having the initial condition, one possible approach is to determine a sequence of actions for each possible initial state, $x_1 \in X$. Once the initial state is given, the appropriate action sequence is known. This approach, however, wastes memory. Suppose some $x \in X$ is given as the initial state and the first action $u \in U(x)$ is applied, leading to the next state $x' = f(x, u)$. What action should be applied from $x'$? The second action in the sequence at $x$ can be used; however, we can also imagine that $x'$ is now the initial state and use its first action. This implies that keeping an action sequence for every state is highly redundant. It is sufficient at each state to keep only the first action in the sequence. The application of that action produces the next state, at which the next appropriate action is stored. An execution sequence can be imagined from an initial state as follows. Start at some state, apply the action stored there, arrive at another state, apply its action, arrive at the next state, and so on, until the goal is reached.
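The redundancy argument can be made concrete with a small sketch. The chain of states and the stored action sequences below are hypothetical; the point is only that the tail of each stored sequence reappears as the sequence stored at the successor state, so keeping anything beyond the first action per state is wasted memory:

```python
# Storing a full action sequence for every possible initial state is
# redundant: the tail of each sequence is exactly the sequence stored
# at the successor state. (Toy chain a -> b -> c -> d, goal "d".)

def f(x, u):
    """Hypothetical state transition function."""
    return {("a", "go"): "b", ("b", "go"): "c", ("c", "go"): "d"}[(x, u)]

# One action sequence per possible initial state:
seqs = {"a": ["go", "go", "go"], "b": ["go", "go"], "c": ["go"], "d": []}

# After applying the first action from "a", the remaining sequence
# coincides with the sequence stored at the next state, so only the
# first action at each state actually needs to be kept.
x = "a"
rest = seqs[x][1:]
assert rest == seqs[f(x, seqs[x][0])]
```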

It therefore seems appropriate to represent a feedback plan as a
function that maps every state to an action. Therefore, a
*feedback plan* $\pi$ is defined as a function
$\pi : X \to U$. From every state, $x \in X$, the plan indicates which
action, $\pi(x) \in U(x)$, to apply. If the goal is reached, then the termination action
should be applied. This is specified as part of the plan:
$\pi(x) = u_T$, if
$x \in X_G$. A feedback plan $\pi$ is called a *solution*
to the problem if it causes the goal to be reached from every state
that is reachable from the goal.
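A minimal sketch of a feedback plan and its execution, assuming a hypothetical four-state chain with a single action name and a sentinel string standing in for the termination action $u_T$:

```python
# A feedback plan pi as a mapping from every state to an action.
# The sentinel "u_T" stands in for the termination action, which
# pi assigns to every state in the goal set X_G.

X_G = {"d"}

def f(x, u):
    """Hypothetical state transition function."""
    return {("a", "go"): "b", ("b", "go"): "c", ("c", "go"): "d"}[(x, u)]

pi = {"a": "go", "b": "go", "c": "go", "d": "u_T"}  # pi(x) = u_T on X_G

def execute(pi, x):
    """Apply pi from initial state x until the termination action,
    returning the state history and action history."""
    states, actions = [x], []
    while pi[x] != "u_T":
        u = pi[x]
        actions.append(u)
        x = f(x, u)
        states.append(x)
    return states, actions
```

Calling `execute(pi, x)` from any state of the chain traces the stored actions until the goal, mirroring the execution sequence described above.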

If an initial state $x_1$ and a feedback plan $\pi$ are given, then the state and action histories can be determined. This implies that the execution cost, (8.3), also can be determined. It can therefore be alternatively expressed as $L(\pi, x_1)$, instead of $L(\tilde{x}_F, \tilde{u}_K)$. This relies on future states always being predictable. In Chapter 10, it will not be possible to make this direct correspondence due to uncertainties (see Section 10.1.3).
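Because every successor state is predictable, $L(\pi, x_1)$ can be computed simply by simulating the plan from the initial state. A minimal sketch, again assuming a hypothetical four-state chain, unit stage cost, and zero final cost:

```python
# Execution cost L(pi, x1): since each next state f(x, pi(x)) is
# predictable, the stage-additive cost is determined by the plan and
# the initial state alone. Unit stage cost l = 1 and l_F = 0 are
# assumptions of this toy instance.

X_G = {"d"}
pi = {"a": "go", "b": "go", "c": "go", "d": "u_T"}

def f(x, u):
    """Hypothetical state transition function."""
    return {("a", "go"): "b", ("b", "go"): "c", ("c", "go"): "d"}[(x, u)]

def cost(pi, x1, l=lambda x, u: 1, l_F=lambda x: 0):
    """L(pi, x1): simulate pi from x1, summing the stage-additive cost."""
    x, total = x1, 0
    while pi[x] != "u_T":
        u = pi[x]
        total += l(x, u)
        x = f(x, u)
    return total + l_F(x)
```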