8.2.1 Defining a Feedback Plan

Consider a discrete planning problem similar to the ones defined in Formulations 2.1 and 2.3, except that the initial state is not given. Because of this, the cost functional cannot be expressed as a function of a plan alone. It is instead defined in terms of the state history and the action history. At stage $k$, these are defined as

$\displaystyle {\tilde{x}}_k = (x_1, x_2, \ldots, x_k)$

and

$\displaystyle {\tilde{u}}_k = (u_1, u_2, \ldots, u_k) ,$

respectively.

Formulation 8.1 (Discrete Optimal Feedback Planning)

A finite, nonempty state space $X$.
For each state, $x \in X$, a finite action space $U(x)$.
A state transition function $f$ that produces a state, $f(x,u) \in X$, for every $x \in X$ and $u \in U(x)$. Let $U$ denote the union of $U(x)$ for all $x \in X$.
A set of stages, each denoted by $k$, that begins at $k=1$ and continues indefinitely.
A goal set, ${X_{G}}\subset X$.
Let $L$ denote a stage-additive cost functional,

$\displaystyle L({\tilde{x}}_F,{\tilde{u}}_K) = \sum_{k=1}^K l(x_k,u_k) + l_F(x_F) ,$ (8.3)

in which $F = K + 1$.
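As a minimal sketch of how (8.3) is evaluated, the following Python function sums the per-stage costs over given state and action histories and adds the final-state term. It assumes the histories are stored as lists with one more state than actions (that is, $K$ actions lead through $K+1$ states). The name `stage_additive_cost`, the callables `l` and `l_F`, and the unit-cost example are illustrative assumptions, not part of the formulation.

```python
import math

def stage_additive_cost(x_hist, u_hist, l, l_F):
    """Evaluate sum_{k=1}^{K} l(x_k, u_k) + l_F(x_F) for given histories."""
    # K actions take the system through K + 1 states, so x_hist[-1] is x_F.
    if len(x_hist) != len(u_hist) + 1:
        raise ValueError("expected one more state than actions")
    # zip pairs x_k with u_k for k = 1..K and leaves out the final state.
    stage_costs = sum(l(x, u) for x, u in zip(x_hist, u_hist))
    return stage_costs + l_F(x_hist[-1])

# Hypothetical example: integer states, unit cost per action, goal set {2}.
goal = {2}
unit_cost = lambda x, u: 1
final_cost = lambda x: 0 if x in goal else math.inf  # assumed final-cost term

cost_to_goal = stage_additive_cost([0, 1, 2], [1, 1], unit_cost, final_cost)
cost_short = stage_additive_cost([0, 1], [1], unit_cost, final_cost)
```

In this example, the history that ends in the goal accumulates only the two stage costs, while the history that stops short of the goal is penalized through the assumed final-cost term.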

Consider defining a plan that solves Formulation 8.1. If the initial condition is given, then a sequence of actions could be specified, as in Chapter 2. Without having the initial condition, one possible approach is to determine a sequence of actions for each possible initial state, $x_1 \in X$. Once the initial state is given, the appropriate action sequence is known. This approach, however, wastes memory. Suppose some $x$ is given as the initial state and the first action is applied, leading to the next state $x'$. What action should be applied from $x'$? The second action in the sequence at $x$ can be used; however, we can also imagine that $x'$ is now the initial state and use its first action. This implies that keeping an action sequence for every state is highly redundant. It is sufficient at each state to keep only the first action in the sequence. The application of that action produces the next state, at which the next appropriate action is stored. An execution sequence can be imagined from an initial state as follows. Start at some state, apply the action stored there, arrive at another state, apply its action, arrive at the next state, and so on, until the goal is reached.
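The redundancy argument can be illustrated with a small sketch. The toy problem below (states 0 through 4 on a line, goal at 4, a single action that moves one step right) is a hypothetical example chosen for illustration. It stores a full action sequence for every initial state, checks that the remainder of each sequence coincides with the sequence stored at the successor state, and then keeps only the first action per state.

```python
# Hypothetical toy problem: states 0..4 on a line, goal at state 4.
def f(x, u):
    """State transition function: move by u along the line."""
    return x + u

# One full action sequence per possible initial state (step right until the goal).
sequences = {x: [1] * (4 - x) for x in range(5)}

# After applying the first action from x, the rest of the sequence stored at x
# is exactly the sequence already stored at the successor state.
redundant = all(
    seq[1:] == sequences[f(x, seq[0])]
    for x, seq in sequences.items()
    if seq
)

# Keep only the first action at each state that still needs to move.
first_action = {x: seq[0] for x, seq in sequences.items() if seq}

stored_before = sum(len(seq) for seq in sequences.values())
stored_after = len(first_action)
```

In this toy case the per-state sequences hold ten actions in total, whereas the first-action table holds four, and the gap grows with the length of the paths to the goal.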

It therefore seems appropriate to represent a feedback plan as a function that maps every state to an action. A feedback plan $\pi$ is thus defined as a function $\pi : X \rightarrow U$. From every state, $x \in X$, the plan indicates which action to apply. If the goal is reached, then the termination action should be applied. This is specified as part of the plan: $\pi (x) = u_T$, if $x \in {X_{G}}$. A feedback plan is called a solution to the problem if it causes the goal to be reached from every state from which the goal is reachable.
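A feedback plan over a finite state space can be stored as a lookup table. The sketch below represents $\pi$ as a Python dictionary and checks the solution condition, interpreted here as an assumption to mean that following $\pi$ reaches the goal from every state from which some action sequence could reach it. The helper names (`can_reach_goal`, `reaches_goal`, `is_solution`), the string `"u_T"` standing in for the termination action, and the line-world example are all hypothetical.

```python
from collections import deque

U_T = "u_T"  # stand-in for the termination action

def can_reach_goal(X, U, f, goal):
    """States from which some action sequence reaches the goal (backward search)."""
    preds = {x: set() for x in X}
    for x in X:
        for u in U(x):
            preds[f(x, u)].add(x)
    reached, queue = set(goal), deque(goal)
    while queue:
        y = queue.popleft()
        for x in preds[y]:
            if x not in reached:
                reached.add(x)
                queue.append(x)
    return reached

def reaches_goal(pi, f, goal, x, max_steps):
    """Follow pi from x; succeed only if it terminates inside the goal."""
    for _ in range(max_steps):
        if pi[x] == U_T:
            return x in goal  # terminating outside the goal is a failure
        x = f(x, pi[x])
    return False  # no termination within max_steps, so pi must be cycling

def is_solution(pi, X, U, f, goal):
    # A cycle-free execution visits at most len(X) states, so len(X) steps suffice.
    return all(reaches_goal(pi, f, goal, x, len(X))
               for x in can_reach_goal(X, U, f, goal))

# Hypothetical line world: states 0..4, goal {4}, motions of -1 or +1.
X = range(5)
goal = {4}
f = lambda x, u: x + u
U = lambda x: [u for u in (-1, 1) if 0 <= x + u <= 4]

good_plan = {0: 1, 1: 1, 2: 1, 3: 1, 4: U_T}
bad_plan = {0: 1, 1: -1, 2: 1, 3: 1, 4: U_T}  # cycles between states 0 and 1

good_ok = is_solution(good_plan, X, U, f, goal)
bad_ok = is_solution(bad_plan, X, U, f, goal)
```

The second plan fails the check because states 0 and 1 can reach the goal in principle, yet the plan sends them back and forth forever.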

If an initial state $x_1$ and a feedback plan $\pi$ are given, then the state and action histories can be determined. This implies that the execution cost, (8.3), can also be determined. It can therefore be alternatively expressed as $L(\pi ,x_1)$, instead of $L({\tilde{x}}_F,{\tilde{u}}_K)$. This relies on future states always being predictable. In Chapter 10, it will not be possible to make this direct correspondence due to uncertainties (see Section 10.1.3).
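The correspondence between $(\pi, x_1)$ and the histories can be sketched as a simple rollout. The code below is one possible illustration under stated assumptions: the plan is a dictionary, the rollout stops as soon as the termination action is selected (a simplification), and the cost terms `l` and `l_F` in the example are hypothetical choices. The names `rollout` and `plan_cost` are likewise illustrative.

```python
import math

U_T = "u_T"  # stand-in for the termination action

def rollout(pi, f, x1, max_stages=1000):
    """Generate the state and action histories obtained by following pi from x1."""
    x_hist, u_hist = [x1], []
    x = x1
    for _ in range(max_stages):
        u = pi[x]
        if u == U_T:
            break  # simplification: stop recording once termination is chosen
        u_hist.append(u)
        x = f(x, u)
        x_hist.append(x)
    return x_hist, u_hist

def plan_cost(pi, f, x1, l, l_F):
    """Evaluate L(pi, x1) by applying the stage-additive cost to the rollout."""
    x_hist, u_hist = rollout(pi, f, x1)
    return sum(l(x, u) for x, u in zip(x_hist, u_hist)) + l_F(x_hist[-1])

# Hypothetical line world: states 0..4, goal {4}, plan always steps right.
goal = {4}
f = lambda x, u: x + u
pi = {0: 1, 1: 1, 2: 1, 3: 1, 4: U_T}
l = lambda x, u: 1                               # assumed unit cost per action
l_F = lambda x: 0 if x in goal else math.inf     # assumed final-cost term

x_hist, u_hist = rollout(pi, f, 0)
costs = {x1: plan_cost(pi, f, x1, l, l_F) for x1 in pi}
```

Because the transition function is deterministic, the same pair $(\pi, x_1)$ always yields the same histories, which is what allows the cost to be written as a function of the plan and the initial state.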