8.2.1 Defining a Feedback Plan

Consider a discrete planning problem similar to the ones defined in Formulations 2.1 and 2.3, except that the initial state is not given. Because of this, the cost functional cannot be expressed solely as a function of a plan. It is instead defined in terms of the state history and action history. At stage $ k$, these are defined as

$\displaystyle {\tilde{x}}_k = (x_1,x_2,\ldots,x_k)$ (8.1)

and

$\displaystyle {\tilde{u}}_k = (u_1,u_2,\ldots,u_k) ,$ (8.2)

respectively. It will sometimes be convenient to refer to $ {\tilde{x}}_k$ as the state trajectory.

The resulting formulation is

Formulation 8.1 (Discrete Optimal Feedback Planning)
  1. A finite, nonempty state space $ X$.
  2. For each state, $ x \in X$, a finite action space $ U(x)$.
  3. A state transition function $ f$ that produces a state, $ f(x,u) \in X$, for every $ x \in X$ and $ u \in U(x)$. Let $ U$ denote the union of $ U(x)$ for all $ x \in X$.
  4. A set of stages, each denoted by $ k$, that begins at $ k=1$ and continues indefinitely.
  5. A goal set, $ {X_{G}}\subset X$.
  6. Let $ L$ denote a stage-additive cost functional,

    $\displaystyle L({\tilde{x}}_F,{\tilde{u}}_K) = \sum_{k=1}^K l(x_k,u_k) + l_F(x_F) ,$ (8.3)

    in which $ F = K+1$.

There is one other difference in comparison to the formulations of Chapter 2: the state space is assumed here to be finite. This assumption facilitates the construction of a feedback plan, but it is not necessary in general.
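
To make these components concrete, the following is a minimal sketch of an instance of Formulation 8.1 in Python. The four-state graph, the action names, and the unit stage cost are all hypothetical, invented purely for illustration.

```python
# A hypothetical instance of Formulation 8.1; the states, actions, and
# transitions below are invented for illustration only.
X = {"a", "b", "c", "g"}                # finite state space X
XG = {"g"}                              # goal set X_G
U = {                                   # finite action space U(x) per state
    "a": {"right"},
    "b": {"right", "down"},
    "c": {"right"},
    "g": set(),                         # only the termination action applies
}
f = {                                   # state transition function f(x, u)
    ("a", "right"): "b",
    ("b", "right"): "g",
    ("b", "down"): "c",
    ("c", "right"): "g",
}

def l(x, u):
    """Stage cost l(x, u); here a unit cost per applied action."""
    return 1.0

def lF(x):
    """Final cost l_F(x_F): zero at the goal, infinite otherwise."""
    return 0.0 if x in XG else float("inf")
```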

Consider defining a plan that solves Formulation 8.1. If the initial condition is given, then a sequence of actions could be specified, as in Chapter 2. Without having the initial condition, one possible approach is to determine a sequence of actions for each possible initial state, $ x_1 \in X$. Once the initial state is given, the appropriate action sequence is known. This approach, however, wastes memory. Suppose some $ x$ is given as the initial state and the first action is applied, leading to the next state $ x'$. What action should be applied from $ x'$? The second action in the sequence at $ x$ can be used; however, we can also imagine that $ x'$ is now the initial state and use its first action. This implies that keeping an action sequence for every state is highly redundant. It is sufficient at each state to keep only the first action in the sequence. The application of that action produces the next state, at which the next appropriate action is stored. An execution sequence can be imagined from an initial state as follows. Start at some state, apply the action stored there, arrive at another state, apply its action, arrive at the next state, and so on, until the goal is reached.
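
This execution process can be sketched as a short loop. The code below reuses the hypothetical $ {X_{G}}$ and $ f$ from the sketch above; plan is an invented table that stores the single action kept at each state, with a termination action at the goal.

```python
# One stored action per state, as described above (hypothetical values);
# "uT" stands in for the termination action applied at the goal.
plan = {"a": "right", "b": "down", "c": "right", "g": "uT"}

def execute(x, plan, max_stages=100):
    """Apply the action stored at each state until the goal is reached."""
    path = [x]
    for _ in range(max_stages):         # guard against plans that cycle
        if x in XG:
            return path                 # termination action ends execution
        x = f[(x, plan[x])]             # apply the stored action
        path.append(x)
    raise RuntimeError("goal not reached within max_stages")

print(execute("a", plan))               # ['a', 'b', 'c', 'g']
```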

It therefore seems appropriate to represent a feedback plan as a function that maps every state to an action. Formally, a feedback plan $ \pi$ is defined as a function $ \pi : X \rightarrow U$. From every state, $ x \in X$, the plan indicates which action to apply. If the goal is reached, then the termination action should be applied. This is specified as part of the plan: $ \pi(x) = u_T$ if $ x \in {X_{G}}$. A feedback plan is called a solution to the problem if it causes the goal to be reached from every state from which the goal can be reached.

If an initial state $ x_1$ and a feedback plan $ \pi$ are given, then the state and action histories can be determined. This implies that the execution cost, (8.3), can also be determined. It can therefore be expressed alternatively as $ L(\pi,x_1)$, instead of $ L({\tilde{x}}_F,{\tilde{u}}_K)$. This relies on future states always being predictable. In Chapter 10, it will not be possible to make this direct correspondence due to uncertainties (see Section 10.1.3).
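
As a sketch of this correspondence, $ L(\pi,x_1)$ can be evaluated by simulating the plan forward from $ x_1$ and accumulating the terms of (8.3). This again reuses the hypothetical definitions above; it is an illustration under the assumption that future states are fully predictable, not a prescribed implementation.

```python
def plan_cost(plan, x1, max_stages=100):
    """Evaluate L(pi, x1) of (8.3) by forward simulation of the plan."""
    x, total = x1, 0.0
    for _ in range(max_stages):
        if x in XG:
            return total + lF(x)        # add l_F(x_F) upon termination
        u = plan[x]
        total += l(x, u)                # accumulate the stage cost l(x_k, u_k)
        x = f[(x, u)]                   # next state is fully predictable
    return float("inf")                 # the plan failed to reach the goal

print(plan_cost(plan, "a"))             # 3.0 for the plan above
```

For the hypothetical plan above, execution from $ x_1 = a$ visits $ a$, $ b$, $ c$, $ g$, incurring three unit stage costs and zero final cost, so $ L(\pi,x_1) = 3$.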


