11.1.3 Defining a Planning Problem

Planning problems will be defined directly on the history I-space, which makes it appear as an ordinary state space in many ways. Keep in mind, however, that it was derived from another state space for which perfect state observations could not be obtained. In Section 10.1, a feedback plan was defined as a function of the state. Here, a feedback plan is instead a function of the I-state. Decisions cannot be based on the state because it will generally be unknown during the execution of the plan. However, the I-state is always known; thus, it is logical to base decisions on it.

Let $ \pi_K$ denote a $ K$-step information-feedback plan, which is a sequence $ (\pi_1,\pi_2,\ldots,\pi_K)$ of $ K$ functions, $ \pi_k: {{\cal I}_k}\rightarrow U$. Thus, at every stage $ k$, the I-state $ {\eta}_k \in {{\cal I}_k}$ is used as a basis for choosing the action $ u_k = \pi_k({\eta}_k)$. Due to interference of nature through both the state transition equation and the sensor mapping, the action sequence $ (u_1,\ldots,u_K)$ produced by a plan, $ \pi_K$, will not be known until the plan terminates.
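
To make this execution model concrete, consider a minimal Python sketch of running a $ K$-step information-feedback plan. The callables f, h, nature_theta, and nature_psi are hypothetical stand-ins for the state transition function, the sensor mapping, and nature's unobserved choices of $ \theta_k$ and $ \psi_k$; the history I-state $ {\eta}_k$ is represented directly as a tuple of $ {\eta_0}$ together with the action and observation histories.

    # A minimal sketch of executing a K-step information-feedback plan.
    # The callables f, h, nature_theta, and nature_psi are hypothetical
    # stand-ins for the state transition function, the sensor mapping,
    # and nature's (unobserved) choices of theta_k and psi_k.

    def execute_plan(pi, eta0, x1, f, h, nature_theta, nature_psi):
        """pi is the sequence (pi_1, ..., pi_K); returns the action history."""
        x, us, ys = x1, (), ()
        for pi_k in pi:
            y = h(x, nature_psi(x))           # y_k = h(x_k, psi_k)
            ys = ys + (y,)
            eta_k = (eta0, us, ys)            # history I-state (eta_0, u-history, y-history)
            u = pi_k(eta_k)                   # u_k = pi_k(eta_k); the state x is never read
            us = us + (u,)
            x = f(x, u, nature_theta(x, u))   # x_{k+1} = f(x_k, u_k, theta_k)
        return us

Note that the plan never reads x directly; every decision is made from the I-state alone, and the action sequence is determined only once execution completes.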

As in Formulation 2.3, it will be convenient to assume that $ U$ contains a termination action, $ u_T$. If $ u_T$ is applied at stage $ k$, then it is repeatedly applied forever. It is assumed once again that the state $ x_k$ remains fixed after the termination action is applied. Remember, however, that $ x_k$ is still unknown in general; it becomes fixed but unknown. Technically, based on the definition of the history I-space, the I-state must change after $ u_T$ is applied because the history grows. These changes can be ignored, however, because no new decisions are made after $ u_T$ is applied. A plan that uses a termination condition can be specified as $ \pi = (\pi_1,\pi_2,\ldots)$ because the number of stages may vary each time the plan is executed. Using the history I-space definition in (11.19), an information-feedback plan is expressed as

$\displaystyle \pi : {\cal I}_{hist}\rightarrow U .$ (11.20)
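
The following sketch adapts the previous one to plans of the form (11.20), assuming a hypothetical termination action named U_T. Once the plan selects $ u_T$, no further decisions are made, so the technically still-growing history after termination is simply ignored.

    # Sketch of executing pi : I_hist -> U with a termination action, per
    # (11.20).  U_T is a hypothetical name; once pi selects it, no further
    # decisions are made, so the history after termination can be ignored.

    U_T = "terminate"

    def execute_until_termination(pi, eta0, x1, f, h, nature_theta, nature_psi,
                                  max_stages=10**6):
        x, us, ys = x1, (), ()
        for _ in range(max_stages):
            y = h(x, nature_psi(x))
            ys = ys + (y,)
            u = pi((eta0, us, ys))     # a single function of the I-state, at any stage
            if u == U_T:
                return us              # x_k stays fixed, but remains unknown
            us = us + (u,)
            x = f(x, u, nature_theta(x, u))
        raise RuntimeError("termination action was never applied")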

We are almost ready to define the planning problem. This will require the specification of a cost functional. The cost depends on the histories of states $ {\tilde{x}}$ and actions $ {\tilde{u}}$ as in Section 10.1. The planning formulation involves the following components, summarizing most of the concepts introduced so far in Section 11.1 (see Formulation 10.1 for similarities):

Formulation 11.1 (Discrete Information Space Planning)
  1. A nonempty state space $ X$ that is either finite or countably infinite.
  2. A nonempty, finite action space $ U$. It is assumed that $ U$ contains a special termination action, which has the same effect as defined in Formulation 2.3.
  3. A finite nature action space $ \Theta(x,u)$ for each $ x \in X$ and $ u \in U$.
  4. A state transition function $ f$ that produces a state, $ f(x,u,\theta)$, for every $ x \in X$, $ u \in U$, and $ \theta \in \Theta(x,u)$.
  5. A finite or countably infinite observation space $ Y$.
  6. A finite nature sensing action space $ \Psi(x)$ for each $ x \in X$.
  7. A sensor mapping $ h$, which produces an observation, $ y=h(x,\psi)$, for each $ x \in X$ and $ \psi \in \Psi(x)$. This definition assumes a state-nature sensor mapping. A state sensor mapping or a history-based sensor mapping, as defined in Section 11.1.1, could alternatively be used.
  8. A set of stages, each denoted by $ k$, which begins at $ k=1$ and continues indefinitely.
  9. An initial condition $ {\eta_0}$, which is an element of an initial condition space, $ {{\cal I}_0}$.
  10. A history I-space $ {\cal I}_{hist}$ which is the union of $ {\cal I}_0$ and $ {{\cal I}_k}= {{\cal I}_0}\times {\tilde{U}}_{k-1} \times {\tilde{Y}}_k$ for every stage $ k \in {\mathbb{N}}$.
  11. Let $ L$ denote a stage-additive cost functional, which may be applied to any pair $ ({\tilde{x}}_{K+1},{\tilde{u}}_K)$ of state and action histories to yield

    $\displaystyle L({\tilde{x}}_{K+1},{\tilde{u}}_K) = \sum_{k=1}^K l(x_k,u_k) + l_F(x_{K+1}) .$ (11.21)

    If the termination action $ u_T$ is applied at some stage $ k$, then for all $ i \geq k$, $ u_i = u_T$, $ x_i = x_k$, and $ l(x_i,u_T) = 0$. Either a feasible or optimal planning problem can be defined, as in Formulation 10.1; however, the plan here is specified as $ \pi : {\cal I}_{hist}\rightarrow U$. A sketch of evaluating this cost along one execution trial appears after the formulation.
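
To illustrate how the components of Formulation 11.1 fit together, the following sketch, continuing the hypothetical interfaces and the U_T constant from the previous sketches, evaluates the cost functional (11.21) along a single execution trial. Here l and l_F stand for the stage cost $ l(x_k,u_k)$ and the final cost $ l_F(x_{K+1})$; the convention $ l(x_i,u_T) = 0$ means that the sum effectively stops at the termination stage.

    # Sketch: evaluating the stage-additive cost functional (11.21) along
    # one execution trial of Formulation 11.1.  All interfaces (f, h, l,
    # l_F, nature_theta, nature_psi, U_T) are hypothetical stand-ins.

    def trial_cost(pi, eta0, x1, f, h, l, l_F, nature_theta, nature_psi,
                   max_stages=10**6):
        x, us, ys, cost = x1, (), (), 0.0
        for _ in range(max_stages):
            y = h(x, nature_psi(x))
            ys = ys + (y,)
            u = pi((eta0, us, ys))
            if u == U_T:               # l(x_i, u_T) = 0 for all i >= k, so stop summing
                return cost + l_F(x)   # x has stayed fixed, so this is l_F(x_{K+1})
            cost += l(x, u)            # accumulate l(x_k, u_k)
            us = us + (u,)
            x = f(x, u, nature_theta(x, u))
        raise RuntimeError("termination action was never applied")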

A goal set may be defined as $ X_G \subset X$. Alternatively, the goal could be expressed as a desirable set of history I-states. After Section 11.2, it will be seen that the goal can be expressed in terms of I-states that are derived from histories.
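
As one concrete possibility, a history I-state $ \eta$ could be declared a goal I-state when every state consistent with $ \eta$ lies in $ X_G$. Here is a small sketch, assuming a hypothetical helper consistent_states that returns the set of states that could be current given $ \eta$; Section 11.2 derives exactly such sets.

    # Sketch: a goal expressed on the history I-space.  consistent_states
    # is a hypothetical helper returning every state that could be current
    # given the history I-state eta (derived in Section 11.2).

    def goal_achieved(eta, consistent_states, X_G):
        """True when the goal is guaranteed, whatever the true state is."""
        return all(x in X_G for x in consistent_states(eta))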

Some immediate extensions of Formulation 11.1 are possible, but we avoid them here to simplify notation in the coming concepts. One extension is to allow different action sets, $ U(x)$, for each $ x \in X$. Be careful, however, because if the action set $ U(x)$ is given and varies depending on $ x$, then information regarding the current state can be inferred. Another extension is to allow the costs to depend on nature, to obtain $ l(x_k,u_k,\theta_k)$ instead of $ l(x_k,u_k)$ in (11.21).


