Defining a plan

Let a (feedback) plan for Formulation 10.1 be defined as a function $\pi : X \rightarrow U$ that produces an action $\pi(x) \in U(x)$ for each $x \in X$. Although the future state may not be known due to nature, if $\pi$ is given, then it will at least be known what action will be taken from any future state. In other works, $\pi$ has been called a feedback policy, feedback control law, reactive plan [340], and conditional plan.
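As a minimal sketch of this definition, a feedback plan over a finite state space can be stored as a lookup table. The five-state line world, the action set `U`, and the helper `is_valid_plan` below are hypothetical illustrations, not part of Formulation 10.1.

```python
# Hypothetical toy problem: states 0..4 on a line, with state 4 as the goal.
X = range(5)

def U(x):
    """Assumed action set U(x): step left (-1), stay (0), or step right (+1),
    restricted to actions that keep the commanded position inside X."""
    return [u for u in (-1, 0, 1) if 0 <= x + u <= 4]

# A feedback plan pi : X -> U, stored as a table with one action per state.
pi = {x: (1 if x < 4 else 0) for x in X}

def is_valid_plan(plan):
    """Check that the plan assigns some action in U(x) to every x in X."""
    return all(x in plan and plan[x] in U(x) for x in X)
```

Because the table has an entry for every state, the action is determined no matter where nature pushes the system.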

For some problems, particularly when $K$ is fixed at some finite value, a stage-dependent plan may be necessary. This enables a different action to be chosen for every stage, even from the same state. Let ${\cal K}$ denote the set $\{1,\ldots,K\}$ of stages. A stage-dependent plan is defined as $\pi : X \times {\cal K} \rightarrow U$. Thus, an action is given by $u = \pi(x,k)$. Note that the definition of a $K$-step plan, which was given in Section 2.3, is a special case of the current definition. In that setting, the action depended only on the stage because future states were always predictable. Here they are no longer predictable, so the state must be included in the domain of $\pi$. Unless otherwise mentioned, it will be assumed by default that $\pi$ is not stage-dependent.
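The difference between the two kinds of plan can be sketched as follows. The horizon `K = 3`, the state set, and both functions are assumptions chosen only for illustration.

```python
K = 3                      # assumed finite number of stages
STAGES = range(1, K + 1)   # the set {1, ..., K}
STATES = range(5)          # hypothetical states 0..4

def pi_stage(x, k):
    """Hypothetical stage-dependent plan pi : X x K -> U.
    Moves right during early stages and holds position on the last stage."""
    if k < K and x < 4:
        return 1
    return 0

def k_step_plan(x, k):
    """A K-step plan in the sense of Section 2.3, viewed as a special case:
    the action depends only on the stage k, and the state x is ignored."""
    actions = (1, 1, 0)    # one fixed action per stage
    return actions[k - 1]
```

From the same state, `pi_stage` may return different actions at different stages, whereas `k_step_plan` returns the same action from every state at a given stage.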

Note that once $\pi$ is formulated, the state transitions appear to be a function of only the current state and nature. The next state is given by $f(x,\pi(x),\theta)$. The same is true for the cost term, $l(x,\pi(x),\theta)$.
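A small simulation may make this concrete. In the sketch below, the transition function `f`, the cost `l`, the plan `pi`, and the helpers `closed_loop_step` and `rollout` are all hypothetical: nature's action `theta` is assumed to be a disturbance in $\{-1, 0, 1\}$ added to the commanded motion on a five-state line.

```python
def f(x, u, theta):
    """Assumed state transition: move by u, perturbed by nature's theta,
    clipped to the states 0..4."""
    return max(0, min(4, x + u + theta))

def l(x, u, theta):
    """Assumed cost term: one unit per stage until the goal state 4 is reached."""
    return 0 if x == 4 else 1

def pi(x):
    """Hypothetical feedback plan: head right until the goal, then stay."""
    return 1 if x < 4 else 0

def closed_loop_step(x, theta):
    """With pi fixed, the next state and cost depend only on x and theta."""
    u = pi(x)
    return f(x, u, theta), l(x, u, theta)

def rollout(x0, thetas):
    """Apply the plan along a given sequence of nature actions.
    Returns the final state and the accumulated cost."""
    x, total = x0, 0
    for theta in thetas:
        x, cost = closed_loop_step(x, theta)
        total += cost
    return x, total
```

Note that `closed_loop_step` takes no action argument: once the plan is substituted, only the state and nature's choice remain as inputs.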

Steven M LaValle 2020-08-14