10.1.1 Model Definition

The formulation presented in this section is an extension of
Formulation 2.3 that incorporates the effects of nature at
every stage. Let denote a discrete state space, and let
denote the set of actions available to the decision maker (or robot)
from state . At each stage it is assumed that a *nature action* is chosen from a set
.
This can be considered as a multi-stage generalization of Formulation
9.4, which introduced . Now may
depend on the state in addition to the action because both and
are available in the current setting. This implies that nature
acts with the knowledge of the action selected by the decision maker.
It is always assumed that during stage , the decision maker does
not know the particular nature action that will be chosen. It does,
however, know the set
for all and
.

As in Section 9.2, there are two alternative nature models: nondeterministic or probabilistic. If the nondeterministic model is used, then it is only known that nature will make a choice from . In this case, making decisions using worst-case analysis is appropriate.

If the probabilistic model is used, then a probability distribution
over
is specified as part of the model. The most
important assumption to keep in mind for this case is that nature is
*Markovian*. In general, this means that the probability depends
only on local information. In most applications, this locality is
with respect to time. In our formulation, it means that the
distribution over
depends only on information
obtained at the current stage. In other settings, Markovian could
mean a dependency on a small number of stages, or even a local
dependency in terms of spatial relationships, as in a *Markov
random field* [231,377].

To make the Markov assumption more precise, the state and action histories as defined in Section 8.2.1 will be used again here. Let

(10.1) |

and

(10.2) |

These represent all information that is available up to stage . Without the Markov assumption, it could be possible that the probability distribution for nature is conditioned on all of and , to obtain . The Markov assumption declares that for all ,

(10.3) |

which drops all history except the current state and action. Once these two are known, there is no extra information regarding the nature action that could be gained from any portion of the histories.

The effect of nature is defined in the state transition equation, which produces a new state, , once , , and are given:

From the perspective of the decision maker, is not given. Therefore, it can only infer that a particular set of states will result from applying and :

In (10.5), the notation indicates a set of possible values for , given and . The notation will generally be used to indicate the possible values for that can be derived using the information that appears in the argument.

In the probabilistic case, a probability distribution over can be derived for stage , under the application of from . As part of the problem, is given. Using the state transition equation, ,

can be derived, in which

(10.7) |

The calculation of simply involves accumulating all of the probability mass that could lead to from the application of various nature actions.

Putting these parts of the model together and adding some of the components from Formulation 2.3, leads to the following formulation:

- A nonempty
*state space*which is a finite or countably infinite set of*states*. - For each state, , a finite, nonempty
*action space*. It is assumed that contains a special*termination action*, which has the same effect as the one defined in Formulation 2.3. - A finite, nonempty
*nature action space*for each and . - A
*state transition function*that produces a state, , for every , , and . - A set of
*stages*, each denoted by , that begins at and continues indefinitely. Alternatively, there may be a fixed, maximum stage . - An
*initial state*. For some problems, this may not be specified, in which case a solution plan must be found from all initial states. - A
*goal set*. - A stage-additive cost functional .
Let
denote the history of nature actions up to stage
. The cost functional may be applied to any combination of state,
action, and nature histories to yield

in which . If the termination action is applied at some stage , then for all , , , and .

Using Formulation 10.1, either a feasible or optimal planning problem can be defined. To obtain a feasible planning problem, let for all , , and . Furthermore, let

(10.9) |

To obtain an optimal planning problem, in general may assume any nonnegative, finite value if . For problems that involve probabilistic uncertainty, it is sometimes appropriate to assign a high, finite value for if , as opposed to assigning an infinite cost for failing to achieve the goal.

Note that in each stage, the cost term is generally allowed to depend
on the nature action . If probabilistic uncertainty is
used, then Formulation 10.1 is often referred to as a
*controlled Markov process* or *Markov decision process*
(MDP). If the actions are removed from the formulation, then it is
simply referred to as a *Markov process*. In most statistical
literature, the name *Markov chain* is used instead of *Markov
process* when there are discrete stages (as opposed to continuous-time
Markov processes). Thus, the terms *controlled Markov chain* and
*Markov decision chain* may be preferable.

In some applications, it may be convenient to avoid the explicit characterization of nature. Suppose that . If nondeterministic uncertainty is used, then can be specified for all and as a substitute for the state transition equation; this avoids having to refer to nature. The application of an action from a state directly leads to a specified subset of . If probabilistic uncertainty is used, then can be directly defined as the alternative to the state transition equation. This yields a probability distribution over , if is applied from some , once again avoiding explicit reference to nature. Most of the time we will use a state transition equation that refers to nature; however, it is important to keep these alternatives in mind. They arise in many related books and research articles.

As used throughout Chapter 2, a directed state transition graph is sometimes convenient for expressing the planning problem. The same idea can be applied in the current setting. As in Section 2.1, is the vertex set; however, the edge definition must change to reflect nature. A directed edge exists from state to if there exists some and such that . A weighted graph can be made by associating the cost term with each edge. In the case of a probabilistic model, the probability of the transition occurring may also be associated with each edge.

Note that both the decision maker and nature are needed to determine which vertex will be reached. As the decision maker contemplates applying an action from the state , it sees that there may be several outgoing edges due to nature. If a different action is contemplated, then this set of possible outgoing edges changes. Once nature applies its action, then the particular edge is traversed to arrive at the new state; however, this is not completely controlled by the decision maker.

Consider executing a sequence of actions, , under the nondeterministic uncertainty model. This means that we attempt to move left two units in each stage. After the first is applied, the set of possible next states is . Nature may slow down the progress to be only one unit per stage, or it may speed up the progress so that is three units closer per stage. Note that after stages, the goal is guaranteed to be achieved, in spite of any possible actions of nature. Once is reached, should be applied. If the problem is changed so that , it becomes impossible to guarantee that the goal will be reached because nature may cause the goal to be overshot.

Now let
and
. Under
nondeterministic uncertainty, the problem can no longer be solved
because nature is now powerful enough to move the state completely in
the wrong direction in the worst case. A reasonable probabilistic
version of the problem can, however, be defined and solved. Suppose
that
for each
. The transition
probabilities can be defined from . For example, if and , then
if
, and
otherwise. With the
probabilistic formulation, there is a nonzero probability that the
goal,
, will be reached, even though in the
worst-case reaching the goal is not guaranteed.

Let be a set of actions, which denote ``stay,'' ``right,'' ``up,'' ``left,'' and ``down,'' respectively. Let . For each , let contain and whichever actions are applicable from (some are not applicable along the boundaries).

Let represent the set of all actions in that are applicable after performing the move implied by . For example, if and , then the robot is attempting to move to . From this state, there are three neighboring states, each of which corresponds to an action of nature. Thus, in this case is . The action does not appear because there is no state to the left of . Suppose that the probabilistic model is used, and that every nature action is equally likely.

The state transition function is formed by adding the effect of both and . For example, if , , and , then . If had been , then the two actions would cancel and . Without nature, it would have been assumed that . As always, the state never changes once is applied, regardless of nature's actions.

For the cost functional, let (unless ; in this case, ). For the final stage, let if ; otherwise, let . A reasonable task is to get the robot to terminate in in the minimum expected number of stages. A feedback plan is needed, which will be introduced in Section 10.1.3, and the optimal plan for this problem can be efficiently computed using the methods of Section 10.2.1.

This example can be easily generalized to moving through a complicated
labyrinth in two or more dimensions. If the grid resolution is high,
then an approximation to motion planning is obtained. Rather than
forcing motions in only four directions, it may be preferable to allow
any direction. This case is covered in Section 10.6,
which addresses planning in continuous state spaces.

Steven M LaValle 2020-08-14