The formulation presented in this section is an extension of Formulation 2.3 that incorporates the effects of nature at every stage. Let $X$ denote a discrete state space, and let $U(x)$ denote the set of actions available to the decision maker (or robot) from state $x \in X$. At each stage $k$ it is assumed that a nature action $\theta_k$ is chosen from a set $\Theta(x_k, u_k)$. This can be considered as a multi-stage generalization of Formulation 9.4, which introduced $\Theta(u)$. Now $\Theta$ may depend on the state in addition to the action because both $x_k$ and $u_k$ are available in the current setting. This implies that nature acts with the knowledge of the action selected by the decision maker. It is always assumed that during stage $k$, the decision maker does not know the particular nature action that will be chosen. It does, however, know the set $\Theta(x_k, u_k)$ for all $x_k \in X$ and $u_k \in U(x_k)$.
As in Section 9.2, there are two alternative nature
models: nondeterministic or probabilistic. If the nondeterministic
model is used, then it is only known that nature will make a choice
from $\Theta(x_k, u_k)$. In this case, making decisions using worst-case analysis is appropriate.
If the probabilistic model is used, then a probability distribution over $\Theta(x_k, u_k)$ is specified as part of the model. The most important assumption to keep in mind for this case is that nature is Markovian. In general, this means that the probability depends only on local information. In most applications, this locality is with respect to time. In our formulation, it means that the distribution over $\Theta(x_k, u_k)$ depends only on information
obtained at the current stage. In other settings, Markovian could
mean a dependency on a small number of stages, or even a local
dependency in terms of spatial relationships, as in a Markov
random field [231,377].
To make the Markov assumption more precise, the state and action histories as defined in Section 8.2.1 will be used again here. Let

$\tilde{x}_k = (x_1, x_2, \ldots, x_k),$    (10.1)

$\tilde{u}_k = (u_1, u_2, \ldots, u_k),$    (10.2)

and

$\tilde{\theta}_k = (\theta_1, \theta_2, \ldots, \theta_k).$    (10.3)

The Markov assumption is that, for all stages $k$,

$P(\theta_k \mid \tilde{x}_k, \tilde{u}_k, \tilde{\theta}_{k-1}) = P(\theta_k \mid x_k, u_k),$    (10.4)

which means that the distribution over nature actions may depend on the current state and action, but not on any earlier portion of the histories.
The effect of nature is defined in the state transition equation, which produces a new state, $x_{k+1}$, once $x_k$, $u_k$, and $\theta_k$ are given:

$x_{k+1} = f(x_k, u_k, \theta_k).$    (10.5)

From the perspective of the decision maker, $\theta_k$ is not given. It can only infer that the next state will lie in the set

$X_{k+1}(x_k, u_k) = \{ x_{k+1} \in X \mid \exists \theta_k \in \Theta(x_k, u_k) \text{ such that } x_{k+1} = f(x_k, u_k, \theta_k) \}.$    (10.6)
In the probabilistic case, a probability distribution over $X$ can be derived for stage $k+1$, under the application of $u_k$ from $x_k$. As part of the problem, $P(\theta_k \mid x_k, u_k)$ is given. Using the state transition equation, $x_{k+1} = f(x_k, u_k, \theta_k)$,

$P(x_{k+1} \mid x_k, u_k) = \sum_{\theta_k \in \Theta'} P(\theta_k \mid x_k, u_k),$    (10.7)

in which $\Theta'$ denotes the set of all $\theta_k \in \Theta(x_k, u_k)$ for which $x_{k+1} = f(x_k, u_k, \theta_k)$.
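To make (10.7) concrete, here is a minimal Python sketch, assuming placeholder callables `Theta`, `P_theta`, and `f` for the model components defined above; it computes the derived distribution by grouping nature actions according to the next state they produce:

```python
from collections import defaultdict

def derived_distribution(x, u, Theta, P_theta, f):
    """Compute P(x_{k+1} | x_k, u_k) as in (10.7).

    Theta(x, u)          -- iterable of nature actions in Theta(x, u)
    P_theta(theta, x, u) -- probability P(theta | x, u)
    f(x, u, theta)       -- the state transition equation
    """
    dist = defaultdict(float)
    for theta in Theta(x, u):
        # Sum over Theta': nature actions that lead to the same next state.
        dist[f(x, u, theta)] += P_theta(theta, x, u)
    return dict(dist)
```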
Putting these parts of the model together and adding some of the components from Formulation 2.3 leads to the following formulation:

Formulation 10.1 (Discrete Planning with Nature)

1. A nonempty state space $X$, which is a finite or countably infinite set of states.
2. For each state $x \in X$, a finite, nonempty action space $U(x)$. It is assumed that $U$ contains a special termination action $u_T$, which has the same effect as defined in Formulation 2.3.
3. A finite, nonempty nature action space $\Theta(x, u)$ for each $x \in X$ and $u \in U(x)$.
4. A state transition function $f$ that produces a state, $f(x, u, \theta)$, for every $x \in X$, $u \in U(x)$, and $\theta \in \Theta(x, u)$.
5. A set of stages, each denoted by $k$, that begins at $k = 1$ and continues indefinitely.
6. An initial state $x_I \in X$. For some problems, this may not be specified, in which case a solution plan must be found from all initial states.
7. A goal set $X_G \subset X$.
8. A stage-additive cost functional $L$. Let $\tilde{\theta}_K$ denote the history of nature actions up to stage $K$. The cost functional may be applied to any combination of state, action, and nature histories to yield

$L(\tilde{x}_F, \tilde{u}_K, \tilde{\theta}_K) = \sum_{k=1}^{K} l(x_k, u_k, \theta_k) + l_F(x_F),$    (10.8)

in which $F = K + 1$.
Using Formulation 10.1, either a feasible or optimal planning problem can be defined. To obtain a feasible planning problem, let $l(x_k, u_k, \theta_k) = 0$ for all $x_k \in X$, $u_k \in U(x_k)$, and $\theta_k \in \Theta(x_k, u_k)$. Furthermore, let

$l_F(x_F) = \begin{cases} 0 & \text{if } x_F \in X_G \\ \infty & \text{otherwise.} \end{cases}$    (10.9)
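As an illustration of how the pieces of Formulation 10.1 might be organized in software, here is a hypothetical Python container; the field names are illustrative choices, not notation from the text:

```python
from dataclasses import dataclass
from typing import Callable, Hashable, Iterable, Set

State = Hashable
Action = Hashable
NatureAction = Hashable

@dataclass
class PlanningProblem:
    """The components of Formulation 10.1, gathered in one place."""
    X: Set[State]                                             # state space
    U: Callable[[State], Iterable[Action]]                    # actions from x
    Theta: Callable[[State, Action], Iterable[NatureAction]]  # nature actions
    f: Callable[[State, Action, NatureAction], State]         # transition equation
    x_I: State                                                # initial state
    X_G: Set[State]                                           # goal set
    l: Callable[[State, Action, NatureAction], float]         # stage cost
    l_F: Callable[[State], float]                             # final cost
```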
Note that in each stage, the cost term is generally allowed to depend on the nature action $\theta_k$. If probabilistic uncertainty is
used, then Formulation 10.1 is often referred to as a
controlled Markov process or Markov decision process
(MDP). If the actions are removed from the formulation, then it is
simply referred to as a Markov process. In most statistical
literature, the name Markov chain is used instead of Markov
process when there are discrete stages (as opposed to continuous-time
Markov processes). Thus, the terms controlled Markov chain and
Markov decision chain may be preferable.
In some applications, it may be convenient to avoid the explicit characterization of nature. If nondeterministic uncertainty is used, then $X_{k+1}(x_k, u_k)$ from (10.6) can be specified for all $x_k \in X$ and $u_k \in U(x_k)$ as a substitute for the state transition equation; this avoids having to refer to nature. The application of an action $u_k$ from a state $x_k$ directly leads to a specified subset of $X$. If probabilistic uncertainty is used, then $P(x_{k+1} \mid x_k, u_k)$ can be directly defined as the alternative to the state transition equation. This yields a probability distribution over $X$, if $u_k$ is applied from some $x_k$, once again avoiding explicit reference to nature.
Most of the time we will use a state transition equation that refers
to nature; however, it is important to keep these alternatives in
mind. They arise in many related books and research articles.
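Both nature-free alternatives are easy to derive mechanically from a nature-based model. A small sketch, assuming the same placeholder callables as before (the probabilistic case is exactly the `derived_distribution` sketch above):

```python
def possible_next_states(x, u, Theta, f):
    """Nondeterministic substitute: the set X_{k+1}(x, u) of (10.6)."""
    return {f(x, u, theta) for theta in Theta(x, u)}
```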
As used throughout Chapter 2, a directed state
transition graph is sometimes convenient for expressing the planning
problem. The same idea can be applied in the current setting. As in
Section 2.1, $X$ is the vertex set; however, the edge definition must change to reflect nature. A directed edge exists from state $x$ to $x'$ if there exist some $u \in U(x)$ and $\theta \in \Theta(x, u)$ such that $x' = f(x, u, \theta)$. A weighted graph can be made by associating the cost term $l(x, u, \theta)$ with each edge. In the case of a probabilistic model, the probability
of the transition occurring may also be associated with each edge.
Note that both the decision maker and nature are needed to determine
which vertex will be reached. As the decision maker contemplates
applying an action $u$ from the state $x$, it sees that there may be
several outgoing edges due to nature. If a different action is
contemplated, then this set of possible outgoing edges changes. Once
nature applies its action, then the particular edge is traversed to
arrive at the new state; however, this is not completely controlled by
the decision maker.
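The edge set of this graph can be enumerated directly from the model. The following sketch, again using the placeholder callables rather than anything defined in the text, yields weighted edges, possibly several between the same pair of states:

```python
def transition_graph_edges(X, U, Theta, f, l):
    """Yield the directed, weighted edges (x, x_next, cost) of the graph."""
    for x in X:
        for u in U(x):
            for theta in Theta(x, u):
                # One edge per (u, theta) outcome; nature chooses among the
                # outgoing edges that share the decision maker's action u.
                yield (x, f(x, u, theta), l(x, u, theta))
```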
To see these definitions in a simple setting, suppose that $X = \mathbb{Z}$, $U = \{-2, 2, u_T\}$, $\Theta(x, u) = \{-1, 0, 1\}$ for every $x$ and $u$, and $f(x, u, \theta) = x + u + \theta$. Let $x_I = 100$ and $X_G = \{-1, 0, 1\}$. Consider executing a sequence of actions, $(-2, -2, \ldots, -2)$, under the nondeterministic uncertainty model. This means that we attempt to move left two units in each stage. After the first $-2$ is applied, the set of possible next states is $\{97, 98, 99\}$. Nature may slow down the progress to be only one unit per stage, or it may speed up the progress so that $x_G$ is three units closer per stage. Note that after $100$ stages, the goal is guaranteed to be achieved, in spite of any possible actions of nature. Once $x_G$ is reached, $u_T$ should be applied. If the problem is changed so that $X_G = \{0\}$, it becomes impossible to guarantee that the goal will be reached because nature may cause the goal to be overshot.
Now let $U = \{-1, 1, u_T\}$ and $\Theta(x, u) = \{-2, -1, 0, 1, 2\}$. Under nondeterministic uncertainty, the problem can no longer be solved because nature is now powerful enough to move the state completely in the wrong direction in the worst case. A reasonable probabilistic version of the problem can, however, be defined and solved. Suppose that $P(\theta) = 1/5$ for each $\theta \in \Theta(x, u)$. The transition probabilities can be defined from $P(\theta)$. For example, if $x_k = 100$ and $u_k = -1$, then $P(x_{k+1} \mid x_k, u_k) = 1/5$ if $x_{k+1} \in \{97, 98, 99, 100, 101\}$, and $P(x_{k+1} \mid x_k, u_k) = 0$ otherwise. With the probabilistic formulation, there is a nonzero probability that the goal, $X_G = \{-1, 0, 1\}$, will be reached, even though in the worst case reaching the goal is not guaranteed.
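The derived transition probabilities for this case can be checked with a few lines of Python; the numeric values here simply restate the case worked out above:

```python
from collections import defaultdict

def f_1d(x, u, theta):
    return x + u + theta  # state transition equation on the number line

# Uniform nature distribution over Theta = {-2, -1, 0, 1, 2}.
dist = defaultdict(float)
for theta in (-2, -1, 0, 1, 2):
    dist[f_1d(100, -1, theta)] += 1.0 / 5.0

print(dict(dist))  # {97: 0.2, 98: 0.2, 99: 0.2, 100: 0.2, 101: 0.2}
```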
Next, consider moving a robot on a planar grid. Let $U = \{(0, 0), (1, 0), (0, 1), (-1, 0), (0, -1)\}$ be a set of actions, which denote ``stay,'' ``right,'' ``up,'' ``left,'' and ``down,'' respectively. Let $X = \{(i, j) \in \mathbb{Z}^2 \mid 1 \leq i, j \leq n\}$ for some fixed $n$. For each $x \in X$, let $U(x)$ contain $u_T$ and whichever actions are applicable from $x$ (some are not applicable along the boundaries).

Let $\Theta(x_k, u_k)$ represent the set of all actions in $U$ that are applicable after performing the move implied by $u_k$. For example, if $x_k = (2, 2)$ and $u_k = (-1, 0)$, then the robot is attempting to move to $(1, 2)$. From this state, there are three neighboring states, each of which corresponds to an action of nature. Thus, $\Theta(x_k, u_k)$ in this case is $\{(0, 0), (1, 0), (0, 1), (0, -1)\}$. The action $(-1, 0)$ does not appear because there is no state to the left of $(1, 2)$. Suppose that the probabilistic model is used, and that every nature action is equally likely.
The state transition function is formed by adding the effect of both $u_k$ and $\theta_k$. For example, if $x_k = (i, j)$, $u_k = (1, 0)$, and $\theta_k = (0, 1)$, then $x_{k+1} = (i + 1, j + 1)$. If $\theta_k$ had been $(-1, 0)$, then the two actions would cancel and $x_{k+1} = (i, j)$. Without nature, it would have been assumed that $\theta_k = (0, 0)$. As always, the state never changes once $u_T$ is applied, regardless of nature's actions.
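A compact encoding of this grid model in Python follows; the grid bound `n` and the helper names are illustrative choices, and the termination action $u_T$ is omitted for brevity:

```python
n = 10  # illustrative grid bound
U_ALL = [(0, 0), (1, 0), (0, 1), (-1, 0), (0, -1)]  # stay, right, up, left, down

def in_grid(x):
    return 1 <= x[0] <= n and 1 <= x[1] <= n

def U_grid(x):
    # Actions applicable from x; moves that leave the grid are excluded.
    return [u for u in U_ALL if in_grid((x[0] + u[0], x[1] + u[1]))]

def Theta_grid(x, u):
    # All actions in U that are applicable after the move implied by u.
    return U_grid((x[0] + u[0], x[1] + u[1]))

def f_grid(x, u, theta):
    # The effects of u and theta are added, as in the text.
    return (x[0] + u[0] + theta[0], x[1] + u[1] + theta[1])

# Matches the example above: from (2, 2), attempting to move left to (1, 2).
assert set(Theta_grid((2, 2), (-1, 0))) == {(0, 0), (1, 0), (0, 1), (0, -1)}
```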
For the cost functional, let $l(x_k, u_k, \theta_k) = 1$ (unless $u_k = u_T$; in this case, $l(x_k, u_T, \theta_k) = 0$). For the final stage, let $l_F(x_F) = 0$ if $x_F \in X_G$; otherwise, let $l_F(x_F) = \infty$. A reasonable task is to get the robot to terminate in $X_G$ in the minimum expected number of stages. A feedback plan is needed, which will be introduced in Section 10.1.3, and the optimal plan for this problem can be efficiently computed using the methods of Section 10.2.1.
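As a rough preview of those methods, the following sketch performs value iteration on the grid model above to estimate the minimum expected number of stages to reach a goal. It assumes, as in the example, that every nature action is equally likely; it is an illustration, not the presentation of Section 10.2.1:

```python
def expected_stages_to_goal(X, U, Theta, f, X_G, iters=200):
    """Iteratively improve the expected cost-to-go G(x) under uniform nature."""
    G = {x: 0.0 for x in X}  # start at 0; values increase toward the true cost
    for _ in range(iters):
        for x in X:
            if x in X_G:
                continue  # terminate at the goal; no further cost accrues
            best = float('inf')
            for u in U(x):
                thetas = list(Theta(x, u))
                # Expected cost-to-go over equally likely nature actions.
                avg = sum(G[f(x, u, th)] for th in thetas) / len(thetas)
                best = min(best, 1.0 + avg)
            G[x] = best
    return G

# Usage on the grid model above, with a hypothetical goal in the corner:
X_grid = [(i, j) for i in range(1, n + 1) for j in range(1, n + 1)]
G = expected_stages_to_goal(X_grid, U_grid, Theta_grid, f_grid, {(n, n)})
```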
This example can be easily generalized to moving through a complicated
labyrinth in two or more dimensions. If the grid resolution is high,
then an approximation to motion planning is obtained. Rather than
forcing motions in only four directions, it may be preferable to allow
any direction. This case is covered in Section 10.6,
which addresses planning in continuous state spaces.