Imagine assigning reward values to various outcomes of a decision-making process. In some applications, numerical values may come naturally. For example, the reward might be the amount of money earned in a financial investment. In robotics applications, one could negate the time required to execute a task or the amount of energy consumed; for instance, the reward could indicate the amount of battery life remaining after a mobile robot builds a map.
In some applications the source of rewards may be subjective. For example, what is the reward for washing dishes, in comparison to sweeping the floor? Each person would probably assign different rewards, which may even vary from day to day. It may be based on their enjoyment or misery in performing the task, the amount of time each task would take, the perceptions of others, and so on. If decision theory is used to automate the decision process for a human ``client,'' then it is best to consult carefully with the client to make sure you know their preferences. In this situation, it may be possible to sort their preferences and then assign rewards that are consistent with the ordering.
Once the rewards are assigned, consider making a decision under Formulation 9.1, which does not involve nature. Each outcome corresponds directly to an action, $u \in U$. If the rewards are given by $R : U \rightarrow \mathbb{R}$, then the cost, $L$, can be defined as $L(u) = -R(u)$ for every $u \in U$. Satisfying the client is then a matter of choosing $u$ to minimize $L$.
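As a minimal sketch of this step, the following Python fragment (with hypothetical action names and reward values) negates the rewards to obtain costs and selects the cost-minimizing action:

    # Hypothetical rewards R(u) for a small set of actions U.
    rewards = {"u1": 100.0, "u2": 250.0, "u3": -40.0}

    costs = {u: -r for u, r in rewards.items()}   # L(u) = -R(u)
    best_action = min(costs, key=costs.get)       # choose u minimizing L

    print(best_action)   # u2, the action with the largest reward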
Now consider a game against nature. The decision now involves comparing probability distributions over the outcomes. The space of all probability distributions may be enormous, but this is simplified by using expectation to map each probability distribution (or density) to a real value. The concern should be whether this projection of distributions onto real numbers will fail to reflect the true preferences of the client. The following example illustrates the effect of this.
To begin to fix this problem, it is helpful to consider another scenario. Many people would probably agree that having more money is preferable (if having too much worries you, then you can always give away the surplus to your favorite charities). What is interesting, however, is that being wealthy decreases the perceived value of money. This is illustrated in the next example.
Below are several possible scenarios that could be presented on the television program. Consider how you would react to each one.
Based on these examples, it seems that the client or evaluator of the decision-making system must indicate preferences between probability distributions over outcomes. There is a formal way to ensure that once these preferences are assigned, a cost function can be designed whose expectation faithfully reflects the preferences over distributions. This results in utility theory, which involves the following steps:
1. The client must specify preferences among probability distributions of outcomes. Suppose that Formulation 9.2 is used. For convenience, assume that $U$ and $\Theta$ are finite. Let $X$ denote a state space based on outcomes. Let $f : U \times \Theta \rightarrow X$ denote a mapping that assigns a state to every outcome. A simple example is to declare that $X = U \times \Theta$ and make $f$ the identity map. This makes the outcome space and state space coincide. It may be convenient, though, to use $f$ to collapse the space of outcomes down to a smaller set. If two outcomes map to the same state using $f$, then it means that the outcomes are indistinguishable as far as rewards or costs are concerned.
2. Let $z$ denote a probability distribution over $X$, and let $Z$ denote the set of all probability distributions over $X$. Every $z \in Z$ is represented as an $n$-dimensional vector of probabilities in which $n = |X|$; hence, it is considered as an element of $\mathbb{R}^n$. This makes it convenient to ``blend'' two probability distributions. For example, let $\alpha \in (0,1)$ be a constant, and let $z_1$ and $z_2$ be any two probability distributions. Using scalar multiplication, a new probability distribution, $\alpha z_1 + (1-\alpha) z_2$, is obtained, which is a blend of $z_1$ and $z_2$ (see the numerical sketch after this list). Conveniently, there is no need to normalize the result. It is assumed that $z_1$ and $z_2$ initially have unit magnitude. The blend has magnitude $\alpha + (1-\alpha) = 1$.
3. The modeler of the decision process must consult the client to represent preferences among elements of $Z$. Let $z_1 \prec z_2$ mean that $z_2$ is strictly preferred over $z_1$. Let $z_1 \approx z_2$ mean that $z_1$ and $z_2$ are equivalent in preference. Let $z_1 \preceq z_2$ mean that either $z_1 \prec z_2$ or $z_1 \approx z_2$.
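As a small illustration of the blending in step 2, the following Python sketch (the distribution vectors and $\alpha$ are hypothetical, over a three-element state space) forms a blend and confirms that no renormalization is needed:

    import numpy as np

    # Blend two probability distributions over a three-element state space X.
    # The particular vectors and alpha are hypothetical, for illustration only.
    z1 = np.array([0.50, 0.25, 0.25])
    z2 = np.array([0.20, 0.30, 0.50])

    alpha = 0.4                            # any constant in (0, 1)
    blend = alpha * z1 + (1 - alpha) * z2  # the blended distribution

    print(blend)        # [0.32 0.28 0.4 ]
    print(blend.sum())  # 1.0 (up to floating-point rounding); already normalized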
The following example illustrates the assignment of preferences.
Consider the construction of the state space by using $f$. Suppose, for instance, that $U = \{u_1, u_2\}$ and $\Theta = \{\theta_1, \theta_2\}$, which yields four outcomes. The outcomes $(u_1, \theta_2)$ and $(u_2, \theta_1)$ are identical concerning any conceivable reward. Therefore, these should map to the same state. The other two outcomes are distinct. The state space therefore needs only three elements and can be defined as $X = \{0, 1, 2\}$. Let $f(u_1, \theta_1) = 0$, $f(u_1, \theta_2) = f(u_2, \theta_1) = 1$, and $f(u_2, \theta_2) = 2$. Thus, the last two states indicate that some gold will be earned.
The set $Z$ of probability distributions over $X$ is now considered. Each $z \in Z$ is a three-dimensional vector. As an example, $z_1 = [1/2 \;\; 1/4 \;\; 1/4]$ indicates that the state will be $0$ with probability $1/2$, $1$ with probability $1/4$, and $2$ with probability $1/4$. Suppose $z_2 = [1/4 \;\; 1/4 \;\; 1/2]$. Which distribution would you prefer? It seems in this case that $z_2$ is uniformly better than $z_1$ because there is a greater chance of winning gold. Thus, we declare $z_1 \prec z_2$. The distribution $z_3 = [1 \;\; 0 \;\; 0]$ seems to be the worst imaginable. Hence, we can safely declare $z_3 \prec z_1$ and $z_3 \prec z_2$.
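The following Python sketch mirrors this example; the outcome probabilities used to build a distribution over $X$ are hypothetical, chosen only to reproduce the illustrative vectors above:

    import numpy as np

    # The mapping f from the example: two indistinguishable outcomes share state 1.
    f = {("u1", "t1"): 0,     # no gold earned
         ("u1", "t2"): 1,     # gold earned
         ("u2", "t1"): 1,     # gold earned (same state as the previous outcome)
         ("u2", "t2"): 2}     # gold earned

    # A hypothetical distribution over the four outcomes, pushed through f to
    # obtain a distribution over X = {0, 1, 2}.
    p_outcomes = {("u1", "t1"): 0.50, ("u1", "t2"): 0.10,
                  ("u2", "t1"): 0.15, ("u2", "t2"): 0.25}
    z1 = np.zeros(3)
    for outcome, p in p_outcomes.items():
        z1[f[outcome]] += p
    print(z1)                            # [0.5  0.25 0.25], the z1 above

    # Compare distributions by the chance of earning any gold (state 1 or 2).
    z2 = np.array([0.25, 0.25, 0.50])
    z3 = np.array([1.00, 0.00, 0.00])
    for name, z in [("z1", z1), ("z2", z2), ("z3", z3)]:
        print(name, z[1:].sum())         # z1: 0.5, z2: 0.75, z3: 0.0

The resulting chances are consistent with the preferences $z_3 \prec z_1$, $z_3 \prec z_2$, and $z_1 \prec z_2$ declared above.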
The procedure of determining the preferences can become quite tedious for complicated problems. In the current example, $Z$ is a 2D subset of $\mathbb{R}^3$. This subset can be partitioned into a finite set of regions over which the client may be able to clearly indicate preferences. One of the major criticisms of this framework is the impracticality of determining preferences over $Z$ [831].
After the preferences are determined, is there a way to ensure that a real-valued function on $X$ exists whose expected value exactly reflects the preferences? If the axioms of rationality are satisfied by the assignment of preferences, then the answer is yes. These axioms are covered next.
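To make the question concrete, here is a minimal sketch assuming a hypothetical cost function on $X$; the cost values are illustrative and are not derived from the axioms:

    import numpy as np

    # A hypothetical cost function L on X = {0, 1, 2}; lower (more negative)
    # cost is better.  These values are assumptions for illustration only.
    L = np.array([0.0, -1.0, -2.0])

    z1 = np.array([0.50, 0.25, 0.25])
    z2 = np.array([0.25, 0.25, 0.50])
    z3 = np.array([1.00, 0.00, 0.00])

    # Expected cost under each distribution; lower expected cost means more
    # preferred, matching the ordering z3 ≺ z1 ≺ z2 from the example above.
    for name, z in [("z1", z1), ("z2", z2), ("z3", z3)]:
        print(name, float(z @ L))        # z1: -0.75, z2: -1.25, z3: 0.0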