Imagine assigning reward values to various outcomes of a decision-making process. In some applications numerical values may come naturally. For example, the reward might be the amount of money earned in a financial investment. In robotics applications, one could negate the time required to execute a task or the amount of energy consumed. For example, the reward could indicate the amount of remaining battery life after a mobile robot builds a map.
In some applications the source of rewards may be subjective. For example, what is the reward for washing dishes, in comparison to sweeping the floor? Each person would probably assign different rewards, which may even vary from day to day. It may be based on their enjoyment or misery in performing the task, the amount of time each task would take, the perceptions of others, and so on. If decision theory is used to automate the decision process for a human ``client,'' then it is best to consult carefully with the client to make sure you know their preferences. In this situation, it may be possible to sort their preferences and then assign rewards that are consistent with the ordering.
Once the rewards are assigned, consider making a decision under Formulation 9.1, which does not involve nature. Each outcome corresponds directly to an action, $u \in U$. If the rewards are given by $R : U \rightarrow \mathbb{R}$, then the cost, $L : U \rightarrow \mathbb{R}$, can be defined as $L(u) = -R(u)$ for every $u \in U$. Satisfying the client is then a matter of choosing $u$ to minimize $L$.
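As a minimal sketch of this step, assuming a small finite $U$ and a purely hypothetical reward table, the best action can be found by negating the rewards and minimizing:

```python
# Minimal sketch: decision making under Formulation 9.1 (no nature involved).
# The reward table is hypothetical; any finite R: U -> reals works the same way.
rewards = {"invest": 120.0, "wait": 0.0, "divest": -35.0}   # R(u)
cost = {u: -r for u, r in rewards.items()}                  # L(u) = -R(u)

best_action = min(cost, key=cost.get)                       # minimize L over U
print(best_action, cost[best_action])                       # invest -120.0
```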
Now consider a game against nature. The decision now involves comparing probability distributions over the outcomes. The space of all probability distributions may be enormous, but this is simplified by using expectation to map each probability distribution (or density) to a real value. The concern should be whether this projection of distributions onto real numbers will fail to reflect the true preferences of the client. The following example illustrates the effect of this.
To begin to fix this problem, it is helpful to consider another scenario. Many people would probably agree that having more money is preferable (if having too much worries you, then you can always give away the surplus to your favorite charities). What is interesting, however, is that being wealthy decreases the perceived value of money. This is illustrated in the next example.
Below are several possible scenarios that could be presented on the television program. Consider how you would react to each one.
Based on these examples, it seems that the client or evaluator of the decision-making system must indicate preferences between probability distributions over outcomes. There is a formal way to ensure that once these preferences are assigned, a cost function can be designed whose expectation faithfully reflects the preferences over distributions. This results in utility theory, which involves the following two steps:

1. Require that the client is rational when assigning preferences among probability distributions; this notion is made precise by a set of axioms.
2. If the preferences are assigned in a way that is consistent with the axioms, then a cost function is guaranteed to exist whose expected value exactly reflects those preferences.
The client must specify preferences among probability distributions of outcomes. Suppose that Formulation 9.2 is used. For convenience, assume that $U$ and $\Theta$ are finite. Let $X$ denote a state space based on outcomes. Let $f : U \times \Theta \rightarrow X$ denote a mapping that assigns a state to every outcome. A simple example is to declare that $X = U \times \Theta$ and make $f$ the identity map. This makes the outcome space and state space coincide. It may be convenient, though, to use $f$ to collapse the space of outcomes down to a smaller set. If two outcomes map to the same state using $f$, then it means that the outcomes are indistinguishable as far as rewards or costs are concerned.
Let $P$ denote a probability distribution over $X$, and let $\mathcal{P}$ denote the set of all probability distributions over $X$. Every $P \in \mathcal{P}$ is represented as an $n$-dimensional vector of probabilities in which $n = |X|$; hence, it is considered as an element of $\mathbb{R}^n$. This makes it convenient to ``blend'' two probability distributions. For example, let $\alpha \in (0,1)$ be a constant, and let $P_1$ and $P_2$ be any two probability distributions. Using scalar multiplication, a new probability distribution, $\alpha P_1 + (1-\alpha) P_2$, is obtained, which is a blend of $P_1$ and $P_2$. Conveniently, there is no need to normalize the result. It is assumed that $P_1$ and $P_2$ initially have unit magnitude. The blend has magnitude $\alpha + (1-\alpha) = 1$.
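A small sketch, using hypothetical three-state distributions, confirms that the blend requires no renormalization:

```python
# Sketch: blending two probability distributions over a three-element state space.
# The particular vectors and alpha are hypothetical; any unit-magnitude vectors work.
P1 = [0.5, 0.25, 0.25]
P2 = [0.2, 0.30, 0.50]
alpha = 0.4

blend = [alpha * a + (1 - alpha) * b for a, b in zip(P1, P2)]
print(blend)       # approximately [0.32, 0.28, 0.40], a valid distribution
print(sum(blend))  # magnitude alpha + (1 - alpha) = 1; no normalization needed
```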
The modeler of the decision process must consult the client to represent preferences among elements of $\mathcal{P}$. Let $P_1 \prec P_2$ mean that $P_2$ is strictly preferred over $P_1$. Let $P_1 \approx P_2$ mean that $P_1$ and $P_2$ are equivalent in preference. Let $P_1 \preceq P_2$ mean that either $P_1 \prec P_2$ or $P_1 \approx P_2$.
The following example illustrates the assignment of preferences.
Consider the construction of the state space $X$ by using $f$. Suppose that $U = \{u_1, u_2\}$ and $\Theta = \{\theta_1, \theta_2\}$, and that the reward is an amount of gold earned. The outcomes $(u_1,\theta_2)$ and $(u_2,\theta_1)$ are identical concerning any conceivable reward. Therefore, these should map to the same state. The other two outcomes are distinct. The state space therefore needs only three elements and can be defined as $X = \{0, 1, 2\}$. Let $f(u_1,\theta_1) = 0$, $f(u_1,\theta_2) = f(u_2,\theta_1) = 1$, and $f(u_2,\theta_2) = 2$. Thus, the last two states indicate that some gold will be earned.
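The collapsing of outcomes into states is easy to encode directly; the sketch below simply tabulates the mapping $f$ as reconstructed above:

```python
# Sketch: the outcome-to-state mapping f for the example above (labels as strings).
# Outcomes yielding the same reward collapse to the same state.
f = {
    ("u1", "th1"): 0,   # no gold
    ("u1", "th2"): 1,   # some gold
    ("u2", "th1"): 1,   # same reward as ("u1", "th2"), hence the same state
    ("u2", "th2"): 2,   # more gold
}
X = sorted(set(f.values()))   # state space [0, 1, 2]
```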
The set $\mathcal{P}$ of probability distributions over $X$ is now considered. Each $P \in \mathcal{P}$ is a three-dimensional vector. As an example, $P_1 = [q_0 \;\; q_1 \;\; q_2]$ indicates that the state will be $0$ with probability $q_0$, $1$ with probability $q_1$, and $2$ with probability $q_2$. Suppose that $P_2$ is another distribution that assigns less probability to state $0$ and more to each of states $1$ and $2$. Which distribution would you prefer? It seems in this case that $P_2$ is uniformly better than $P_1$ because there is a greater chance of winning gold. Thus, we declare $P_1 \prec P_2$. The distribution $P_3 = [1 \;\; 0 \;\; 0]$, under which no gold is ever earned, seems to be the worst imaginable. Hence, we can safely declare $P_3 \prec P_1$ and $P_3 \prec P_2$.
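For concreteness, here is a sketch with hypothetical numeric values that are consistent with the preferences just declared; the chance of earning any gold is one minus the probability of state $0$:

```python
# Sketch with hypothetical values: three distributions over X = {0, 1, 2}.
P1 = [0.50, 0.25, 0.25]
P2 = [0.25, 0.35, 0.40]   # less mass on state 0, more on each gold state
P3 = [1.00, 0.00, 0.00]   # no chance of gold: the worst imaginable

def chance_of_gold(P):
    return 1.0 - P[0]

print(chance_of_gold(P1), chance_of_gold(P2), chance_of_gold(P3))  # 0.5 0.75 0.0
# Consistent with the declared preferences: P1 < P2, P3 < P1, and P3 < P2.
```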
The procedure of determining the preferences can become quite tedious for complicated problems. In the current example, $\mathcal{P}$ is a 2D subset of $\mathbb{R}^3$. This subset can be partitioned into a finite set of regions over which the client may be able to clearly indicate preferences. One of the major criticisms of this framework is the impracticality of determining preferences over $\mathcal{P}$ [831].
After the preferences are determined, is there a way to ensure that a real-valued function on $X$ exists for which the expected value exactly reflects the preferences? If the axioms of rationality are satisfied by the assignment of preferences, then the answer is yes. These axioms are covered next.
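As a preview of what such a function provides, the following sketch compares distributions by the expectation of a hypothetical cost assignment on $X$; the guarantee discussed next is that, under the axioms, some assignment of this kind reproduces the client's preferences exactly:

```python
# Sketch: comparing distributions by expected cost, with hypothetical cost values.
cost = {0: 0.0, 1: -1.0, 2: -1.8}   # lower expected cost = more preferred

def expected_cost(P):
    return sum(p * cost[x] for x, p in enumerate(P))

P1 = [0.50, 0.25, 0.25]
P2 = [0.25, 0.35, 0.40]
print(expected_cost(P1) > expected_cost(P2))   # True, matching P1 < P2 above
```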
 