Imagine assigning reward values to various outcomes of a decision-making process. In some applications numerical values may come naturally. For example, the reward might be the amount of money earned in a financial investment. In robotics applications, one could negate the time needed to execute a task or the amount of energy consumed. For example, the reward could indicate the amount of remaining battery life after a mobile robot builds a map.
In some applications the source of rewards may be subjective. For example, what is the reward for washing dishes, in comparison to sweeping the floor? Each person would probably assign different rewards, which may even vary from day to day. It may be based on their enjoyment or misery in performing the task, the amount of time each task would take, the perceptions of others, and so on. If decision theory is used to automate the decision process for a human ``client,'' then it is best to consult carefully with the client to make sure you know their preferences. In this situation, it may be possible to sort their preferences and then assign rewards that are consistent with the ordering.
Once the rewards are assigned, consider making a decision under Formulation 9.1, which does not involve nature. Each outcome corresponds directly to an action, $u \in U$. If the rewards are given by $R : U \rightarrow \mathbb{R}$, then the cost, $L$, can be defined as $L(u) = -R(u)$ for every $u \in U$. Satisfying the client is then a matter of choosing $u$ to minimize $L$.
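As a quick illustration, here is a minimal Python sketch of this deterministic case; the action names and reward values are hypothetical.

\begin{verbatim}
# Hypothetical rewards R(u) for a finite action set U.
rewards = {"u1": 3.0, "u2": 5.0, "u3": 1.0}

# Define the cost as L(u) = -R(u) and choose the action that minimizes it.
costs = {u: -r for u, r in rewards.items()}
best_action = min(costs, key=costs.get)   # "u2", the highest-reward action
\end{verbatim}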
Now consider a game against nature. The decision now involves comparing probability distributions over the outcomes. The space of all probability distributions may be enormous, but this is simplified by using expectation to map each probability distribution (or density) to a real value. The concern should be whether this projection of distributions onto real numbers will fail to reflect the true preferences of the client. The following example illustrates the effect of this.
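The projection onto real numbers can be sketched in a few lines of Python; the cost table and nature's distribution below are hypothetical, and each action's induced distribution over outcomes is reduced to a single expected value.

\begin{verbatim}
import numpy as np

# Hypothetical cost table L(u, theta): rows index actions, columns index
# nature's choices; p_theta is an assumed distribution over nature's actions.
L = np.array([[1.0, 4.0],
              [2.0, 2.0]])
p_theta = np.array([0.5, 0.5])

expected_cost = L @ p_theta                    # one real number per action: [2.5, 2.0]
best_action = int(np.argmin(expected_cost))    # action 1 under this projection
\end{verbatim}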
To begin to fix this problem, it is helpful to consider another scenario. Many people would probably agree that having more money is preferable (if having too much worries you, then you can always give away the surplus to your favorite charities). What is interesting, however, is that being wealthy decreases the perceived value of money. This is illustrated in the next example.
Below are several possible scenarios that could be presented on the television program. Consider how you would react to each one.
Based on these examples, it seems that the client or evaluator of the decision-making system must indicate preferences between probability distributions over outcomes. There is a formal way to ensure that once these preferences are assigned, a cost function can be designed for which its expectation faithfully reflects the preferences over distributions. This results in utility theory, which involves the following steps:
The client must specify preferences among probability distributions of outcomes. Suppose that Formulation 9.2 is used. For convenience, assume that $U$ and $\Theta$ are finite. Let $X$ denote a state space based on outcomes. Let $f : U \times \Theta \rightarrow X$ denote a mapping that assigns a state to every outcome. A simple example is to declare that $X = U \times \Theta$ and make $f$ the identity map. This makes the outcome space and state space coincide. It may be convenient, though, to use $f$ to collapse the space of outcomes down to a smaller set. If two outcomes map to the same state using $f$, then it means that the outcomes are indistinguishable as far as rewards or costs are concerned.
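A small sketch of the collapsing map, with hypothetical labels for $U$, $\Theta$, and the states: outcomes that are indistinguishable in reward are sent to the same state.

\begin{verbatim}
# Hypothetical outcome-to-state map f : U x Theta -> X.
U = ["u1", "u2"]
Theta = ["t1", "t2"]
f = {("u1", "t1"): 0, ("u2", "t1"): 0,   # reward-equivalent outcomes collapse
     ("u1", "t2"): 1, ("u2", "t2"): 2}

X = sorted(set(f.values()))   # resulting state space: [0, 1, 2]
\end{verbatim}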
Let $z$ denote a probability distribution over $X$, and let $Z$ denote the set of all probability distributions over $X$. Every $z \in Z$ is represented as an $n$-dimensional vector of probabilities in which $n = |X|$; hence, it is considered as an element of $\mathbb{R}^n$. This makes it convenient to ``blend'' two probability distributions. For example, let $\alpha \in (0,1)$ be a constant, and let $z_1$ and $z_2$ be any two probability distributions. Using scalar multiplication, a new probability distribution, $\alpha z_1 + (1-\alpha) z_2$, is obtained, which is a blend of $z_1$ and $z_2$. Conveniently, there is no need to normalize the result. It is assumed that $z_1$ and $z_2$ initially have unit magnitude. The blend has magnitude $\alpha + (1-\alpha) = 1$.
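Because distributions are ordinary vectors, the blend is simply a convex combination; the following sketch (with made-up distributions over a three-element state space) confirms that the result already sums to one and needs no renormalization.

\begin{verbatim}
import numpy as np

z1 = np.array([0.5, 0.25, 0.25])   # assumed distributions over a 3-state X
z2 = np.array([0.2, 0.3, 0.5])
alpha = 0.4

blend = alpha * z1 + (1.0 - alpha) * z2   # [0.32, 0.28, 0.4]
assert abs(blend.sum() - 1.0) < 1e-12     # magnitude alpha + (1 - alpha) = 1
\end{verbatim}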
The modeler of the decision process must consult the client to represent preferences among elements of $Z$. Let $z_1 \prec z_2$ mean that $z_2$ is strictly preferred over $z_1$. Let $z_1 \approx z_2$ mean that $z_1$ and $z_2$ are equivalent in preference. Let $z_1 \preceq z_2$ mean that either $z_1 \prec z_2$ or $z_1 \approx z_2$. The following example illustrates the assignment of preferences.
Consider the construction of the state space by using $f$. The outcomes $(u_1,\theta_1)$ and $(u_2,\theta_1)$ are identical concerning any conceivable reward. Therefore, these should map to the same state. The other two outcomes are distinct. The state space therefore needs only three elements and can be defined as $X = \{0, 1, 2\}$. Let $f(u_1,\theta_1) = f(u_2,\theta_1) = 0$, $f(u_1,\theta_2) = 1$, and $f(u_2,\theta_2) = 2$. Thus, the last two states indicate that some gold will be earned.
The set $Z$ of probability distributions over $X$ is now considered. Each $z \in Z$ is a three-dimensional vector. As an example, $z = [1/2 \;\; 1/4 \;\; 1/4]$ indicates that the state will be $0$ with probability $1/2$, $1$ with probability $1/4$, and $2$ with probability $1/4$. Suppose $z' = [1/4 \;\; 1/4 \;\; 1/2]$. Which distribution would you prefer? It seems in this case that $z'$ is uniformly better than $z$ because there is a greater chance of winning gold. Thus, we declare $z \prec z'$. The distribution $z'' = [1 \;\; 0 \;\; 0]$ seems to be the worst imaginable. Hence, we can safely declare $z'' \prec z$ and $z'' \prec z'$.
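Using the distributions above (whose particular values are merely illustrative), the comparison can be checked directly: since state $0$ means no gold, the chance of earning any gold is one minus the probability of state $0$.

\begin{verbatim}
import numpy as np

z  = np.array([0.5, 0.25, 0.25])
zp = np.array([0.25, 0.25, 0.5])
zw = np.array([1.0, 0.0, 0.0])    # the "worst imaginable" distribution

def prob_gold(dist):
    return 1.0 - dist[0]          # states 1 and 2 both yield gold

# prob_gold(zw) = 0.0 < prob_gold(z) = 0.5 < prob_gold(zp) = 0.75,
# consistent with the declared preferences z'' < z < z'.
\end{verbatim}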
The procedure of determining the preferences can become quite tedious for complicated problems. In the current example, $Z$ is a two-dimensional subset of $\mathbb{R}^3$. This subset can be partitioned into a finite set of regions over which the client may be able to clearly indicate preferences. One of the major criticisms of this framework is the impracticality of determining preferences over $Z$ [831].
After the preferences are determined, is there a way to ensure that a real-valued function on $X$ exists for which the expected value exactly reflects the preferences? If the axioms of rationality are satisfied by the assignment of preferences, then the answer is yes. These axioms are covered next.
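To preview what such a function buys, here is a minimal sketch with a hypothetical cost assignment on the three states of the running example; ranking distributions by expected cost then induces an ordering over $Z$, and utility theory guarantees that, under the axioms of rationality, the cost can be chosen so that this ordering matches the client's stated preferences.

\begin{verbatim}
import numpy as np

# Hypothetical cost per state in X = {0, 1, 2}; lower is better.
cost = np.array([0.0, -1.0, -2.0])

def expected_cost(z):
    return float(np.dot(cost, z))

z  = np.array([0.5, 0.25, 0.25])
zp = np.array([0.25, 0.25, 0.5])
zw = np.array([1.0, 0.0, 0.0])

# expected_cost(zp) < expected_cost(z) < expected_cost(zw), so ordering by
# expected cost reproduces the declared preferences z'' < z < z'.
\end{verbatim}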