Exercises

Suppose that a single-stage two-objective decision-making problem is defined in which there are two objectives and a continuous set of actions, . The cost vector is $L = [ u^2 \;\; u - 1]$ . Determine the set of Pareto-optimal actions.
Let

$\Theta$

0

define the cost for each combination of choices by the decision maker and nature. Let nature's randomized strategy be $[1/5 \;\;\; 2/5 \;\;\; 1/10 \;\;\; 3/10]$ .
1. Use nondeterministic reasoning to find the minimax decision and worst-case cost.
2. Use probabilistic reasoning to find the best expected-case decision and expected cost.
Many reasonable decision rules are possible, other than those considered in this chapter.
1. Exercise 2(a) reflects extreme pessimism. Suppose instead that extreme optimism is used. Select the choice that optimizes the best-case cost for the matrix in Exercise 2.
2. One approach is to develop a coefficient of optimism, $\alpha \in [0,1]$ , which allows one to interpolate between the two extreme scenarios. Thus, a decision, $u \in U$ , is chosen by minimizing
  
  $\displaystyle \alpha \; \max_{\theta \in \Theta} \Big\{ L(u,\theta) \Big\} + (1 - \alpha) \; \min_{\theta \in \Theta} \Big\{ L(u,\theta) \Big\} .$ (9.94)
  
  Determine the optimal decision for this scenario under all possible choices for $\alpha \in [0,1]$ . Give your answer as a list of choices, each with a specified range of $\alpha$ .
Suppose that after making a decision, you observe the choice made by nature. How does the cost that you received compare with the best cost that could have been obtained if you chose something else, given this choice by nature? This difference in costs can be considered as regret or minimum ``Doh!''^9.11 Psychologists have argued that some people make choices based on minimizing regret. It reflects how badly you wish you had done something else after making the decision.
1. Develop an expression for the worst-case regret, and use it to make a minimax regret decision using the matrix from Exercise 2.
2. Develop an expression for the expected regret, and use it to make a minimum expected regret decision.
Using the matrix from Exercise 2, consider the set of all probability distributions for nature. Characterize the set of all distributions for which the minimax decision and the best expected decision results in the same choice. This indicates how to provide reverse justification for priors.
Consider a Bayesian decision-theory scenario with cost function . Show that the decision rule never changes if $L(u,\theta)$ is replaced by $a L(u,\theta) + b$ , for any and $b \in {\mathbb{R}}$ .
Suppose that there are two classes, $\Omega = \{\omega_1,\omega_2\}$ , with $P(\omega_1) = P(\omega_2) = \frac{1}{2}$ . The observation space, , is ${\mathbb{R}}$ . Recall from probability theory that the normal (or Gaussian) probability density function is

$\displaystyle p(y) = \frac{1}{\sigma \sqrt{2 \pi}} e^{-(y-\mu)^2/2\sigma^2},$ (9.95)

in which $\mu$ denotes the mean and $\sigma^2$ denotes the variance. Suppose that $p(y\vert\omega_1)$ is a normal density in which $\mu = 0$ and $\sigma^2 = 1$ . Suppose that $p(y\vert\omega_2)$ is a normal density in which $\mu = 6$ and $\sigma^2 = 4$ . Find the optimal classification rule, $\gamma : Y \rightarrow \Omega$ . You are welcome to solve the problem numerically (by computer) or graphically (by careful function plotting). Carefully explain how you arrived at the answer in any case.
Let

$\Theta$

0

give the cost for each combination of choices by the decision maker and nature. Let nature's randomized strategy be $[1/4 \;\; 1/2 \;\; 1/8 \;\; 1/8]$ .
1. Use nondeterministic reasoning to find the minimax decision and worst-case cost.
2. Use probabilistic reasoning to find the best expected-case decision and expected cost.
3. Characterize the set of all probability distributions for which the minimax decision and the best expected decision results in the same choice.
In a constant-sum game, the costs for any $u \in U$ and $v \in V$ add to yield

$\displaystyle L_1(u,v) + L_2(u,v) = c$ (9.96)

for some constant that is independent of and . Show that any constant-sum game can be transformed into a zero-sum game, and that saddle point solutions can be found using techniques for the zero-sum formulation.
Formalize Example 9.7 as a zero-sum game, and compute security strategies for the players. What is the expected value of the game?
Suppose that for two zero-sum games, there exists some nonzero $c \in {\mathbb{R}}$ for which the cost matrix of one game is obtained by multiplying all entries by in the cost matrix of the other. Prove that these two games must have the same deterministic and randomized saddle points.
In the same spirit as Exercise 11, prove that two zero-sum games have the same deterministic and randomized saddle points if is added to all matrix entries.
Prove that multiple Nash equilibria of a nonzero-sum game specified by matrices and are interchangeable if as a game yields the same Nash equilibria as the game .
Analyze the game of Rock-Paper-Scissors for three players. For each player, assign a cost of for losing, 0 for a tie, and for winning. Specify the cost functions. Is it possible to avoid regret? Does it have a deterministic Nash equilibrium? Can you find a randomized Nash equilibrium?
Compute the randomized equilibrium point for the following zero-sum game:

$\displaystyle \begin{tabular}{cc} & $V$ $U$ & \begin{tabular}{\vert c\vert c\vert}\hline 0 & -1 \hline -1 & 2 \hline \end{tabular} \end{tabular} .$ (9.97)

Indicate the randomized strategies for the players and the resulting expected value of the game.

Implementations
Consider estimating the value of an unknown parameter, $\theta \in {\mathbb{R}}$ . The prior probability density is a normal,

$\displaystyle p(\theta) = \frac{1}{\sigma \sqrt{2 \pi}} e^{-(\theta-\mu)^2/2\sigma^2},$ (9.98)

with $\mu = 0$ and $\sigma = 4$ . Suppose that a sequence, , , $\ldots$ , , of observations is made and that each $p(y_i\vert\theta)$ is a normal density with $\mu = \theta$ and $\sigma = 9$ . Suppose that represents your guess of the parameter value. The task is select to minimize the expectation of the cost, $L(u,\theta) = (u-\theta)^2$ . Suppose that the ``true'' value of $\theta$ is . Determine the , the minimal action with respect to the expected cost after observing: for every $i \in \{ 1,\ldots,k\}$ .
1. Determine for .
2. Determine for .
3. Determine for .
This experiment is not very realistic because the observations should be generated by sampling from the normal density, $p(y_i\vert\theta)$ . Repeat the exercise using values drawn with the normal density, instead of , for each .
Implement an algorithm that computes a randomized saddle point for zero-sum games. Assume that one player has no more than two actions and the other may have any finite number of actions.
Suppose that a -stage decision-making problem is defined using multiple objectives. There is a finite state space and a finite action set for each $x \in X$ . A state transition equation, $x_{k+1} = f(x_k,u_k)$ , gives the next state from a current state and input. There are cost functionals of the form

$\displaystyle L_i(u_1,\ldots,u_K) = \sum_{k=1}^K l(x_k,u_k) + l_F(x_F),$ (9.99)

in which . Assume that $l_F(x_F) = \infty$ if $x_F \in X_{goal}$ (for some goal region $X_{goal} \subset X$ ) and otherwise. Assume that there is no termination action (which simplifies the problem). Develop a value-iteration approach that finds the complete set of Pareto-optimal plans efficiently as possible. If two or more plans produce the same cost vector, then only one representative needs to be returned.

Steven M LaValle 2020-08-14