9.3.3.2 Computation of randomized saddle points

So far it has been established that a randomized saddle point always exists, but how can one be found? Two key observations enable a combinatorial solution to the problem:

1. The security strategy for each player can be found by considering only deterministic strategies for the opposing player.
2. If the strategy for the other player is fixed, then the expected cost is a linear function of the undetermined probabilities.

First consider the problem of determining the security strategy for . The first observation means that (9.59) does not need to consider randomized strategies for . Inside of the , is fixed. What randomized strategy, , maximizes ? If is fixed, then can be treated as a constant -dimensional vector, . This means , in which is the inner (dot) product. Now the task is to select to maximize . This involves selecting the largest element of ; suppose this is . The maximum cost over all is obtained by placing all of the probability mass at action . Thus, the strategy and for gives the highest cost, and it is deterministic.

Using the first observation, for each , only possible responses by need to be considered. These are the deterministic strategies, each of which assigns for a unique .

Now consider the second observation. The expected cost, , is a linear function of , if is fixed. Since only needs to be fixed at different values due to the first observation, is selected at the point at which the smallest maximum value among the linear functions occurs. This is the minimum value of the upper envelope of the collection of linear functions. Such envelopes were mentioned in Section 6.5.2. Example 9.16 will illustrate this. The domain for this optimization can conveniently be set as a triangle in . Even though , the last coordinate, , is not needed because it is always . The resulting optimization falls under linear programming, for which many combinatorial algorithms exist [259,264,664,731].

In the explanation above, there is nothing particular to when trying to find its security strategy. The same method can be applied to determine the security strategy for ; however, every minimization is replaced by a maximization, and vice versa. In summary, the in (9.60) needs only to consider the deterministic strategies in . If becomes fixed, then is once again a linear function, but this time it is linear in . The best randomized action is chosen by finding the point that gives the highest minimum value among linear functions. This is the minimum value of the lower envelope of the collection of linear functions. The optimization occurs over because the last coordinate, , is obtained directly from .

This computation method is best understood through an example.

Example 9..16 (Computing a Randomized Saddle Point)   The simplest case is when both players have only two actions. Let the cost matrix be defined as

 (9.63)

Consider computing the security strategy for . Note that and are only one-dimensional subsets of . A randomized strategy for is , with , , and . Therefore, the domain over which the optimization is performed is because can always be derived as . Using the first observation above, only the two deterministic strategies for need to be considered. When considered as linear functions of , these are

 (9.64)

for and

 (9.65)

for . The lines are plotted in Figure 9.4a. The security strategy is determined by the minimum point along the upper envelope shown in the figure. This is indicated by the thickened line, and it is always a piecewise-linear function in general. The lowest point occurs at , and the resulting value is . Therefore, .

A similar procedure can be used to obtain . The lines that correspond to the deterministic strategies of are shown in Figure 9.4b. The security strategy is obtained by finding the maximum value along the lower envelope of the lines, which is shown as the thickened line in the figure. This results in , and once again, the value is observed as (this must coincide with the previous one because the randomized upper and lower values are the same!).

This procedure appears quite simple if there are only two actions per player. If , then the upper and lower envelopes are piecewise-linear functions in . This may be computationally impractical because all existing linear programming algorithms have running time at least exponential in dimension [264].

Steven M LaValle 2020-08-14