So far it has been established that a randomized saddle point always exists, but how can one be found? Two key observations enable a combinatorial solution to the problem:
First consider the problem of determining the security strategy for . The first observation means that (9.59) does not need to consider randomized strategies for . Inside of the , is fixed. What randomized strategy, , maximizes ? If is fixed, then can be treated as a constant -dimensional vector, . This means , in which is the inner (dot) product. Now the task is to select to maximize . This involves selecting the largest element of ; suppose this is . The maximum cost over all is obtained by placing all of the probability mass at action . Thus, the strategy and for gives the highest cost, and it is deterministic.
Using the first observation, for each , only possible responses by need to be considered. These are the deterministic strategies, each of which assigns for a unique .
Now consider the second observation. The expected cost, , is a linear function of , if is fixed. Since only needs to be fixed at different values due to the first observation, is selected at the point at which the smallest maximum value among the linear functions occurs. This is the minimum value of the upper envelope of the collection of linear functions. Such envelopes were mentioned in Section 6.5.2. Example 9.16 will illustrate this. The domain for this optimization can conveniently be set as a triangle in . Even though , the last coordinate, , is not needed because it is always . The resulting optimization falls under linear programming, for which many combinatorial algorithms exist [259,264,664,731].
In the explanation above, there is nothing particular to when trying to find its security strategy. The same method can be applied to determine the security strategy for ; however, every minimization is replaced by a maximization, and vice versa. In summary, the in (9.60) needs only to consider the deterministic strategies in . If becomes fixed, then is once again a linear function, but this time it is linear in . The best randomized action is chosen by finding the point that gives the highest minimum value among linear functions. This is the minimum value of the lower envelope of the collection of linear functions. The optimization occurs over because the last coordinate, , is obtained directly from .
This computation method is best understood through an example.
Consider computing the security strategy for . Note that and are only one-dimensional subsets of . A randomized strategy for is , with , , and . Therefore, the domain over which the optimization is performed is because can always be derived as . Using the first observation above, only the two deterministic strategies for need to be considered. When considered as linear functions of , these are
(9.64) |
(9.65) |
A similar procedure can be used to obtain . The lines that
correspond to the deterministic strategies of
are shown in
Figure 9.4b. The security strategy is obtained by
finding the maximum value along the lower envelope of the lines, which
is shown as the thickened line in the figure. This results in
, and once again, the value is observed as
(this must coincide with the previous one because the randomized
upper and lower values are the same!).
This procedure appears quite simple if there are only two actions per player. If , then the upper and lower envelopes are piecewise-linear functions in . This may be computationally impractical because all existing linear programming algorithms have running time at least exponential in dimension [264].
Steven M LaValle 2020-08-14