9.4.1.1 Dealing with multiple Nash equilibria

Example 9.17 was somewhat disheartening due to the existence of multiple Nash equilibria. In general, there could be any number of equilibria. How can each player know which one to play? If they each choose a different one, they are not guaranteed to fall into another equilibrium as in the case of saddle points of zero-sum games. Many of the equilibria can be eliminated by using Pareto optimality, which was explained in Section 9.1.1 and also appeared in Section 7.7.2 as a way to optimally coordinate multiple robots. The idea is to formulate the selection as a multi-objective optimization problem, which fits into Formulation 9.2.

Consider two-dimensional vectors of the form $ (x_i,y_i)$, in which $ x$ and $ y$ represent the costs $ L_1$ and $ L_2$ obtained under the implementation of a Nash equilibrium denoted by $ \pi_i$. For two different equilibria $ \pi_1$ and $ \pi_2$, the cost vectors $ (x_1,y_1)$ and $ (x_2,y_2)$ are obtained. In Example 9.17, these were $ (1,2)$ and $ (-1,0)$. In general, $ \pi_1$ is said to be better than $ \pi_2$ if $ x_1 \leq x_2$, $ y_1 \leq y_2$, and at least one of the inequalities is strict. In Example 9.17, the equilibrium that produces $ (-1,0)$ is clearly better than obtaining $ (1,2)$ because both players benefit.

The definition of ``better'' induces a partial ordering on the space of Nash equilibria. It is only partial because some vectors are incomparable. Consider, for example, $ (-1,1)$ and $ (1,-1)$. The first one is preferable to $ {{\rm P}_1}$, and the second is preferred by $ {{\rm P}_2}$. In game theory, the Nash equilibria that are minimal with respect to this partial ordering are called admissible. They could alternatively be called Pareto optimal.

The best situation is when a game has one Nash equilibrium. If there are multiple Nash equilibria, then there is some hope that only one of them is admissible. In this case, it is hoped that the rational players are intelligent enough to figure out that any nonadmissible equilibria should be discarded. Unfortunately, there are many games that have multiple admissible Nash equilibria. In this case, analysis of the game indicates that the players must communicate or collaborate in some way to eliminate the possibility of regret. Otherwise, regret is unavoidable in the worst case. It is also possible that there are no Nash equilibria, but, fortunately, by allowing randomized strategies, a randomized Nash equilibrium is always guaranteed to exist. This will be covered after the following two examples.

Example 9..18 (The Battle of the Sexes)   Consider a game specified by the cost matrices $ A$ and $ B$:

$\displaystyle A: \hspace*{5mm} \begin{tabular}{cc} & $V$\ \\ $U$\ & \begin{tabu...
...t c\vert}\hline -1 & 0 \\ \hline 0 & -2 \\ \hline \end{tabular} \end{tabular} .$ (9.69)

This is a famous game called the ``Battle of the Sexes.'' Suppose that a man and a woman have a relationship, and they each have different preferences on how to spend the evening. The man prefers to go shopping, and the woman prefers to watch a football match. The game involves selecting one of these two activities. The best case for either one is to do what they prefer while still remaining together. The worst case is to select different activities, which separates the couple. This game is somewhat unrealistic because in most situations some cooperation between them is expected.

Both $ u = v = 1$ and $ u = v = 2$ are Nash equilibria, which yield cost vectors $ (-2,-1)$ and $ (-1,-2)$, respectively. Neither solution is better than the other; therefore, they are both admissible. There is no way to avoid the possibility of regret unless the players cooperate (you probably already knew this). $ \blacksquare$

The following is one of the most famous nonzero-sum games.

Example 9..19 (The Prisoner's Dilemma)   The following game is very simple to express, yet it illustrates many interesting issues. Imagine that a heinous crime has been committed by two people. The authorities know they are guilty, but they do not have enough evidence to convict them. Therefore, they develop a plan to try to trick the suspects. Each suspect (or player) is placed in an isolated prison cell and given two choices. Each player can cooperate with the authorities, $ u=1$ or $ v = 1$, or refuse, $ u=2$ or $ v = 2$. By cooperating, the player admits guilt and turns over evidence to the authorities. By refusing, the player claims innocence and refuses to help the authorities.

The cost $ L_i$ represents the number of years that the player will be sentenced to prison. The cost matrices are assigned as

$\displaystyle A: \hspace*{5mm} \begin{tabular}{cc} & $V$\ \\ $U$\ & \begin{tabu...
...rt c\vert}\hline 8 & 30 \\ \hline 0 & 2 \\ \hline \end{tabular} \end{tabular} .$ (9.70)

The motivation is that both players receive $ 8$ years if they both cooperate, which is the sentence for being convicted of the crime and being rewarded for cooperating with the authorities. If they both refuse, then they receive $ 2$ years because the authorities have insufficient evidence for a longer term. The interesting cases occur if one refuses and the other cooperates. The one who refuses is in big trouble because the evidence provided by the other will be used against him. The one who cooperates gets to go free (the cost is 0); however, the other is convicted on the evidence and spends $ 30$ years in prison.

What should the players do? What would you do? If they could make a coordinated decision, then it seems that a good choice would be for both to refuse, which results in costs $ (2,2)$. In this case, however, there would be regret because each player would think that he had a chance to go free (receiving cost 0 by refusing). If they were to play the game a second time, they might be inclined to change their decisions.

The Nash equilibrium for this problem is for both of them to cooperate, which results in $ (8,8)$. Thus, they pay a price for not being able to communicate and coordinate their strategy. This solution is also a security strategy for the players, because it achieves the lowest cost using worst-case analysis. $ \blacksquare$

Steven M LaValle 2020-08-14