
Abstract

How are the advantage relations between a set of agents playing a game organized and how do they reflect the structure of the game? In this paper, we illustrate ‘Principal Trade-off Analysis’ (PTA), a decomposition method that embeds games into a low-dimensional feature space. We argue that the embeddings are more revealing than previously demonstrated by developing an analogy to Principal Component Analysis (PCA). PTA represents an arbitrary two-player zero-sum game as a linear combination of simple games via the projection of policy profiles into orthogonal 2D feature planes. We show that the feature planes represent unique strategic trade-offs and that truncation of the sequence provides insightful model reduction and visualization. We demonstrate the validity of PTA on a quartet of games (Kuhn poker, RPS + 2, Blotto and Pokemon). In Kuhn poker, PTA clearly identifies the trade-off between bluffing and calling. In Blotto, PTA identifies game symmetries and specifies strategic trade-offs associated with distinct win conditions. These symmetries reveal limitations of PTA unaddressed in previous work. For Pokemon, PTA recovers clusters that naturally correspond to Pokemon types, correctly identifies the designed trade-off between those types, and discovers a rock-paper-scissors (RPS) cycle in the Pokemon generation type – all absent any specific information except game outcomes.

Keywords

Game theory, principal component analysis, Schur decomposition, disc game, data visualization, strategic feature extraction

Introduction

In recent years algorithms have achieved superhuman performance in a number of complex games such as Chess, Go, Shogi, Poker and Starcraft.^1–4 Despite impressive game play, enhanced understanding of the game is typically only achieved by additional analysis of the algorithm’s play post facto.⁵ Current work emphasizes the ‘policy problem’, developing strong agents, despite growing demand for a task theory which addresses the ‘problem problem’, that is, what games are worth study and play.^6,7 A task theory requires a language that characterizes and categorizes games, namely, a toolset of measures and visualization techniques that evaluate and illustrate game structure. Summary visuals and measures are especially important for complex games where direct analysis is intractable. In this vein, tournaments are used to sample the game and to empirically evaluate agents. The empirical analysis of tournaments has a long history, in sports analytics,^8,9 ecology and animal behaviour^10,11 and biology.^12,13 While the primary interest in these cases is typically in ranking agents/players, tournaments also reveal significant information about the nature of the underlying game.¹⁴ This paper describes mathematical techniques for extracting useful information about the underlying game structure directly from tournament data. While these methods can be applied to the various contexts in which tournaments are already employed in machine learning (e.g. population based training), they open up a range of new research questions regarding the characterization of natural games, synthesis of artificial games (cf. Omidshafiei et al.⁶), game approximation via simplified dynamics, and the strategic perturbation of games.

Note that our objectives are as empirical as they are game theoretic. Empirical game theory, the study of games from actual game play data (e.g. sports analytics), studies games as they are played by a particular population, rather than by an idealized player. Thus, empirical game theory has its own, valid, objectives beyond finding equilibria or optimal players. Exclusive focus on optima ignores the global structure of a game as it is experienced by the majority of participants. What decision dilemmas do they face? What game dynamics do they experience? What game space must they navigate in the process of optimization? How should they exploit a chosen opponent, population, or form teams? All of these questions are more easily addressed given a simplified global representation that isolates each important independent aspect of a game. PTA offers such a summary.

PTA represents structural characteristics of a tournament by low-dimensional embeddings that map competitive relationships to embedded geometry. We review and expand on methods introduced by Balduzzi et al.,¹⁵ who proposed a series of maps that describe a sample tournament in terms of a sum of simple games, namely, disc games.

Our contribution follows. First, we compare PCA¹⁶ to disc game embedding, and show that disc game embeddings inherit the key algebraic properties responsible for the success of PCA. Based on this analogy, we propose PTA as a general technique for visualizing data arising from competitive tasks or pairwise choice tasks. Indeed, while we focus on games for their charisma, any data set representing a skew-symmetric comparison of objects is amenable to PTA. Via a series of examples, we show that PTA provides a much richer framework for analysing trade-offs in games than previously demonstrated. Our examples exhibit a wide variety of strategic trade-offs that can be clearly visualized with PTA. Unlike existing work, we focus on the relation between embedding coordinates, which represent performance relations, and underlying agent attributes, to elucidate the principal trade-offs responsible for cyclic competition. Moreover, we consider the full information content of PTA by analysing multiple leading disc games and the decay in their importance. Important strategic trade-offs can arise in later disc games, so previous empirical work’s focus on the leading disc game is myopic. These examples also raise conceptual limitations not addressed in previous work, thus outline future directions for development.

Related work

Our work builds directly on Balduzzi et al.,¹⁵ which used the embedding approach to introduce a comprehensive agent evaluation scheme. Their scheme uses the real Schur form (PTA) in conjunction with the Hodge decomposition to overcome deficiencies in standard ranking models.¹⁷ Our work also complements efforts to explore cyclic structures in competitive systems,^18,19 economics,^20,21 and tangentially as multi-class classification problems.^22,23 Cycles challenge traditional gradient methods and can slow training.^6,24 Accordingly, highly cyclic games, such as iterated rock-paper-scissors, are useful as benchmark tasks for multi-agent reinforcement learning.²⁵ Moreover, cyclic structures in games are often intricate and difficult to disentangle, particularly among intermediate competitors. Games of skill frequently exhibit this ‘spinning top geometry’.²⁶ By summarizing cyclic structures, PTA helps identify areas of the strategy space that cause difficulty during training, or should be targeted for diverse team design.^27,28 Here, we show that PTA can identify fundamental trade-offs that summarize otherwise opaque cyclic structure. Trade-offs play an important role in decision tasks and evolutionary processes outside of games, so general tools that isolate and reify trade-offs are of generic utility.^6,14 In that sense, our attempt to visualize game structure is in line with generic data visualization efforts, which aim to convert complicated data into elucidating graphics (cf. Garnelo et al.²⁸ and Healy²⁹).

Background

Functional form games

A two-player zero-sum functional form game is defined by an attribute space $Ω \subseteq \mathbb{R}^{T}$ and an evaluation function $f$ that returns the advantage of one agent over another given their attributes. Agents in the game can be represented by their attribute vectors $x, y \in Ω$ , whose entries could represent agent traits, strategic policies, weights in a neural net governing their actions, or more generally, any attributes that influence competitive behaviour. The function $f$ is of the form $f : Ω \times Ω \to \mathbb{R}$ . The value $f (x, y)$ quantifies the advantage of agent $x$ over $y$ with a real number. We assume that advantage is zero-sum. Consequently, $f$ must be skew-symmetric, $f (x, y) = - f (y, x)$ .¹⁹ If $f (x, y) > 0$ we say that $x$ beats $y$ , and the outcome is a tie if $f (x, y) = 0$ . The larger $| f (x, y) |$ , the larger the advantage one competitor possesses over another. We do not specify how advantage is measured, since the appropriate definition may depend on the setting. Possible examples include expected return in a zero-sum game, probability of win minus a half, or log odds of victory. With a set of $N$ agents $X$ , pairwise comparison of all agents gives an $N \times N$ evaluation matrix $F$ where $F_{ij} = f (x_{i}, x_{j})$ . Any such matrix can be separated into transitive and cyclic components, $F_{t}$ and $F_{c}$ , via the Helmholtz-Hodge decomposition (HHD).^15,19,30 The HHD writes $F = F_{t} + F_{c}$ where $[F_{t}]_{ij} = r_{i} - r_{j}$ , and where $r$ are least squares ratings that evaluate the average performance of each agent. These matrices can be analysed to study the structure of the game among those competitors, that is, the resulting tournament.
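The split $F = F_{t} + F_{c}$ can be illustrated with a minimal sketch. It assumes a complete tournament (every pair of agents is compared), in which case the least squares ratings reduce to the row means of $F$ ; the function name `hodge_split` and the example matrix are ours and purely illustrative.

```python
import numpy as np

def hodge_split(F):
    """Split a skew-symmetric evaluation matrix into transitive and cyclic parts.

    Assumes a complete tournament, where the least squares ratings with
    zero mean are simply the row means of F.
    """
    F = np.asarray(F, dtype=float)
    r = F.mean(axis=1)               # ratings: average advantage of each agent
    F_t = r[:, None] - r[None, :]    # transitive part, [F_t]_ij = r_i - r_j
    F_c = F - F_t                    # cyclic remainder
    return r, F_t, F_c

# Hypothetical 3-agent tournament: agent 0 is strongest overall,
# but a rock-paper-scissors style cycle is mixed in.
F = np.array([[ 0.0,  2.0, 1.0],
              [-2.0,  0.0, 1.0],
              [-1.0, -1.0, 0.0]])
r, F_t, F_c = hodge_split(F)
print("ratings:", r)
print("cyclic component:\n", F_c)
```

In this example the cyclic component is a scaled rock-paper-scissors matrix: every row of `F_c` sums to zero, so no agent is favoured once the ratings are removed.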

Disc games

The cyclic component of a tournament can be visualized using a combination of simple cyclic games.^15,27 The simplest cyclic functional form game is a disc game, which acts as a continuous analogue to rock-paper-scissors (RPS) in two-dimensional attribute spaces. The disc game evaluation function is the cross product between competitor’s embedded attributes,

$$\operatorname{disc}(x, y) = x_{1} y_{2} - x_{2} y_{1} = x^{T} \begin{bmatrix} 0 & 1 \\ -1 & 0 \end{bmatrix} y = x^{T} R y \tag{1}$$

where $R$ is the 2 × 2 90° rotation matrix.¹⁵ The cross product models a basic trade-off between the two attributes.
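As a small illustration of equation (1), the sketch below evaluates the disc game in its matrix form and embeds three hypothetical agents at angles $120^{°}$ apart on the unit circle, which yields a rock-paper-scissors style cycle. The chosen angles and the helper name `disc` are illustrative assumptions.

```python
import numpy as np

# 2 x 2 rotation matrix R from equation (1)
R = np.array([[0.0, 1.0],
              [-1.0, 0.0]])

def disc(x, y):
    """Disc game evaluation x^T R y = x1*y2 - x2*y1."""
    return float(np.asarray(x, dtype=float) @ R @ np.asarray(y, dtype=float))

# Three hypothetical agents spaced 120 degrees apart on the unit circle.
angles = np.deg2rad([90.0, 210.0, 330.0])
points = np.stack([np.cos(angles), np.sin(angles)], axis=1)

# Pairwise evaluation matrix of the embedded agents.
F = points @ R @ points.T
print(np.round(F, 3))
```

Agent 0 beats agent 1, agent 1 beats agent 2, and agent 2 beats agent 0, each by the same margin $\sin (120^{°})$ .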

Any evaluation matrix can be represented with a sum of pointwise embeddings into a sequence of disc games. The necessary construction follows.

Principal trade-off analysis (PTA)

PTA decomposes an arbitrary performance matrix $F$ into a sum of simpler performance matrices by embedding each agent into a series of disc games that model important strategic trade-offs.

Any real, $m \times m$ , skew-symmetric matrix $A$ admits a Schur decomposition (real Schur form), $A = QU Q^{T}$ . Here $Q$ is an $m \times \operatorname{rank}(A)$ matrix with orthonormal columns, and $U$ is block diagonal with $\operatorname{rank}(A) / 2$ blocks of size $2 \times 2$ , each of the form $U^{(k)} = ω_{k} R$ , where $ω_{k} \geq 0$ is a nonnegative real number. Each pair of consecutive columns of $Q$ , $[q_{2 k - 1}, q_{2 k}]$ , corresponds to the real and imaginary parts of an eigenvector of $A$ scaled by $\sqrt{2}$ . The scalars $ω_{k}$ are the nonnegative imaginary parts of the corresponding eigenvalues, listed in decreasing order.^31,32 A linear algebra exercise demonstrates that the columns of $Q$ are also proportional to the singular vectors of $A$ , and the sequence of scalars $ω_{k}$ matches the singular values of $A$ . A truncated expansion of $A$ using only the first $r$ blocks is equivalent to truncating the singular value decomposition at $2 r$ singular vectors, so, by the Eckart-Young-Mirsky theorem, equals the optimal rank $2 r$ approximation to $A$ under the Frobenius norm.^33–35

When $A$ is replaced with the performance matrix $F$ , each block in the Schur decomposition acts as a scaled version of a disc game where each competitor is assigned embedding coordinates via $Q$ . The performance matrix $F$ is skew-symmetric, so admits a Schur decomposition:

$$F = Q U Q^{T}. \tag{2}$$

As in PCA, we consider a low-rank approximation of $F$ associated with expansion onto the first $r$ disc games, where $r$ is chosen large enough to satisfy a desired reconstruction accuracy. Low-rank approximation allows model reduction and mixed equilibria approximation in quasi-polynomial time.³⁶ The closest rank $2 r$ approximation to $F$ in Frobenius norm is given by replacing $Q$ with $Q^{(1 : 2 r)}$ , and $U$ with $U^{(1 : 2 r)}$ in equation (2), where $Q^{(1 : 2 r)}$ is the first $2 r$ columns of $Q$ , and $U^{(1 : 2 r)}$ is the upper $2 r$ by $2 r$ minor of $U$ . As in other latent variable models, the feasibility of low-rank approximation can be justified when $f$ is sufficiently smooth via the Johnson-Lindenstrauss lemma.³⁷ The matrix $Q^{(1 : 2 r)}$ provides a set of basis vectors. Projection onto those basis vectors defines a new set of coordinates, thereby embedding the competitors. Specifically, let:

$$\hat{Y} = {Q^{(1 : 2 r)}}^{T} F = U^{(1 : 2 r)} {Q^{(1 : 2 r)}}^{T}, \tag{3}$$

and scale each pair of embedding coordinates by the associated eigenvalue so ${\vec{y}}_{k} (i) = [y_{i, 2 k - 1}, y_{i, 2 k}] = ω_{k}^{- 1 / 2} [{\hat{y}}_{i, 2 k - 1}, {\hat{y}}_{i, 2 k}] = ω_{k}^{1 / 2} [q_{i, 2 k - 1}, q_{i, 2 k}]$ . Then ${\vec{y}}_{k} (i)$ maps from competitor indices, $i$ , to points in $\mathbb{R}^{2}$ , and the set $Y = \{{\vec{y}}_{k}\}$ is a collection of planar embeddings, where ${\vec{y}}_{k}$ is given by projection onto a feature plane spanned by $q_{2 k - 1}$ and $q_{2 k}$ .

A user interested in the transitive and cyclic components of $F$ separately, could begin by breaking $F$ into $F_{t}$ and $F_{c}$ .¹⁹ The transitive component can be represented on a line via the ratings, so does not require additional visualization.¹⁵ The cyclic component $F_{c}$ is skew-symmetric, so can be represented via PTA. Then performance is represented by a combination of two components. The first compares the overall quality of the agents, as quantified by a set of ratings. The second represents all remaining cyclic relations as a combination of principal trade-offs. Since PTA depends only on $F$ , the cost of performing PTA is independent of the complexity of the underlying game or agents. Once $F$ is formed, PTA proceeds at the same cost as standard low-rank matrix decompositions like PCA which seek the leading singular vectors of a symmetric matrix ( $O (n^{3})$ for $n$ sampled agents³⁸). While iterative algorithms may be more efficient when only a few leading columns of $Q$ are required, the computational cost of performing PTA in an empirical setting will almost always be swamped by the cost of gathering the data for forming $F$ , which requires training and comparing $O (n^{2})$ pairs of agents. The simulation cost could be reduced if low-rank completion methods were applied to fill in missing data.^39–41 We leave in-depth sampling considerations to future work and report results using Chen et al.⁴² for matrix completion.

Note that the Schur decomposition is only unique up to rotation within each feature plane, since complex conjugate pairs of eigenvectors of $F$ are only uniquely defined up to their complex phase. Thus, two embeddings are equivalent if they agree up to rotation within each planar embedding.

The rank $2 r$ approximation of the evaluation between agent $i$ and agent $j$ , $F_{ij}^{(2 r)}$ , equals a sum over the embeddings ${\vec{y}}_{k}$ of the cross product ${\vec{y}}_{k} (i) \times {\vec{y}}_{k} (j)$ (see Appendix). That is:

$$F_{ij}^{(2 r)} = \sum_{k = 1}^{r} {\vec{y}}_{k} (i) \times {\vec{y}}_{k} (j) = \sum_{k = 1}^{r} \operatorname{disc}({\vec{y}}_{k} (i), {\vec{y}}_{k} (j)). \tag{4}$$

Thus, restricted to each embedding, $F^{(2 r)}$ is a disc game, and the optimal rank $2 r$ approximation of $F$ , $F^{(2 r)}$ , is a linear combination of disc games applied to the sequence of planar embeddings $\{{\vec{y}}_{k}\}_{k = 1}^{r}$ .
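The construction can be sketched numerically as follows. This is a minimal illustration rather than the implementation used for the experiments: instead of calling a Schur routine, it uses the fact that $iF$ is Hermitian when $F$ is skew-symmetric, and builds each pair of columns of $Q$ from the imaginary and real parts of an eigenvector. The function names `pta` and `reconstruct`, the tolerance, and the random test matrix are our own assumptions.

```python
import numpy as np

def pta(F, tol=1e-10):
    """Return (omega, Y) for a skew-symmetric F.

    omega[k] is the strength of the k-th disc game (descending order) and
    Y[k, i, :] is the planar embedding of agent i in the k-th disc game.
    """
    F = np.asarray(F, dtype=float)
    # 1j * F is Hermitian, so eigh returns real eigenvalues in +/- pairs.
    lam, V = np.linalg.eigh(1j * F)
    order = np.argsort(-lam)
    order = order[lam[order] > tol * max(1.0, np.abs(lam).max())]
    omega, V = lam[order], V[:, order]
    # q_{2k-1} = sqrt(2) Im(v_k), q_{2k} = sqrt(2) Re(v_k)
    Q = np.sqrt(2.0) * np.stack([V.imag, V.real], axis=-1)   # (n, r, 2)
    Y = np.sqrt(omega)[None, :, None] * Q                    # scale by omega^(1/2)
    return omega, np.transpose(Y, (1, 0, 2))

def reconstruct(Y, r=None):
    """Sum of disc games over the first r embeddings, as in equation (4)."""
    Yr = Y[:r]
    a, b = Yr[:, :, 0], Yr[:, :, 1]
    return np.einsum('ki,kj->ij', a, b) - np.einsum('ki,kj->ij', b, a)

# Random skew-symmetric test matrix (illustrative data only).
rng = np.random.default_rng(0)
A = rng.normal(size=(7, 7))
F = A - A.T
omega, Y = pta(F)
print("disc game strengths:", np.round(omega, 3))
print("full reconstruction error:", np.linalg.norm(F - reconstruct(Y)))
```

Keeping only the first $r$ embeddings in `reconstruct` gives the truncated approximation $F^{(2 r)}$ ; the strengths returned in `omega` coincide with the singular values of $F$ , each appearing twice.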

This decomposition is useful for two reasons. First, it depends on a spectral decomposition of $F$ , so inherits the key algebraic properties that account for the success of PCA. An equivalent construction is introduced in Chen and Joachims⁴¹ where it is called the ‘blade-chest-inner’ model. The construction in Chen and Joachims⁴¹ is not based on a spectral decomposition, so lacks orthogonality or low-rank optimality. In PTA, the embeddings are projections onto orthogonal planes, so each embedding encodes independent information about cyclic competition. Specifically, the embedding coordinates are uncorrelated when an agent is sampled randomly from the population. The planes act like feature vectors as they are typically associated with some strategic trade-off (see Section 5). Therefore, as PCA identifies principal components, PTA identifies principal trade-offs: orthogonal planes associated with a sequence of fundamental cyclic modes. The two decompositions differ since PCA uses the singular value decomposition, while PTA uses the real Schur form. Nevertheless, the sequence of embeddings forms optimal low-rank approximations to $F$ , where the importance of each embedding is quantified by the magnitude of the associated eigenvalue. Thus, the sequence of eigenvalues determines the number of disc game embeddings, $r$ , required to achieve a sufficiently accurate approximation of $F$ . The number of disc games is half the numerical rank of $F$ and is a natural measure of the complexity of cyclic competition. The complexity is distinct from the overall intensity of cyclic competition or the game balance, which depend on $∥ F_{c} ∥$ and $∥ F ∥$ instead of the rank.¹⁹ Instead, it counts the number of distinct cyclic modes in the evaluation matrix. PTA is most effective when $r$ is small. The smaller $r$ , the simpler the associated representation. In our experiments, the numerical rank is typically small and grows slowly in the size of the underlying game or strategy space.

Second, disc games encode performance in geometry. Given coordinates ${\vec{y}}_{k} (i)$ and ${\vec{y}}_{k} (j)$ , the advantage competitor $i$ possesses over $j$ in the $k^{th}$ disc game is given by ${\vec{y}}_{k} (i) \times {\vec{y}}_{k} (j)$ . In polar coordinates, each agent is assigned a radius and angle per disc game, $(r_{k} (i), θ_{k} (i))$ . Then, $\operatorname{disc}({\vec{y}}_{k} (i), {\vec{y}}_{k} (j)) = r_{k} (i) r_{k} (j) \sin (θ_{k} (j) - θ_{k} (i))$ . The larger $r_{k} (i)$ , the more the $k^{th}$ cyclic mode influences the performance of agent $i$ . For a fixed radius, one competitor gains the most advantage when it is embedded $90^{°}$ clockwise from its opponent and possesses an advantage as long as it is embedded clockwise of its opponent. Thus, advantage flows clockwise about the origin. We visualize this flow with a circulating vector field, $\vec{v} (\vec{y}) = [y_{2}, - y_{1}]$ . This intuitive geometry allows disc games to encode a variety of cyclic structures visually.
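The polar form and the clockwise flow of advantage can be checked with a short numerical sketch. The sampled radii and angles, and the one-degree scan of offsets, are arbitrary illustrative choices.

```python
import numpy as np

def disc(x, y):
    """Cross product x1*y2 - x2*y1 of two planar embeddings."""
    return x[0] * y[1] - x[1] * y[0]

def polar(radius, theta):
    """Planar point with the given radius and counter-clockwise angle."""
    return np.array([radius * np.cos(theta), radius * np.sin(theta)])

# Compare the cross product against r_i * r_j * sin(theta_j - theta_i).
rng = np.random.default_rng(1)
r_i, r_j = rng.uniform(0.5, 2.0, size=2)
t_i, t_j = rng.uniform(0.0, 2.0 * np.pi, size=2)
lhs = disc(polar(r_i, t_i), polar(r_j, t_j))
rhs = r_i * r_j * np.sin(t_j - t_i)

# Advantage over an opponent at angle 0, as a function of our own angle.
offsets_deg = np.arange(-180.0, 180.0, 1.0)
advantage = np.array([disc(polar(1.0, np.deg2rad(t)), polar(1.0, 0.0))
                      for t in offsets_deg])
best_offset = offsets_deg[np.argmax(advantage)]
print("polar identity holds:", np.isclose(lhs, rhs))
print("best offset (degrees, counter-clockwise):", best_offset)
```

The best offset is $- 90^{°}$ in the counter-clockwise convention, that is, $90^{°}$ clockwise of the opponent, and every clockwise offset between $0^{°}$ and $180^{°}$ gives a positive advantage.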

Experiments

Here, we illustrate the graphical power of PTA via Kuhn poker, RPS + 2, Blotto, and Pokemon. All four exhibit interesting cyclic structures that are explained well by PTA. Kuhn is chosen to clearly illustrate the interpretation of each disc game as a trade-off, RPS + 2 to show that the disc games can encode more complex trade-offs through the embedded geometry, Blotto to illustrate potential limitations of PTA arising from symmetries as well as the elegance of the trade-off representation when decomposing interlocked cyclic structures, and Pokemon to illustrate the diversity of possible trade-offs that can be recovered via PTA. In each case, we emphasize the interpretation of the principal trade-off to show that PTA reveals diverse, fine-grained game structures based only on empirical game data.

Kuhn Poker

Kuhn poker is a simplified, two-player poker game played with an $n$ card deck (traditionally $n = 3$ ), single card hands, and a single betting round.⁴³ It is also a zero-sum, incomplete information game. Strategic trade-offs are produced by the lack of complete information. Each player’s choices confer information about their hand to their opponent; however, that information may or may not be trustworthy. For example, a bet may or may not indicate a strong hand.

The game tree for Kuhn poker is illustrated in Figure 1. Given a particular pair of hands, the game tree includes four decision points. After a call, both hands are revealed, and the player with the higher card wins. Using an $n$ card deck, the extensive form game has $9 n (n - 1)$ states and $2 n$ information sets for each player. All choices are binary, and a particular agent could play as P1 or P2, so a complete policy consists of $4 n$ decision probabilities.
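The growth of the game with deck size follows directly from these counts. A minimal sketch (the function name `kuhn_sizes` is ours) evaluates them for a few deck sizes:

```python
def kuhn_sizes(n):
    """Sizes of n-card Kuhn poker, using the counts stated in the text."""
    states = 9 * n * (n - 1)        # states in the extensive form game
    info_sets_per_player = 2 * n    # information sets for each player role
    policy_dim = 4 * n              # decision probabilities when playing as P1 or P2
    return states, info_sets_per_player, policy_dim

for n in (3, 4, 5, 6, 7, 13):
    print(n, kuhn_sizes(n))
```

The number of states grows quadratically in $n$ , while the dimension of the policy space grows only linearly.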

Figure 1.

Left: The full 3 card Kuhn game tree. The height of each coloured bar in a card indicates the probability a player with that card takes the more aggressive action. The frequencies shown are a Nash equilibrium for the Bet/Call actions. For example, at the first node, a player holding a queen should always check. Black circles are leaf nodes. Right: The tradeoffs in each disc game come from a subset of the game tree. Arrows point in the direction of advantage. Strategies that lead to tradeoffs are given labels below each reduced tree.

Given two agents, we measure the advantage one possesses over the other as their expected winnings if they are equally likely to bet first or second. Since no information set ever appears twice along the same path through the game tree, and no player ever makes more than two decisions per game, the payout function is a cubic, multilinear, skew-symmetric polynomial in the policies.

The Nash Equilibrium policies (NE) for the three-card game are shown in Figure 1. Note that, at the NE, some decision probabilities are set to zero or one. These correspond to decisions for which only one action is rational. For example, it is never rational to fold when holding the strongest card or to call when holding the weakest. These decision probabilities lie at the boundary of the simplex of allowed policies and are subject to strong selection. We call the boundary containing the NE the rational boundary.

Restricted to the rational boundary, the performance function $f$ is a bilinear function in the unconstrained decision probabilities. The game does not admit a unique NE. Instead, it admits a family of NE parameterized by the probability, $α$ , that the first player bets when holding a Jack (the weakest card).⁴³ Restricted to the boundary, the game is highly cyclic, since the unconstrained probabilities induce trade-offs associated with the information conveyed, and inferred, when betting or checking. These dynamics are entirely neutral when averaged over the rational boundary, thus restricted to the boundary, the game is favourite-free.¹⁹ That is, all policies on the rational boundary have zero expected payout when opponents are drawn from an isovariant distribution on the unconstrained policies.

Nevertheless, policies lying off the NE on the rational boundary may be exploited by other strategies on the rational boundary. For example, when trained under optimal self-play, policies on the rational boundary circulate about the NE. These dynamics are directed by the self-play gradient vector field, $v (z) = \nabla_{x} f (x, y) |_{x = y = z}$ . Note that a policy $z_{*}$ is only a NE if $v (z_{*}) = 0$ . The circulation in $v (z)$ is driven by the aforementioned decision trade-offs. A dishonest player who tends to bluff (bet on a weak card) can be countered by a distrusting opponent who assumes a bluff. Assuming a bluff has zero net payout since it is countered by honest opponents but counters dishonest opponents.
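To make the self-play gradient field concrete, the sketch below evaluates $v (z) = \nabla_{x} f (x, y) |_{x = y = z}$ by central finite differences. As a stand-in for the Kuhn poker payout, which we do not reproduce here, it assumes the simplest bilinear cyclic game, the disc game $f (x, y) = x^{T} R y$ , whose unique NE sits at the origin. The helper name `self_play_gradient` and the step size are illustrative.

```python
import numpy as np

R = np.array([[0.0, 1.0],
              [-1.0, 0.0]])

def f(x, y):
    """Stand-in payout: the disc game x^T R y (an assumption, not Kuhn poker)."""
    return x @ R @ y

def self_play_gradient(f, z, h=1e-6):
    """Central-difference estimate of grad_x f(x, y) evaluated at x = y = z."""
    z = np.asarray(z, dtype=float)
    grad = np.zeros_like(z)
    for d in range(z.size):
        step = np.zeros_like(z)
        step[d] = h
        grad[d] = (f(z + step, z) - f(z - step, z)) / (2.0 * h)
    return grad

z = np.array([0.3, -0.7])
v = self_play_gradient(f, z)
print("v(z) =", v, " R z =", R @ z)
print("v(0) =", self_play_gradient(f, np.zeros(2)))
```

In this stand-in game $v (z) = R z$ is everywhere perpendicular to $z$ , so self-play circulates about the NE rather than converging to it, and the field vanishes only at the NE itself.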

These considerations recommend Kuhn poker for PTA. We aim to recover, and rank by importance, the trade-offs that arise on the rational boundary using PTA. Unlike extant work, we target visualization and strategic feature extraction.

To generate agents we used the Neural fictitious self-play (NFSP) algorithm² implemented using the OpenSpiel software package.⁴⁴ We used a simple feed-forward neural network with a single hidden layer of size 64 for both the policy and critic networks. We ran the algorithm until convergence, minimizing exploitability – the best your opponent could do if they knew your policy. Near the NE, the exploitability will tend towards 0. To extract a policy representation we compute the decision probabilities from the converged policy network.

Given the converged agent, we identify particular policies that lie on or very near to the boundary of the allowed policy space. If the corresponding component of the self-play gradient points outward against the boundary, then we treat that policy as constrained to the boundary. Then, to generate a relevant population, we sample an ensemble of random agents near the converged policy, constrained to the rational boundary. Next, we evaluate performance between all pairs of agents in the population and apply PTA to embed them.

To extract the underlying trade-offs, we seek an interpretable mapping between policies and embedding coordinates. Since 3 card Kuhn poker is bilinear once constrained to the rational boundary, the map from policy to embedding coordinates is linear. For larger deck sizes, the performance function can be approximated by a quadratic polynomial provided the sampled agents are sufficiently close to the converged agent. Then, as in the 3 card case, the realized embeddings are near to linear functions of the agent policies.⁴⁵

We recover that map by fitting the embedded agent coordinates to a linear function of their policies. The linear map into each disc game is a scaled projection onto a hyperplane in policy space. We identify the trade-offs by isolating the sparsest possible basis vectors for each hyperplane.
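The fitting step can be sketched with ordinary least squares. The example below is a synthetic stand-in, not the Kuhn poker experiment: it assumes a purely bilinear game $f (x, y) = x^{T} A y$ with a random skew-symmetric $A$ , random policies, and a compact PTA helper (`pta_coords`, our own name). It omits the final step of isolating sparse basis vectors.

```python
import numpy as np

def pta_coords(F, tol=1e-10):
    """Stack all planar embeddings into an (n_agents, 2 * n_disc_games) array."""
    lam, V = np.linalg.eigh(1j * F)              # 1j * F is Hermitian for skew F
    cutoff = tol * max(1.0, lam.max())
    keep = [k for k in np.argsort(-lam) if lam[k] > cutoff]
    cols = []
    for k in keep:
        scale = np.sqrt(2.0 * lam[k])            # sqrt(2) normalisation times omega^(1/2)
        cols += [scale * V[:, k].imag, scale * V[:, k].real]
    return np.column_stack(cols)

# Synthetic bilinear game (all data here is hypothetical).
rng = np.random.default_rng(0)
n_agents, n_params = 30, 4
P = rng.uniform(size=(n_agents, n_params))       # one policy vector per agent
B = rng.normal(size=(n_params, n_params))
A = B - B.T                                      # skew-symmetric payoff kernel
F = P @ A @ P.T                                  # evaluation matrix

Y = pta_coords(F)                                # embedding coordinates
W, *_ = np.linalg.lstsq(P, Y, rcond=None)        # linear map: policy -> embedding
residual = np.linalg.norm(P @ W - Y) / np.linalg.norm(Y)
print("embedding shape:", Y.shape)
print("relative residual of the linear fit:", residual)
```

Each column of `W` gives the policy-space direction that drives one embedding coordinate. For a bilinear game the fit is exact up to round-off; for games that are only approximately bilinear near the sampled population, the residual indicates how well the linear map explains the embedding.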

We repeated this numerical experiment for decks of sizes 3–13.

Even though the game tree grows quadratically in $n$ , and the dimension of the policy space grows linearly in $n$ , the number of relevant disc games (numerical rank of the performance matrix) remained close to constant across all deck sizes tested. To measure the numerical rank we computed the relative error in the low-rank approximation to $F$ for varying $r$ . In all cases, the first two disc games accounted for at least 71% of the structure of $F$ , and the first four accounted for at least 89% of the structure of $F$ . The contributions of each disc game are reported in Table 1. Note that each disc game retains roughly the same importance, no matter the deck size. This observation suggests that the disc games recover strategic trade-offs that generalize over different deck sizes.

Table 1.

Disc Game Importance in Kuhn Poker

Deck size	D.G. 1 (%)	D.G. 2 (%)	D.G. 3 (%)	D.G. 4 (%)	Total (%)	Error (%)	(Num. rank)/2	Game states	Info states
3	49	30	12	8	99	1	3	54	12
4	50	28	12	6	96	4	3	108	16
5	50	23	13	8	94	6	4	180	20
6	48	26	11	7	93	7	4	270	24
7	49	25	11	7	92	8	4	378	28
13	46	25	10	8	89	11	5	1404	52

Importance of each disc game, measured as $ω_{k}^{2} / (\sum_{k} ω_{k}^{2})$ , for the first four disc games given populations concentrated near the NE in Kuhn poker with varying deck sizes. The first column shows the deck size. The next six show the importance of the first four disc games, their total, and the associated reconstruction error in the rank $8$ approximation to $F$ . The final three compare the number of disc games required to satisfy a 90% accuracy threshold with the number of states in the game tree, and the dimension of the policy space.
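The importance measure and the accuracy threshold can be computed directly from the sequence $ω_{k}$ . The sketch below uses made-up values of $ω_{k}$ , since the spectra behind Table 1 are not listed; the function names are ours.

```python
import numpy as np

def importance(omega):
    """Importance of each disc game: omega_k^2 / sum_k omega_k^2."""
    w2 = np.asarray(omega, dtype=float) ** 2
    return w2 / w2.sum()

def games_needed(omega, threshold=0.90):
    """Smallest r whose leading r disc games reach the importance threshold."""
    cumulative = np.cumsum(importance(omega))
    # Small slack guards against round-off when the threshold is met exactly.
    return int(np.searchsorted(cumulative, threshold - 1e-12) + 1)

omega = [4.0, 3.0, 2.0, 1.0]          # hypothetical spectrum, descending
print("importance:", np.round(importance(omega), 3))
print("disc games needed for 90%:", games_needed(omega))
```

Under this measure, one minus the cumulative importance of the first $r$ disc games is the relative squared Frobenius error of the rank $2 r$ approximation to $F$ .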

Figure 2 illustrates the trade-offs produced by the first two disc games when $n = 3$ . These represent two different four-way rock-paper-scissors trade-offs. Figure 2 shows extremal policies that characterize the trade-offs. Note that these trade-offs arise directly from the incomplete information nature of the game. In each case, a player possesses an advantage over their opponent if they can correctly infer the opponent’s hand based on their action, or can trick their opponent. The eight possible combinations of player role, honesty in action, and assumed honesty of opponent produce eight stereotypical policies, arranged in two disc games. The two disc games account for 49% and 30% of the observed performance relations. The first trade-off discovered in the 3-card game generalizes to larger card games (see the rightmost panels of Figure 2). Thus PTA successfully extracts interpretable, generalizable strategic trade-offs from sample tournament data.

Figure 2.

Left: The first two disc games with 4 marked axes representing the associated strategic trade-off about the NE. Middle: The weighting of the linear fit from policy to embedding is shown for each axis of the disc game. The $x$ -axis labels shown at the bottom indicate the order of the weights w.r.t the policy. Right: An analysis of $N$ -Deck Kuhn is visualized. The first table shows the Nash frequencies for various deck sizes. The next two tables show trade-off directions, as in the middle column for deck sizes ranging from 3 to 13. The colour of each cell represents a decision probability (NE table), or adjustment to a NE probability. The consistency of the colour patterns across deck size shows that the trade-off directions generalize over deck size.

In practical settings, many of the entries of $F$ are unknown and must be imputed. Following Bertrand et al.,¹⁷ we complete $F$ via low-rank matrix completion.⁴¹ The completed embeddings produced by PTA only require a small fraction of the entries of $F$ when the agents are sampled uniformly. To test the stability of the completed embeddings we ran a drop-out experiment, dropping 70%–99% of all entries randomly before completing the matrices. For deck sizes ranging from 3 to 13 cards, only 30% of the entries are needed to recover the first three embeddings to ≤5% relative error. The stability of the embeddings depends on the separation between their associated eigenvalue, and the next closest eigenvalue. As a result, the most important embedding is very stable, and could be recovered to ≤3% error when 98% of the entries of $F$ are dropped. For deck sizes less than 10, the second embedding could be recovered to ≤5% error when 90% of the entries of $F$ are dropped.

Rock paper scissors + 2

Next, consider an extended example of the popular rock-paper-scissors (RPS) game, chosen to show that the geometry of the point cloud formed by the embedded agents can embody the structure of a game. Unlike the Kuhn example, we do not focus on a direct analysis of the mapping from strategy space to disc game space. Instead we aim to show that the point clouds’ shape naturally represents the game. We consider rock-paper-scissors-lizard-spock (RPS + 2). The utility matrix for RPS + 2 is the circulant matrix generated by the row $[0, - 1, 1, - 1, 1]$ , with strategies ordered rock, paper, scissors, lizard, spock (reading ≻ as ‘loses to’: rock ≻ paper and lizard, scissors and spock ≻ rock).

First, we generate a population of agents using fictitious self play (FSP). Each agent in the population is defined by a vector of length 5 representing a mixed strategy. We start with an initial random agent and generate best response agents using FSP. Each best response agent is added to the population. We then create $F$ by computing the expected value of each match-up using the utility matrix.

What embedding should we expect? All mixed strategies are an interpolation of the five pure strategies, and the game is bilinear in the action probabilities, so the embedding map must be approximately linear. Thus, each mixed strategy should be contained inside the convex hull of a polygon formed by embedding the pure strategies. The utility matrix is invariant under cyclic permutations, so the polygon must be regular and centred at the origin. Therefore, the pure strategies must be the vertices of a regular, centred, pentagon.

The full game includes two, equally important, interlocking cyclic relations among the pure strategies. These are, rock ≻ paper ≻ scissors ≻ lizard ≻ spock ≻ rock, and rock ≻ lizard ≻ paper ≻ spock ≻ scissors ≻ rock, where ≻ denotes loses to. Both cycles contribute equally to the utility matrix as they are equivalent under a relabelling. Namely, the utility matrix is unchanged under a permutation that rearranges the strategy labels so that rock is followed by lizard, then paper, then spock, then scissors.

These two cycles are represented by a pair of disc games, as shown in Figure 3. In both, the pure strategies are vertices of a regular pentagon centred at the origin. The ‘Inferred Graph’ in the rightmost column plots a representative point for each of the pure strategies. Advantage relations that are not accounted for in the first disc game are represented in the second. In the first disc game, the sequence of pure strategies, Rock, Paper, Scissors, Lizard, Spock, are spaced by $144^{°}$ about the pentagon, so trace out a star. In the second, the sequential pure strategies are spaced $72^{°}$ apart, so occupy adjacent corners of the pentagon. The sequence of vertices produced by skipping $144^{°}$ per step, or moving $72^{°}$ per step, are related by the same permutation that exchanged the two interlocked advantage cycles defined by the utility matrix. The star-shaped disc game has a larger eigenvalue. Since $| \sin (144^{°}) | < | \sin (72^{°}) |$ , vertices of the star must be moved farther from the origin to maintain the same cross product as adjacent vertices of the pentagon in the second disc game.
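The pentagon geometry can be reproduced for the five pure strategies alone. The sketch below builds the circulant utility matrix, extracts its two disc games via the Hermitian matrix $iF$ (our choice of routine, not necessarily the one used for Figure 3), and measures the angle between consecutive pure strategies. It embeds pure strategies only, not an FSP population.

```python
import numpy as np

# Circulant utility matrix; order: rock, paper, scissors, lizard, spock.
row = np.array([0.0, -1.0, 1.0, -1.0, 1.0])
C = np.array([np.roll(row, i) for i in range(5)])

# The two positive eigenvalues of 1j * C give the disc game strengths.
lam, V = np.linalg.eigh(1j * C)
order = np.argsort(-lam)[:2]
omega = lam[order]

# Y[i, k, :] is the embedding of pure strategy i in disc game k.
Y = np.sqrt(2.0 * omega)[None, :, None] * np.stack(
    [V[:, order].imag, V[:, order].real], axis=-1)

def step_cosines(points):
    """Cosine of the angle between each vertex and the next strategy."""
    nxt = np.roll(points, -1, axis=0)
    dots = np.sum(points * nxt, axis=1)
    norms = np.linalg.norm(points, axis=1) * np.linalg.norm(nxt, axis=1)
    return dots / norms

steps = [np.rad2deg(np.arccos(step_cosines(Y[:, k, :]))) for k in range(2)]
C_hat = sum(np.outer(Y[:, k, 0], Y[:, k, 1]) - np.outer(Y[:, k, 1], Y[:, k, 0])
            for k in range(2))
print("strengths:", np.round(omega, 4))
print("angular steps, disc game 1:", np.round(steps[0], 1))
print("angular steps, disc game 2:", np.round(steps[1], 1))
```

Two disc games reproduce the utility matrix exactly, since the circulant has rank four. The first, star-shaped game steps by $144^{°}$ and carries the larger strength; the second steps by $72^{°}$ .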

Figure 3.

The two disc games that represent RPS + 2 are shown. Starting from the first column the points are coloured by the label for that column. The graph on the left shows a reduced representation of the game by using representative points for each strategy (red for rock, blue for paper, green for scissors, yellow for lizard, orange for spock).
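The pentagon geometry described for RPS + 2 can be checked numerically on the pure strategies alone. The following is a minimal sketch, not the authors' implementation. It assumes payoffs of ±1, orders the strategies rock, paper, scissors, lizard, spock, and obtains the planes from the Hermitian eigendecomposition of $iF$ rather than from a real Schur form. The names `disc_games` and `rebuild` are hypothetical.

```python
import numpy as np

def disc_games(F, tol=1e-9):
    """Split a real skew-symmetric F into planar embeddings (illustrative sketch)."""
    F = np.asarray(F, dtype=float)
    # i*F is Hermitian, so its eigenvalues are real and come in +/- omega pairs.
    lam, vecs = np.linalg.eigh(1j * F)
    keep = lam > tol                      # one eigenvector per +/- pair
    order = np.argsort(-lam[keep])        # most important disc game first
    omegas = lam[keep][order]
    V = vecs[:, keep][:, order]
    scale = np.sqrt(2.0 * omegas)
    # Plane k uses coordinates sqrt(2*omega_k) * (Im v_k, Re v_k).
    Y = np.stack([(V.imag * scale).T, (V.real * scale).T], axis=-1)
    return omegas, Y                      # Y has shape (planes, strategies, 2)

def rebuild(Y):
    """Sum over planes of the cross products Y_k(i) x Y_k(j)."""
    a, b = Y[..., 0], Y[..., 1]
    return np.einsum("ki,kj->ij", a, b) - np.einsum("ki,kj->ij", b, a)

# Assumed payoff convention: F[i, j] = +1 if strategy i beats strategy j.
# Strategy i loses to i+1 and i+3 (mod 5), so it beats i+2 and i+4.
n = 5
shift = (np.arange(n)[None, :] - np.arange(n)[:, None]) % n
F = np.isin(shift, [2, 4]).astype(float) - np.isin(shift, [1, 3]).astype(float)

omegas, Y = disc_games(F)
radii = np.linalg.norm(Y, axis=-1)
angles = np.degrees(np.arctan2(Y[..., 1], Y[..., 0]))
step = np.diff(angles, axis=1) % 360.0
step = np.minimum(step, 360.0 - step)     # angular spacing of consecutive strategies
```

Under these assumptions the sketch returns two disc games whose points are equidistant from the origin. In the disc game with the larger eigenvalue, consecutive strategies sit $144^{\circ}$ apart; in the other they sit $72^{\circ}$ apart.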

As in Kuhn poker, the patterns uncovered by PTA repeat for larger RPS games. Their embedded point clouds form regular, centred polygons with as many vertices as pure strategies, they have as many disc games as cycles in the utility matrix, and their vertex orders are related by the permutations which exchange cycles in the utility matrix. As before, the embeddings can be stably recovered from a small fraction of $F$. For example, all three of the RPS + 4 embeddings associated with agents produced via FSP could be recovered to $\leq 0.4\%$ error after dropping 95% of all entries of $F$.

Colonel Blotto

Colonel Blotto is a zero-sum, simultaneous action, two-player resource allocation game.⁴⁶ Each player possesses $N$ troops to distribute across $K$ zones. Each zone has an associated payout $Z_{k}$ . A zone is conquered by a player if they allocate more troops to that zone than their opponent. The conquering player receives the payout. Ties result in both players receiving 0 payout. The player with the highest total payout wins the match. All allocations are revealed simultaneously.

At its simplest, the payouts are uniform across zones, so the player who conquers the most zones wins the game. Unweighted Blotto is a highly cyclic game since there is no dominating strategy: every strategy admits a counter. Unless $K = N$ or $K \leq 2$, all allotments lose to some other allotment. To defeat an allotment, adopt the maxim ‘lose big, win small’. Mimic the allotment, then redistribute all the units from the zone with the most units as uniformly as possible across the remaining zones. Then, unless all zones were allotted one unit, the exploiting strategy sacrifices a loss in one zone to win in more than one other zone. In general, the more an allotment commits to a single zone, the more easily it is defeated. Unweighted Blotto is also complex, since the zones are indistinguishable. Thus unweighted Blotto admits a $K!$-fold symmetry with respect to the zone labels.
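The ‘lose big, win small’ counter can be written out in a few lines. This is a minimal sketch assuming uniform payouts. The function names are hypothetical, and ties for the largest zone are broken by taking the first such zone, which is an arbitrary choice.

```python
def blotto_score(a, b):
    """Return 0.5 if allotment a wins more zones than b, -0.5 if fewer, 0 on a tie."""
    wins = sum(x > y for x, y in zip(a, b))
    losses = sum(x < y for x, y in zip(a, b))
    return 0.5 * ((wins > losses) - (wins < losses))

def lose_big_win_small(allotment):
    """Mimic an allotment, then spread its largest zone over the other zones."""
    counter = list(allotment)
    big = max(range(len(counter)), key=counter.__getitem__)
    units, counter[big] = counter[big], 0          # concede the largest zone
    others = [z for z in range(len(counter)) if z != big]
    share, extra = divmod(units, len(others))
    for rank, z in enumerate(others):
        counter[z] += share + (rank < extra)       # as uniform as possible
    return counter

target = [50, 25, 0]
counter = lose_big_win_small(target)               # [0, 50, 25]
result = blotto_score(counter, target)             # 0.5: the counter wins two zones
```

Applied to the all-ones allotment `[1, 1, 1]`, the same construction only ties, matching the exception noted above.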

We consider each unique strategy as a separate ‘agent’, parameterized by the corresponding allotment. We generate agents by randomly sampling over the strategy space using a Dirichlet distribution whose dimension equals the number of zones. After sampling, we compare each pair of strategies in the population. Each match-up is deterministic and results in a win, tie or loss, to which we assign the scores (0.5, 0, −0.5), respectively. We construct the associated evaluation matrix by setting $F_{ij}$ to the score of strategy $i$ against strategy $j$.
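The construction of $F$ can be sketched as follows. This is an illustration under stated assumptions rather than the authors' code: payouts are uniform, the Dirichlet concentration is set to one, and the sampled fractions are converted to whole troops with a multinomial draw, a rounding rule the text does not specify. The function names are hypothetical.

```python
import numpy as np

def sample_allotments(num_agents, n_troops, n_zones, seed=0):
    """Draw integer allotments; the rounding scheme here is an assumption."""
    rng = np.random.default_rng(seed)
    fractions = rng.dirichlet(np.ones(n_zones), size=num_agents)
    draws = np.array([rng.multinomial(n_troops, p) for p in fractions])
    return np.unique(draws, axis=0)                # keep each unique strategy once

def evaluation_matrix(allotments):
    """F[i, j] = 0.5, 0 or -0.5 when allotment i wins, ties or loses against j."""
    A = np.asarray(allotments)
    zone_outcome = np.sign(A[:, None, :] - A[None, :, :])   # +1 where i takes the zone
    zone_margin = zone_outcome.sum(axis=-1)                 # zones won minus zones lost
    return 0.5 * np.sign(zone_margin)

agents = sample_allotments(num_agents=200, n_troops=75, n_zones=3)
F = evaluation_matrix(agents)
```

Because each match-up is deterministic and zero-sum, the resulting $F$ is skew-symmetric by construction.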

PTA allows elegant visualization of relevant game structure by reducing a game to a small set of key trade-offs. We start by looking at the $K = 3$ , $N = 75$ blotto game with uniform payouts. Table 2 summarizes the principal trade-offs associated with each disc game. These trade-offs are the most important sources of cycles in the tournament, accounting for 80% of its structure.

Table 2.

Principal trade-offs.

D.G.	Allocation types	Location in simplex	Example	Advantage relation
1	(1) = allocate to 1 zone	Corners	[70, 0, 5]	(1) ≻ (3) ≻ (2) ≻ (1)
1	(2) = allocate to 2 zones	Midpoints of edges	[38, 37, 0]	(1) ≻ (3) ≻ (2) ≻ (1)
1	(3) = allocate to 3 zones	Centre	[25, 25, 25]	(1) ≻ (3) ≻ (2) ≻ (1)
2	R = [H, M, L]	Corners, shifted counterclockwise	[50, 25, 0]	R ≻ P ≻ S ≻ R
2	S = [M, L, H]	Corners, shifted counterclockwise	[25, 0, 50]	R ≻ P ≻ S ≻ R
2	P = [L, H, M]	Corners, shifted counterclockwise	[0, 50, 25]	R ≻ P ≻ S ≻ R
3	R′ = [L, M, H]	Corners, shifted clockwise	[0, 25, 50]	R′ ≻ P′ ≻ S′ ≻ R′
3	S′ = [H, L, M]	Corners, shifted clockwise	[50, 0, 25]	R′ ≻ P′ ≻ S′ ≻ R′
3	P′ = [M, H, L]	Corners, shifted clockwise	[25, 50, 0]	R′ ≻ P′ ≻ S′ ≻ R′

Principal trade-offs associated with the first three disc games. The columns list the allocation types involved in each disc game (D.G.), their location in the simplex of possible allocations, provide an example set of allocations, and the competitive relations between the types. The letters H, M and L, are used to denote high, medium, and low allocation. Note that the allocations involved in the trade-off defined by a disc game correspond to locations in the radius panels in Figure 4 shaded green or yellow. Advantage in a disc game flows clockwise, so can be inferred from the angle panel.

Bolded entries in the example column emphasize zones receiving a majority allocation.

In general, the number of distinct allotments in a $K$ battlefield, $N$ troop Blotto game is $\binom{N + K - 1}{K - 1}$, which grows as $O(N^{K-1})$ for fixed $K$, but the complexity, which reflects the underlying number of cyclic modes, converges to a constant value associated with a continuous Blotto game, where commanders can allocate an arbitrary fraction of their force to each zone. Unweighted $K = 3$, $N = 75$ Blotto admits 2926 allotments, but has a 3!-fold exchange symmetry under permutations of the battlefield labels, leaving roughly 488 distinct allotments. Three disc games reconstruct the evaluation matrix to ≈80% accuracy, 6 to ≈90% accuracy, and 12 to ≥95% accuracy, so the game has complexity 12 at a 95% standard. Trade-offs 4–12 represent refinements of the trade-offs present in the first three disc games, so PTA really allows a reduction in complexity from 2926 allocations (absent prior knowledge regarding symmetries) to three fundamental cyclic modes. Thus, PTA can effectively separate the underlying complexity of a game from the size of its strategy space.
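The allotment counts quoted here follow from a standard stars-and-bars argument: splitting $N$ troops over $K$ labelled zones can be done in $\binom{N+K-1}{K-1}$ ways. The sketch below reproduces the 2926 figure and also counts allotments up to relabelling of the zones exactly. The function names are hypothetical.

```python
from math import comb

def count_allotments(n_troops, n_zones):
    """Ways to split n_troops over n_zones labelled zones (stars and bars)."""
    return comb(n_troops + n_zones - 1, n_zones - 1)

def count_up_to_relabelling(n_troops, n_zones):
    """Allotments with zone labels ignored, counted as non-increasing sequences."""
    def sequences(remaining, zones_left, largest):
        if zones_left == 0:
            return 1 if remaining == 0 else 0
        return sum(sequences(remaining - p, zones_left - 1, p)
                   for p in range(min(remaining, largest), -1, -1))
    return sequences(n_troops, n_zones, n_troops)

total = count_allotments(75, 3)              # 2926
classes = count_up_to_relabelling(75, 3)     # 507
```

The exact number of classes, 507, sits slightly above $2926 / 3! \approx 488$ because allotments with repeated entries, such as [25, 25, 25], have fewer than six distinct relabellings.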

The exchange symmetry of the zones is apparent in the sequence of eigenvalues, $\omega_{k}$, representing disc game importance. Exchanges introduce six permutations under which the evaluation matrix is invariant. Consequently, the $\omega_{k}$ come in sets of three, where each $\omega_{k}$ represents a pair of eigenvectors. Eigenvectors associated with identical eigenvalues are not uniquely defined. Instead, they are drawn from a subspace of dimension equal to the multiplicity of the repeated eigenvalue. Consequently, all of the eigenvectors $Q$ are chosen arbitrarily from six-dimensional spaces.

When $F$ has repeated eigenvalues, the associated disc game embeddings are not unique. Any unitary transform of the set of eigenvectors sharing an eigenvalue defines a valid embedding. Thus, symmetry presents an unusual challenge: degeneracy. In our case, the disc games come in sets of three, each representing an arbitrary rotation of a six dimensional object. Consequently, we consider multiple disc games simultaneously. This issue was not addressed in previous work, which largely only considered the leading disc game. Generic games should not exhibit strong symmetries, so such degeneracy will be rare and confined to toy examples. That said, generic games also require more than one disc game, so it is essential to consider more than the leading disc game.
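The non-uniqueness can be made concrete with a synthetic example. The sketch below is illustrative only and does not use Blotto data: it builds a small skew-symmetric matrix from two random planes that share one eigenvalue, mixes the corresponding complex eigenvectors with a random unitary matrix, and confirms that the mixed planes reproduce the same matrix while giving different scatter plots. The names `planes` and `rebuild`, the size `n`, and the value of `omega` are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(0)
n, omega = 8, 1.5

# Four orthonormal real vectors (x1, y1, x2, y2) spanning two planes.
basis, _ = np.linalg.qr(rng.standard_normal((n, 4)))
# Complex eigenvectors v_k = (x_k + i y_k) / sqrt(2), both with eigenvalue omega.
V = (basis[:, [0, 2]] + 1j * basis[:, [1, 3]]) / np.sqrt(2)

def planes(V, omega):
    """Planar coordinates sqrt(2*omega) * (Im v_k, Re v_k), shape (planes, n, 2)."""
    scale = np.sqrt(2 * omega)
    return np.stack([scale * V.imag.T, scale * V.real.T], axis=-1)

def rebuild(Y):
    """Sum over planes of the cross products Y_k(i) x Y_k(j)."""
    a, b = Y[..., 0], Y[..., 1]
    return np.einsum("ki,kj->ij", a, b) - np.einsum("ki,kj->ij", b, a)

# A random 2x2 unitary matrix mixes the two degenerate eigenvectors.
U, _ = np.linalg.qr(rng.standard_normal((2, 2)) + 1j * rng.standard_normal((2, 2)))

Y = planes(V, omega)
Y_mixed = planes(V @ U, omega)
F = rebuild(Y)
same_matrix = np.allclose(rebuild(Y_mixed), F)       # True
same_embedding = np.allclose(Y, Y_mixed)             # False
```

Both sets of planes are equally valid descriptions of $F$, which is why degenerate disc games are better read together than one at a time.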

We analyse the three leading disc games to identify the most important allocation trade-offs. Figure 4 shows the first three disc games coloured by rating, allocation to the three zones, and the mapping to angle and radius in each disc game as a function of allocation. All three share the same eigenvalue, so they are equally important and could be mixed. Nevertheless, these three disc games represent distinct trade-offs in allocations that can be easily explained.

Figure 4.

Disc games 1–3 of Blotto [1,1,1] game with $N$ = 75. Rows: disc game number. Columns 1–4: disc game embeddings coloured according to agent rating, then agent allocation to zones 1–3. Column 5: the angle (measured counterclockwise from the horizontal axis), of the embedding of each strategy. Advantage flows clockwise in angle, so blue beats purple beats yellow beats green beats blue. Column 6: the radius of the embedding of each strategy. High radius corresponds to strong involvement in a trade-off (yellow), while small radius corresponds to low involvement (blue). Each triangle in the fifth and sixth columns represents the space of available allocations. High allocations to zone 1 cluster near the bottom left corner, high allocations to zone 2 cluster near the bottom right corner, and high allocations to zone three cluster at the top corner. Labels: Representative allocations defining the underlying trade-offs, indicated with bold arrows (see Table 2).
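The angle and radius panels are polar coordinates of the embedded points. A minimal sketch of that conversion is given below. It assumes each disc game embedding is stored as an array with one row of planar coordinates per strategy, and the function name is hypothetical.

```python
import numpy as np

def angle_and_radius(points):
    """Polar coordinates of embedded strategies.

    The angle is in degrees, measured counterclockwise from the horizontal axis
    and wrapped to [0, 360). The radius is the distance from the origin.
    """
    points = np.asarray(points, dtype=float)
    angle = np.degrees(np.arctan2(points[:, 1], points[:, 0])) % 360.0
    radius = np.hypot(points[:, 0], points[:, 1])
    return angle, radius

angle, radius = angle_and_radius([(1.0, 0.0), (0.0, 2.0), (-3.0, 0.0)])
most_involved = int(np.argmax(radius))   # index of the strategy farthest from the origin
```

Sorting strategies by radius gives a quick list of the allocations most involved in a trade-off.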

The specific trade-offs can be identified directly from the disc games when coloured by allocation. Consider the points labelled 1, 2 and 3 in the first disc game. Each maximizes the radius of the scatter cloud while moving along its boundary, so they represent the allocations most involved in the cycle. The low rated points at the bottom of the scatter allocate primarily to one zone (yellow in panels 2–4).

Moving clockwise, the next extremum occurs at the top of the scatter. It is high rated, and has nearly equal allocation across all three zones (coloured green in panels 2–4). Uniform allocations are rated highly since they perform well against most randomly sampled allocations, particularly those lying along a line connecting a corner of the simplex to its centre. This induces a transitive trend among the bulk of the allocations moving from allocations that prioritize one zone, to allocations that treat the zones equally. This transitive trend is represented by the general shift of the disc game leftward off the origin. This subset of allocations compete transitively, producing the regular gradient from purple to yellow in rating when moving clockwise from the bottom to the top in the scatter.

Not all allocations satisfy this transitive trend. Allocations that prioritize two zones counter the uniform strategy, and are countered by allocations that prioritize a single zone. For example, allocation [70,0,5] defeats [38,37,0]. Thus, allocations lying on the midpoints of an edge of the simplex lose to allocations near either neighbouring endpoint. These counters close the cycle, and are represented by the rightmost pair of corners labelled 2 in disc game 1. Panels 2–4 show that each such corner receives an intermediate allocation in two zones (green), but little to none in the third (dark blue).

Similar visual analysis identifies the RPS cycles among cyclic permutations of allocations [H,M,L] and [L,M,H] shown in disc games 2 and 3. For example, the leftmost corner of the scatter cloud shown in disc game 2 receives a high allocation in zone 1 (teal), an intermediate allocation in zone 2 (blue-green), and a low allocation in zone 3 (dark blue). Walking from R to P to S, the allocation pattern shifts cyclically. The same analysis applies to disc game 3, starting from [L,M,H].

Figure 5 shows the angle and radius assigned to each allocation in the simplex. Strikingly, subsequent disc games imitate the disc game 1–3 trade-offs, only at higher frequency on a smaller spatial scale in allocation. This suggests that the disc games may act like Fourier modes, where early disc games capture low frequency, global trade-offs, and later disc games capture high frequency, local trade-offs. It also suggests that orthogonality may not be the appropriate notion of independence for trade-offs. A sharper notion of equivalency is needed. Methods like nonnegative matrix factorization, which address similar issues among PCA features,⁴⁷ suggest an avenue for further development. An example that produces explicit sine series is discussed in the Appendix.

Figure 5.

The first nine disc games of the $N = 75$ , [1,1,1] blotto game. Each column is a separate disc game. The first and second rows show the angle and radius assigned to each allocation. The disc games are labelled by trade-off type. Consecutive sets of three share the same eigenvalue and are grouped by spatial scale with eigenvalue and percent recovery of $F$ provided beneath.

Pokemon

We conclude by analysing Pokemon. Pokemon originated on the Nintendo Game Boy console, but has since been played on a variety of media including playing cards. Pokemon is of considerable interest from a game design perspective since the creators must design certain trade-offs to keep the game balanced and engaging. The game is made up of creatures, called Pokemon, which come in many varieties. Players are rewarded for collecting diverse teams. Thus, each Pokemon has a different type, and each type has its own set of strengths and weaknesses. These different types satisfy interlocking cyclic relationships.

The data used in this analysis comes from an open-source Kaggle data set.⁴⁸ The original data has 800 Pokemon, but we removed the 65 ‘legendary’ Pokemon to simplify the analysis. The data consists of battle outcomes and Pokemon attributes. Battle outcomes were converted into an evaluation matrix by logistic regression (see Appendix).

Figure 6 shows three of the first four disc games, chosen for their significance. The first disc game is the most important, and is clearly transitive since all points fall on a curve that does not enclose the origin. Position along the curve is closely correlated with speed, so speed determines rating. We query by attribute to interpret the remaining disc games. To start, consider the ‘type’ attribute. The second disc game is clearly clustered by type (see Figure 6). A variety of RPS relationships are apparent among the type clusters. Any loop of clusters containing the origin corresponds to a cycle of type advantage. The intensity of the corresponding cycle (curl) is proportional to its area. Focus on the large clusters most involved in the trade-off, that is, furthest from the origin. Figure 7 summarizes the RPS relations between these clusters. First, notice the highlighted triangle formed by the Water-Fire-Grass clusters. The disc game shows the expected advantage cycle since the triangle contains the origin. Thus, PTA identifies known game structures without domain-specific knowledge.
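The link between loops around the origin and advantage cycles follows from the cross product that defines a disc game. For three embedded points, the three pairwise cross products share a sign exactly when the triangle they form contains the origin, and their sum equals twice the signed area of that triangle. The sketch below checks both statements. The function names are hypothetical, and the sign convention for which player holds the advantage is left open.

```python
import numpy as np

def cross(a, b):
    """Planar cross product, the disc game payoff between two embedded points."""
    return a[0] * b[1] - a[1] * b[0]

def cycle_summary(a, b, c):
    """Return (is_cycle, curl) for the loop a -> b -> c -> a."""
    edges = np.array([cross(a, b), cross(b, c), cross(c, a)])
    is_cycle = bool(np.all(edges > 0) or np.all(edges < 0))
    curl = float(edges.sum())            # twice the signed area of triangle abc
    return is_cycle, curl

# Triangle around the origin: every pairwise advantage points the same way.
h = np.sqrt(3.0) / 2.0
around = cycle_summary((0.0, 1.0), (-h, -0.5), (h, -0.5))

# Triangle away from the origin: the advantages do not close into a cycle.
away = cycle_summary((1.0, 0.0), (2.0, 1.0), (1.0, 2.0))
```

Here `around` reports a cycle and `away` does not. In both cases the curl equals twice the signed area of the triangle.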

Figure 6.

Disc games 1, 2 and 4 for Pokemon. Disc game 1 is coloured by rating. Disc game 2 is coloured by type, then generation. Disc game 4 is coloured by rating, then generation.

Figure 7.

Left: RPS sub-game discovery; each cluster type is represented by a matching Pokemon. Middle: Empirical performance matrix. Right: Performance matrix derived from Pokemon Database.⁴⁹

Additional clusters on the outer ring are more intricately related. The other three types are ‘bug’, ‘rock’ and ‘ground’. To summarize these relations we construct a coarse-grained evaluation matrix, $\hat{F}$ . Specifically, ${\hat{F}}_{ij}$ is the average performance of Pokemon from type $i$ versus the Pokemon from type $j$ in the second disc game. The associated matrix heat map is shown in the middle panel of Figure 7. The types are ordered by angle moving clockwise about the origin.
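The coarse-graining step amounts to block averaging. The sketch below assumes `F` holds the pairwise performance being summarized (for instance, the contribution of a single disc game) and `labels` holds one type per Pokemon. The function name is hypothetical, and types are listed in first-seen order rather than by angle.

```python
import numpy as np

def coarse_grain(F, labels):
    """Average F over every pair of label groups."""
    F = np.asarray(F, dtype=float)
    labels = np.asarray(labels)
    groups = list(dict.fromkeys(labels.tolist()))          # first-seen order
    member = np.stack([labels == g for g in groups]).astype(float)
    member /= member.sum(axis=1, keepdims=True)            # rows average over a group
    return groups, member @ F @ member.T

# Toy data: two water and two fire Pokemon, water mostly ahead.
F = np.array([[ 0.0,  0.0, 1.0, 1.0],
              [ 0.0,  0.0, 1.0, 0.0],
              [-1.0, -1.0, 0.0, 0.0],
              [-1.0,  0.0, 0.0, 0.0]])
groups, F_hat = coarse_grain(F, ["water", "water", "fire", "fire"])
```

In this toy example the water-versus-fire entry of `F_hat` is 0.75, the mean of the four water-versus-fire match-ups. A skew-symmetric `F` yields a skew-symmetric `F_hat`.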

We compared these relations with available game design matrices known as ‘attack matrices’, which list the advantage of one Pokemon type over the other. We use the attack matrix from Pokemon Database.⁴⁹ An attack matrix is written in terms of multiples, so Pokemon that are evenly matched have a $1 \times$ advantage. We bucket the attack multipliers $a_{ij}$ of type $i$ against type $j$ into five bins ranging from $0 \times$ to $2 \times$, then skew-symmetrize via $A - A^{T}$. The result is the rightmost panel in Figure 7.
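The skew-symmetrization step can be illustrated on the Water-Fire-Grass sub-block. The multipliers below are the standard single-type values for these three types, entered by hand as an assumption rather than read from the data set, with rows as attackers and columns as defenders. The binning step is omitted because the bin edges are not specified, and the function name is hypothetical.

```python
import numpy as np

types = ["water", "fire", "grass"]

# Attack multipliers a_ij: type i attacking type j (assumed standard chart values).
A = np.array([[0.5, 2.0, 0.5],    # water vs water, fire, grass
              [0.5, 0.5, 2.0],    # fire  vs water, fire, grass
              [2.0, 0.5, 0.5]])   # grass vs water, fire, grass

def skew_symmetrize(A):
    """Net advantage of type i over type j: a_ij - a_ji."""
    A = np.asarray(A, dtype=float)
    return A - A.T

S = skew_symmetrize(A)
```

Positive entries of `S` trace the cycle water over fire, fire over grass, and grass over water, each with net advantage 1.5.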

The coarse grained summary $\hat{F}$ is strikingly similar to the provided attack matrix. The apparent structural parity in these two matrices highlights the virtues of PTA. Without any domain knowledge, access to attributes, or any explicit instruction to identify clusters, PTA clustered Pokemon by their most relevant attributes (type) then encoded a game mechanism (type specific attack multipliers) directly from the cluster locations. Conversely, the second disc game shows how cyclic relations introduced at the mechanism level are realized in actual performance.

Colouring the disc games by ‘generation’, that is, Pokemon release date, reveals design choices. The game is frequently updated by the addition of new Pokemon. Updates present a design challenge. Game designers must introduce desirable new Pokemon without upsetting the game balance. The fourth disc game, shown in the far right plot of Figure 6, is balanced in that rating does not predict angle, and instead correlates with radius. Strong and weak Pokemon are closer to the origin, while Pokemon of intermediate rating are more involved in the trade-off. This reveals a spinning top structure characteristic of many games.²⁶ Instead of rating, generation predicts angle. Each generation possesses an advantage over its predecessor, as illustrated by the fade from purple to yellow. Balance is retained since generational advantage is periodic. The same clockwise generation shift reappears in the second disc game. Within type, new beats old. For example, the bottom-most cluster (grass) clearly trends old to young. Cross type relations are largely unchanged.

Unlike Kuhn poker or extended RPS, the Pokemon embeddings require most of $F$ to recover accurately since the embeddings are associated with similar eigenvalues. The first embedding can be recovered to ≤5% relative error with 90% of $F$ dropped, but the remaining embeddings, which record type and generational advantages, can only be recovered to 10% and 18% relative error when 70% of the entries are dropped.

Conclusion

Following Balduzzi et al.,¹⁵ we have demonstrated that all evaluation matrices admit an expansion onto a sum of disc game embeddings. We suggest the name PTA based on the close analogy with PCA. Through examples, we have demonstrated that embeddings produced by PTA can reveal a surprising variety of competitive structures from incomplete outcome data alone through direct visual inspection. Future work could either provide more general methods for finding embeddings, such as a functional theory connecting performance with attribute space, or seek a sparser representation via extensions of sparse PCA.⁵⁰ Future work should also investigate automated methods that summarize the trade-offs identified by PTA without the need for visual inspection, and that leverage the representation for game classification, construction, and exploration.

Supplemental Material

sj-pdf-1-ivi-10.1177_14738716241239018 – Supplemental material for Principal trade-off analysis

Supplemental material, sj-pdf-1-ivi-10.1177_14738716241239018 for Principal trade-off analysis by Alexander Strang, David Sewell, Alexander Kim, Kevin Alcedo and David Rosenbluth in Information Visualization

Footnotes

Funding

The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This work is supported by the Defense Advanced Research Projects Agency grant no. HR00112090075.

ORCID iD

Alexander Strang

Supplemental material

Supplemental material for this article is available online.

References

1. Silver, Hubert, Schrittwieser, et al. A general reinforcement learning algorithm that masters chess, shogi, and go through self-play. Science 2018; 362(6419): 1140–1144.

2. Heinrich, Silver. Deep reinforcement learning from self-play in imperfect-information games. arXiv preprint arXiv:160301121, 2016.

3. Moravčík, Schmid, Burch, et al. Deepstack: expert-level artificial intelligence in heads-up no-limit poker. Science 2017; 356(6337): 508–513.

4. Vinyals, Babuschkin, Czarnecki, et al. Grandmaster level in starcraft ii using multi-agent reinforcement learning. Nature 2019; 575(7782): 350–354.

5. Silver, Hubert, Schrittwieser, et al. AlphaZero: Shedding new light on the grand games of chess, shogi and Go. DeepMind blog, 2018, https://deepmind.google/discover/blog/alphazero-shedding-new-light-on-chess-shogi-and-go/

6. Omidshafiei, Tuyls, Czarnecki, et al. Navigating the landscape of multiplayer games. Nat Commun 2020; 11(1): 5603–5617.

7. Clune. Ai-gas: Ai-generating algorithms, an alternate paradigm for producing general artificial intelligence. arXiv preprint arXiv:190510985, 2019.

8. Lewis. Moneyball: the art of winning an unfair game. New York, NY: W.W. Norton and Company, 2004.

9. Bozóki, Csató, Temesi. An application of incomplete pairwise comparison matrices for ranking top tennis players. Eur J Oper Res 2016; 248(1): 211–218.

10. Laird, Schamp. Competitive intransitivity promotes species coexistence. Am Nat 2006; 168(2): 182–193.

11. Silk. Male bonnet macaques use information about third-party rank relationships to recruit allies. Anim Behav 1999; 58(1): 45–51.

12. Stuart-Fox, Firth, Moussalli, et al. Multiple signals in chameleon contests: Designing and analysing animal contests as a tournament. Anim Behav 2006; 71(6): 1263–1271.

13. Sinervo, Lively. The rock–paper–scissors game and the evolution of alternative male strategies. Nature 1996; 380(6571): 240–243.

14. Tuyls, Perolat, Lanctot, et al. A generalised method for empirical game theoretic analysis. arXiv preprint arXiv:180306376, 2018.

15. Balduzzi, Tuyls, Perolat, et al. Re-evaluating evaluation. In: Bengio, Wallach, Larochelle, et al. (eds) Advances in neural information processing systems, Vol. 31. Curran Associates, Inc., 2018. https://proceedings.neurips.cc/paper_files/paper/2018/file/cdf1035c34ec380218a8cc9a43d438f9-Paper.pdf

16. Pearson. LIII. On lines and planes of closest fit to systems of points in space. Lond Edinb Dubl Philos Mag J Sci 1901; 2(11): 559–572.

17. Bertrand, Czarnecki, Gidel. On the limitations of the elo, real-world games are transitive, not additive. In: International conference on artificial intelligence and statistics. PMLR, pp.2905–2921.

18. Candogan, Menache, Ozdaglar, et al. Flows and decompositions of games: Harmonic and potential games. Math Oper Res 2011; 36(3): 474–503.

19. Strang, Abbott, Thomas. The network HHD: quantifying cyclic competition in trait-performance models of tournaments. SIAM Rev 2022; 64(2): 360–391.

20. Linares. Are inconsistent decisions better? An experiment with pairwise comparisons. Eur J Oper Res 2009; 193(2): 492–498.

21. May. Intransitivity, utility, and the aggregation of preference patterns. Econometrica 1954; 22: 1–13.

22. Bilmes, Meila. Intransitive likelihood-ratio classifiers. In: Dietterich, Becker, Ghahramani (eds) Advances in neural information processing systems, Vol. 14. MIT Press, 2001. https://proceedings.neurips.cc/paper_files/paper/2001/file/a088ea2078cd92b0b8a0e78a32c5c082-Paper.pdf

23. Huang, Weng, Lin, et al. Generalized bradley-terry models and multi-class probability estimates. J Mach Learn Res 2006; 7(1): 85–115.

24. Balduzzi, Racaniere, Martens, et al. The mechanics of n-player differentiable games. In: International conference on machine learning. PMLR, pp.354–363.

25. Lanctot, Schultz, Burch, et al. Population-based evaluation in repeated rock-paper-scissors as a benchmark for multiagent reinforcement learning. arXiv preprint arXiv:230303196, 2023.

26. Czarnecki, Gidel, Tracey, et al. Real world games look like spinning tops. In: Larochelle, Ranzato, Hadsell, et al. (eds) Advances in neural information processing systems, Vol. 33. Curran Associates, Inc., 2020, pp.17443–17454. https://proceedings.neurips.cc/paper_files/paper/2020/file/ca172e964907a97d5ebd876bfdd4adbd-Paper.pdf

27. Balduzzi, Garnelo, Bachrach, et al. Open-ended learning in symmetric zero-sum games. In: International conference on machine learning. PMLR, pp.434–443.

28. Garnelo, Czarnecki, Liu, et al. Pick your battles: Interaction graphs as population-level objectives for strategic diversity. arXiv preprint arXiv:211004041, 2021.

29. Healy. Data visualization: a practical introduction. Princeton, NJ: Princeton University Press, 2018.

30. Lim. Hodge laplacians on graphs. SIAM Rev 2020; 62(3): 685–715.

31. Youla. A normal form for a matrix under the unitary congruence group. Can J Math 1961; 13: 694–704.

32. Zumino. Normal forms of complex matrices. J Phys Math 1962; 3(5): 1055–1057.

33. Eckart, Young. The approximation of one matrix by another of lower rank. Psychometrika 1936; 1(3): 211–218.

34. Mirsky. Symmetric gauge functions and unitarily invariant norms. Q J Math 1960; 11(1): 50–59.

35. Strang. Linear algebra and learning from data. Vol. 4. Cambridge: Wellesley-Cambridge Press, 2019.

36. Lipton, Markakis, Mehta. Playing large games using simple strategies. In: Proceedings of the 4th ACM conference on electronic commerce, pp.36–41.

37. Udell, Townsend. Why are big data matrices approximately low rank? SIAM J Math Data Sci 2019; 1(1): 144–160.

38. Golub, Van Loan. Matrix computations. Baltimore: John Hopkins University Press, 2013.

39. Meka, Jain, Dhillon. Guaranteed rank minimization via singular value projection. arXiv preprint arXiv:09095457, 2009.

40. Gleich, Lim. Rank aggregation via nuclear norm minimization. In: Proceedings of the 17th ACM SIGKDD international conference on knowledge discovery and data mining, pp.60–68.

41. Chen, Joachims. Modeling intransitivity in matchup and comparison data. In: Proceedings of the ninth ACM international conference on web search and data mining, pp.227–236.

42. Chen. Nonconvex matrix completion with linearly parameterized factors. J Mach Learn Res 2022; 23(207): 1–35.

43. Kuhn. A simplified two-person poker. In: Kuhn, Tucker (eds) Contributions to the theory of games (AM-24), Vol. 1. Princeton: Princeton University Press, 1951, pp.97–104. https://doi.org/10.1515/9781400881727-010

44. Lanctot, Lockhart, Lespiau, et al. OpenSpiel: a framework for reinforcement learning in games. CoRR 2019; abs/1908.09453. https://doi.org/10.48550/arXiv.1908.09453; https://arxiv.org/abs/1908.09453

45. Strang, Sewell, Rosenbluth, et al. Quadratic competition models for similar competitors: an analysis. arXiv, 2022.

46. Kovenock, Roberson. Generalizations of the general lotto and colonel blotto games. Econ Theory 2021; 71(3): 997–1032.

47. Lee, Seung. Learning the parts of objects by non-negative matrix factorization. Nature 1999; 401(6755): 788–791.

48. Bouchet. Pokemon battles, https://www.kaggle.com/jonathanbouchet/pokemon-battles/data (2017, accessed 21 October 2021).

49. Pokemon Database. Pokemon type chart, https://pokemondb.net/type (2022, accessed 20 January 2022).

50. Zou, Hastie, Tibshirani. Sparse principal component analysis. J Comput Graph Stat 2006; 15(2): 265–286.
