Abstract
We consider dynamic pricing and demand learning in a duopoly with multinomial logit demand, both from the perspective where firms compete against each other and from the perspective where firms aim to collude to increase revenues. We show that jointârevenue maximization is not always beneficial to both firms compared to the Nash equilibrium, and show that several other axiomatic notions of collusion can be constructed that are always beneficial to both firms and a threat to consumer welfare. Next, we construct a price algorithm and prove that it learns to charge supraâcompetitive prices if deployed by both firms, and learns to respond optimally against a class of competitive algorithms. Our algorithm includes a mechanism to infer demand observations from the competitor's price path, so that our algorithm can operate in a setting where prices are public but demand is private information. Our work contributes to the understanding of wellâperforming price policies in a competitive multiâagent setting, and shows that collusion by algorithms is possible and deserves the attention of lawmakers and competition policy regulators.
INTRODUCTION
Background and motivation
The mathematical analysis of dynamic price algorithms in the presence of demand learning is at the forefront of contemporary research, undoubtedly inspired by the significant potential for implementation in practice. A vast and rapidly increasing stream of literature analyzes such price algorithms (see the references discussed in Subsection 1.3). Most of these settings are monopolistic, and do not take the intricacies of pricing and learning in the presence of competitors into account. This motivates the current work, in which we study pricing and learning in a duopoly.
An important aspect of duopoly pricing is the absence of an unequivocal notion of what it means to price optimally. In most monopolistic models, there exists a unique optimal price if model parameters remain fixed throughout the time horizon. For duopolies, however, there exists a spectrum of equilibria, with Nash equilibria on one side and supraâcompetitive equilibria on the other side. In this paper, we consider price policies for both sides of the spectrum.
The potential convergence of price algorithms to supraâcompetitive equilibria has recently drawn the attention of competition regulators worldwide (Hoffman, 2018; OECD, 2017b; Spiridonova & Juchnevicius, 2020). In particular, the question has emerged whether price algorithms are capable of learning supraâcompetitive prices from data (specifically in the scenario where multiple firms in a market use the same price algorithm), within the boundaries of existing competition law. Among other things, this means that the algorithms cannot make use of illicit forms of signaling or communication. Some argue that algorithmic collusion poses a fundamental threat to consumer welfare (Ezrachi & Stucke, 2016, 2017a, 2017b, 2017c, 2020), whereas others believe that algorithmic collusion belongs to the realm of âscience fictionâ (KĂźhn & Tadelis, 2017; Schrepel, 2017; Schwalbe, 2018; Veljanovski, 2020). If algorithmic collusion is possible, then the existing legal framework of competition law might need substantial revision or reâinterpretation (Ezrachi & Stucke, 2016, 2017a, 2017b, 2017c, 2020; Gal, 2019; Harrington, 2018; OECD, 2017b).
A challenge in achieving algorithmic collusion is the fact that, without communication, there might be ambiguity in the firms' understanding of what it means to collude. But as shown by Meylahn and den Boer (2022), a definition of collusion can be built into the algorithm; an âalgorithmic agreement,â as it were (cf. Gal, 2019, p. 78). This is a powerful tool for creating stable cartels under mutual use of the algorithm, as it prevents bargaining and deviation issues that often result in the collapse of collusion (Bos & Harrington, 2010, pp. 107â108).
A second challenge is the fact that collusive prices usually depend on parameters that are unknown and thus have to be learned from sales data. This task is further complicated by the fact that prices are often publicly observable, but sales transactions are not: firms typically do not have access to the sales data of their competitor (see Meylahn and den Boer (2022) for exceptions). This hinders estimation of the notion of collusion implemented in the algorithm. It is therefore necessary to study if and how algorithms can overcome the challenge of private demand observations.
A third challenge that arises when hardâcoding a notion of collusion into a price algorithm is defining an appropriate collusive solution. Collusion is often associated with jointârevenue maximization (Gal, 2019; OECD, 2017b; Schwalbe, 2018), but this notion may not perform well for individual firms in asymmetric settings without side payments (Fischer & Normann, 2019). In such cases, algorithmic collusion requires a notion of collusion that is always beneficial to all firms.
The purpose of this paper is to address these three challenges in the context of a pricing duopoly. We show that several axiomatic notions of collusion can be constructed that are beneficial to both firms and detrimental to consumer welfare. Next, we construct an algorithm that learns the collusive price when deployed by both players in the market and learns to respond optimally against a class of competitive algorithms. We furthermore propose a mechanism to overcome the challenge of private demand observations by exploiting the information that is conveyed by the algorithm's price decisions. This algorithm can generate supraâcompetitive prices without breaking existing competition law, and therefore shows that the concerns of competition regulators are justified.
Legal environment
We consider a duopoly where both firms independently adopt the same price algorithm. In the absence of communication, the firms should not be in violation of the law even if mutual adoption of the algorithm leads to supraâcompetitive outcomes. This is in accordance with findings of the OECD (2017a, p. 10): âAbsent concerted action, independent adoption of the same or similar pricing algorithms is unlikely to lead to antitrust liability even if it makes interdependent pricing more likely.â Ezrachi and Stucke (2020, p. 220) derive similar conclusions. Along these lines, Harrington (2018, pp. 339â340) makes a very clear distinction between collusion as an outcome and the means by which the outcome is achieved: âSupracompetitive prices are legal. Collusion is legal. It is the process by which firms achieve collusion that is illegal, rather than the state of collusion itself. [âŚ] At a minimum, there must be evidence of communication to allow a court to conclude that firms have not acted independently. Though all forms of collusion are harmful, some forms are legal because of the inability to effectively distinguish collusive conduct from competitive conductâ.
If competing firms purchase the algorithm from the same software supplier who boasts the collusive capabilities of the algorithm, then the supplier runs the risk of being seen as a âmiddle manâ who facilitates collusion and could potentially be held liable. The OECD (2019) has collected many realâlife cases of this type of collusion, soâcalled hubâandâspoke collusion, where coordination does not occur through direct exchanges between the horizontal competitors (the âspokesâ), but through exchanges via a vertically related supplier or retailer (the âhubâ). Of particular interest is the case of Partneo (see, e.g., Mandrescu, 2018): between 2008 and 2013 five carmakers colluded by using the same car part pricing software, Partneo, from the company Accenture. They boosted their revenues by more than 1 billion dollars thanks to using Partneo, and yet âno formal findings [âŚ] have been found against the carmakers or Accentureâ (Ezrachi & Stucke, 2020, p. 220). The OECD (2019, p. 2) states that â[t]he main challenge for enforcement agencies is to identify when inherently legitimate exchanges between suppliers and retailers turn into a prohibited horizontal coâordination, without direct proof of collusion. Enforcement and jurisprudence in particular in the United States, but also in Europe, [âŚ] require proof of a horizontal connection between the spokes, a rim, and an awareness of all actors involvedâ. In the scenario that we consider in this paper, there are no exchanges between the competitors via the software supplier. This means that a ârimâ (a connection between competitors) can only be established by the workings of the algorithm itself. In Section EC.3 of the Supporting Information, we explain that it is inevitable that prices generated by an algorithm contain and transfer information. This information transfer is an âaccidentalâ and unavoidable byâproduct of the generated price decisions that have legitimate alternative justifications. This further suggests that users (and possibly distributors) of the algorithm do not violate competition laws.
Although it is not settled whether a software supplier can sell the same algorithm to competing firms without violating laws, this problem does not arise in other scenarios. Consider the situation of multiple thirdâparty sellers who sell their products on an online platform. Suppose, in addition, that the computer code of a collusive algorithm has been made available on an online software repository. The collusive capabilities of the algorithm are not mentioned; all that gradually becomes known through wordâofâmouth rumors is that the algorithm appears to increase profits substantially. It is conceivable that in this situation multiple sellers start using the algorithm, without engaging in illicit communication with each other or with a âmiddle man.â This scenario becomes increasingly likely due to the rising popularity of platforms that connect thirdâparty sellers to consumers, and of openâsource software repositories. There are many software suppliers who offer repricing algorithms to thirdâparty sellers on various platforms, and generally charge a fee that increases with the number of listings. There could thus be much interest in freeâtoâdownload pricing software.
We emphasize that we do not and cannot claim with certainty that the algorithm proposed in this paper (or the distribution thereof) is legal, because that can only be decided in court. For the same reason it is also not clear that the algorithm is illegal. Regulators are wellâaware that algorithms can facilitate tacit collusion by reducing or eliminating the need for human communication (OECD, 2017b, p. 7), and that signaling between algorithms can occur through prices: âFor instance, firms may program snapshot price changes during the middle of the night, which won't have any impact on sales but may be identified as a signal by rivals' algorithms. Or, alternatively, companies may use algorithms to publicly disclose a lot of detailed data that is used as a code to propose and negotiate price increases, as it was observed in the US case on Airline Tariff Publishing Companyâ (OECD, 2017b, p. 30). An important insight of this paper is that information can be transferred between algorithms in much subtler ways. Indeed, we concretely show that prices generated by algorithms reveal information that can be used to achieve supraâcompetitive outcomes, not because extraneous information is explicitly encoded in the prices, but because legitimate price decisions inevitably allow decoding of information by a copy of the same algorithm. This creates a challenge for lawmakers: information transfer through prices is inevitable, but it is far from trivial to determine where to draw the line.
Related literature
Recently, a large number of studies have appeared on dynamic pricing with demand learning (Besbes & Zeevi, 2009, 2015; Broder & Rusmevichientong, 2012; Chen et al., 2019, 2021; Cheung et al., 2017; den Boer & Zwart, 2014, 2015; Hansen et al., 2021; Harrison et al., 2012; Keskin & Zeevi, 2014; Wang et al., 2014, 2021). We refer to den Boer (2015) for a recent survey. Although these papers consider an impressive number of models and settings, the formal analysis of price algorithms with demand learning in the presence of competitors has received relatively little attention in this stream of literature. Exceptions include Bertsimas and Perakis (2006), who formulate an optimal pricing problem with incomplete information in a dynamic programming framework and propose tractable approximations; Cooper et al. (2015), who study the limit behavior of prices in a duopoly where both players use a misâspecified passive learning policy; and Yang et al. (2020), who study convergence to equilibrium prices under the assumption that all players in the market use the same algorithm. Our paper contributes to this stream of literature by proposing price policies and formally proving regret bounds in a duopoly, both in a competitive and in a collusive setting.
Our work contributes to the nascent field of algorithmic collusion. A key question in this field is up to what extent algorithms are capable of learning supraâcompetitive prices under selfâplay while simultaneously learning to price competitively against reasonable alternative algorithms. Recent contributions include Calvano et al. (2020), Cooper et al. (2015), and Klein (2021), where we remark that the focus of the second paper is not algorithmic collusion per se but rather the interplay of learning and model misâspecification. These studies show by means of simulations that particular price policies, when playing against copies of itself, can lead to supraâcompetitive limit prices and revenues with positive probability. However, as illustrated by figures 6 and 7 of Cooper et al. (2015), subâcompetitive prices and revenues are also possible outcomes. In addition, these papers do not provide performance guarantees in case these policies do not play against a copy of itself but against a âreasonableâ alternative strategy. These two observations diminish the likelihood of such algorithms being implemented in practice (den Boer et al., 2022).
Two examples where the notion of collusion is explicitly programmed in the algorithm are Meylahn and den Boer (2022) and Aouad and den Boer (2021). Meylahn and den Boer (2022) explain that preâprogrammed collusion does not make an algorithm illegal. The authors consider a pricing duopoly, define collusion as maximizing the jointârevenue function, and construct a policy that learns this collusive price provided this is mutually beneficial to the firms and both firms use the algorithm. In addition, it is shown that prices converge to a best response in case the opponent prices according to a reaction function. Aouad and den Boer (2021) consider similar questions in an assortmentâoptimization duopoly with multinomial logit demand.
A limitation of Meylahn and den Boer (2022) is that collusion is only guaranteed if the jointârevenueâmaximizing prices are mutually beneficial compared to pricing under the Nash equilibrium. In practice this is often not the case if the firms are not symmetric (Bos & Harrington, 2010; Fischer & Normann, 2019). A second limitation is that demand observations are assumed to be public information. These limitations might give lawmakers the wrong impression that algorithmic collusion is only possible in markets where these two limitations do not apply. In contrast, in this work we show and explicitly construct notions of collusion that firms can adopt that are always mutually beneficial. Hardâcoding these cartel prices in algorithms enables collusion in many more markets than only (nearâ)symmetric ones. In addition, we show how the challenge of private demand information can be overcome by inferring demand data from public price data.
This paper also contributes to the axiomatic bargaining literature by elaborating how firms can collude on price in a pricing duopoly with multinomial logit demand. To the best of our knowledge, there appears to be no existing work that conducts this analysis. We take an axiomatic approach in the spirit of Roth (1979). Fischer and Normann (2019) study similar concepts for Cournot games.
Contributions and insights
The technical contributions of this paper are threefold. First, we analyze in detail three notions of collusion in a pricing duopoly with multinomial logit demand functions, based on different sets of axioms such as Pareto optimality and equal relative/absolute gains. We prove existence and uniqueness under different combinations of axioms, and show how these collusive prices can be computed up to arbitrary precision. We show that these collusive prices reduce consumer welfare and increase prices and revenues of both firms compared to the Nash equilibrium. Simulation results with parameter values based on empirical data show that these effects can be substantial.
Second, we construct a price algorithm and prove that prices converge to the implemented collusive solution if the algorithm is used by both players. Importantly, the players do not have to start the algorithm at the same moment, demand observations are allowed to be private information, and the players do not have to know that the opponent is using the same algorithm. If the algorithm plays against a competitive price rule, we prove that prices converge to a best response and provide asymptotically optimal regret bounds.
Third, we show how a competitor's private demand realizations can be inferred from public prices. This facilitates consistent estimation of the unknown model parameters and has the additional benefit that the players, under mutual use of our proposed algorithm, always base their decisions on the same sales data set. This enables players to immediately detect when the other firm deviates from the algorithm's price suggestions, strengthening the stability of the cartel.
Our technical results have several implications for competition regulation professionals. First, we show that two major challenges of firms to achieve algorithmic collusion can be overcome when they use the same algorithm. Hardâcoding a notion of collusion in an algorithm removes the need to communicate and agree on what it means to collude, which, according to Bos and Harrington (2010), is a critically difficult step in the formation of cartels. Furthermore, using the same algorithm enables the firms to âunderstandâ each other's price decisions, so that, for example, price experiments aimed at guaranteeing learning the model parameters will not mistakenly be interpreted as âcheatingâ the cartel. In fact, deviation from the prices prescribed by the algorithm will immediately be detected by the competing firm, which strengthens the stability of the cartel.
Second, our examples of collusive solutions demonstrate that firms can be creative in what they mean by collusion, and can adopt definitions that are not necessarily on the radar of a regulator. Competition regulators should therefore not blindly focus on one particular notion of collusion (e.g., jointârevenue maximization) but be on the lookout for any sustainable price pair that reduces consumer welfare and is beneficial for the firms.
Third, our mechanism to infer sales data from price paths reveals an important point related to communication: price paths contain and transfer information. Illicit communication between firms to establish collusion is forbidden by law (Harrington, 2018), but the fact that prices may signal information about, for example, a seller's parameter estimates or beliefs cannot always be avoided, and therefore does not constitute illegal communication per se. In our algorithm, information transfer is an âaccidentalâ byâproduct of the generated price decisions, which is used to overcome the challenge of private sales data and to learn the collusive price. Competition regulators may wish to reflect on whether this information transfer is a desirable property of prices generated by algorithms.
Organization of the paper
The rest of this paper is organized as follows. Subsection 2.1 describes the static pricing game under consideration. In Subsection 2.2, we consider several axiomatic notions of collusion, and prove results on existence and uniqueness. Subsection 2.3 provides comparisons with Nash prices, revenues, and consumer welfare. Subsection 3.1 extends the static game to a dynamic setting in which the demand functions are unknown upfront, and defines the concept of a price policy. Subsection 3.2 describes the algorithm that we propose. In Section 4, we analyze its performance against different types of competitors. Subsection 4.1 establishes that prices converge to the builtâin collusive solution when the algorithm is used by both firms in the duopoly, and provides a bound for the corresponding regret. In Subsection 4.2, we study the performance of our algorithm when pricing against a âtrueâ competitor, and prove an asymptotically optimal regret bound. Subsection 4.3 provides numerical illustrations. A discussion and directions for future research are contained in Section 5. All mathematical proofs are collected in the Supporting Information, where we also discuss legal aspects of the proposed algorithm (Section EC.3 in the Supporting Information), consider the performance of alternative price rules against our policy (Section EC.4 in the Supporting Information), and explain that cheating the algorithm is difficult (Section EC.5 in the Supporting Information).
COLLUSION IN A STATIC PRICING DUOPOLY
In this section, we describe different definitions of collusion that firms could devise and implement in a price algorithm. Subsection 2.1 describes the static pricing game under consideration. Subsection 2.2 establishes that jointârevenue maximization, a standard notion of collusion, is often not beneficial to both firms and therefore not always a suitable definition of collusion. We discuss alternative notions of collusion based on different axioms and prove that these are always beneficial to both firms and always diminish consumer welfare. Subsection 2.3 provides numerical results.
Static model
We consider a priceâsetting duopoly where each firm sells exactly one product; firm one sells product one, firm two sells product two. The price charged for product
The exponential
For future reference, we now define the optimal monopoly prices (i.e., when there is no competitor), the Nash equilibrium prices, and the maximizer of the jointârevenue function. Regarding the monopoly prices, Li and Huh (2011, Theorem 2 and Corollary 1) show that if firm j is a monopolistâwhich is equivalent to setting the competitor's price to infinityâthere exists a unique revenueâmaximizing price
Li and Huh (2011, Theorem 4) also show that the static pricing game has a unique Nash equilibrium
Finally, Li and Huh (2011, Theorem 2) show that the jointârevenue function
We conclude this section with the following remarks regarding notation. For each
Notions of collusion
Jointârevenue maximization
Collusion is frequently associated withâor even defined asâjointârevenue maximization (Gal, 2019, p. 71, OECD, 2017b, p. 19, Schwalbe, 2018, p. 569). However, the concepts of collusion and jointârevenue maximization are fundamentally different: from the perspective of an individual firm, the objective of collusion is to raise its own revenue to a supraâcompetitive level, not to maximize joint revenue per se. Although maximizing joint revenue can sometimes increase revenues for all involved firms, this need not always be the case.
To illustrate this point, we generate 1 million instances of
Performance results of four different notions of collusion compared to the Nash equilibrium
This shows that in many cases (asymmetric) firms that are in principle willing to collude are unlikely to act on it if collusion is defined by charging jointârevenueâmaximizing prices. This does not mean that such firms cannot collude: in what follows we show that there are alternative ways to define cartel prices that are always profitable for both firms.
Axiomatic notions of collusion
We refer to any function For all
For all Pareto optimality
For all Equal relative gains
For all Equal absolute gains
Mutually beneficial notions of collusion provide strictly higher revenue than the Nash equilibrium, for each firm, regardless of the model parameters (in contrast to, for example, jointârevenue maximization). Pareto optimality ensures that no additional revenue can be gained without harming the other firm. The axioms of equal relative/absolute gains prevent imbalances in the collusive surplus compared to the Nash equilibrium, thereby making the notion of collusion more mutually attractive.
In this section, we consider three notions of collusion that firms could construct. Two of these definitions are induced by taking different combinations of the axioms given above: the first notion of collusion is defined by Axioms 2 and 3, the second notion by Axioms 2 and 4. The axiomatic derivation of the third notion of collusion, the Nash bargaining solution, is given in Section EC.2 in the Supporting Information because the underlying axioms are based on a broader framework. All three notions of collusion under consideration are mutually beneficial (Axiom 1) and Pareto optimal (Axiom 2).
Before we analyze these notions of collusion, we prove a technical result about Pareto optimal price pairs; a price pair Let
We now consider the three aforementioned axiomatic notions of collusion in detail. Our first theorem shows that the axioms of Pareto optimality and equal relative gains uniquely define a notion of collusion, denoted by There exists a unique notion of collusion
To prove the theorem, we deduce from Lemma 1 that the revenue of firm j strictly decreases from the optimal monopoly value to zero along the graph of the function defined in (7), whereas the opponent's revenue strictly increases in the reversed direction. Consequently, the function defined in (8) strictly decreases from a positive value to a negative value. This guarantees the existence of a unique zero, which in turn ensures existence and uniqueness of a Pareto optimal notion of collusion that provides equal relative gains. Because the Nash equilibrium is not Pareto optimal it follows that
Our next result shows that the axioms of Pareto optimality and equal absolute gains uniquely define a notion of collusion, denoted by There exists a unique notion of collusion
The third and final notion of collusion that we consider is the Nash bargaining solution, denoted by Let
To prove Theorem 3, we first establish that the revenue functions
Effects of collusion on prices, revenues, and consumer welfare
We measure consumer welfare (CW) by the expected postâpurchase utility obtained by the consumer: Let
In the proof of Theorem 4, we derive that
To obtain more insight into the effects of collusion on prices, revenues, and consumer welfare, we now perform a simulation. We generate 1 million instances of
The results from Table 1 show that RPO, APO, and NB have roughly equivalent effects on prices, revenues, and consumer welfare, whereas JRM performs rather differently. First, JRM is mutually profitable compared to the Nash equilibrium in only 19% of the cases; this is in stark contrast with the 100% for the other collusive solutions. Second, the average effect on revenues is 0%; this is because JRM is not a mutually beneficial notion of collusion, so that revenues can in fact decrease by switching from the Nash equilibrium to JRM. Third, the average effect of JRM on prices is much more extreme than for the other notions of collusion, but because these enormous price increases generally do not occur for both firms simultaneously, consumer welfare is affected by roughly the same amount as the other notions of collusion.
COLLUSION AND COMPETITION IN A DYNAMIC PRICING DUOPOLY
We now consider a sequential version of the pricing game where the parameters are initially unknown to the firms. Subsection 3.1 extends the model from Section 2 to this dynamic setting and provides the definition of a policy. In Subsection 3.2, we propose an algorithm called ColludeâorâCompete that guarantees convergence to a builtâin notion of collusion when used by both players, and performs optimally against a class of alternative policies that a competitor might deploy.
Dynamic model
We consider a dynamic version of the static duopoly game introduced in Subsection 2.1. Although the firms can in principle change their selling prices at any instant, we divide the time horizon into periods and model the firms' pricing behavior as a discreteâtime simultaneousâmove game, as is common in the literature (see, e.g., Akçay et al., 2010, Cooper et al., 2015, Li & Huh, 2011, Maglaras & Meissner, 2006). This means that we divide the time horizon [0, â) into subsequent periods
Let
Let
Colludeâorâcompete
We propose a dynamic price algorithm called ColludeâorâCompete, which by design is able to switch between two policies, or modules. One of these policies (the collusive policy) is designed to guarantee convergence of prices to the collusive solution
Collusive policy
The collusive policy is based on ideas from den Boer and Zwart (2014). The key idea is to iteratively approximate
In particular, if firm j activates the collusive policy at time Let
To ensure consistency of the parameter estimates, the policy imposes for every The values of Îş0 and Îş1 are hardâcoded into the algorithm by the software developer, cannot be altered by the user, and are the same for all users. We show in Subsection 4.1 that the choice
Extracting demand information from prices
Computing the prices determined by the collusive policy requires demand observations from both firms. We explain in this section how the competitor's demand realizations can be inferred from their price path. The key feature underlying our inference process is that different demand observations by the competitor correspond to different selling prices. Using the same price algorithm as the competitor enables one to âreverse engineerâ the (private) demand data that has resulted in the (public) selling prices. By continuously applying this procedure, both users of the algorithm ensure that they base their decisions on the same data set consisting of price and demand data. As a result, from an informational viewpoint, the situation is then equivalent to the situation where demand information is publicly available.
We now explain in more detail the mechanism of demand inference from price paths that is incorporated in the ColludeâorâCompete algorithm. This mechanism utilizes that both firms use the algorithm and are synchronized, that is, that they simultaneously activate the collusive module at some time Ďâwe explain in Subsection 3.2.4 that synchronization is guaranteed per construction of the algorithm. The mechanism is based on the following idea. If firm j observes demand
At the beginning of each period
The competitor's demand during periods
This mechanism, to infer (private) demand information from (public) selling prices, hinges on two assumptions: a technical and a mathematical one. On the technical side, we need to assume that the time (say,
The second assumption underlying our demand inference scheme is that different demand observations correspond to different prices. Formally stated, the mechanism works if, with probability one,
Competitive policy
The competitive policy is based on ideas from Broadie et al. (2011). The idea is to maximize the (unknown) revenue function using a stochastic gradient ascent method introduced by Kiefer and Wolfowitz (1952), which applies price perturbations around a soâcalled pivot price to obtain a local estimate of the derivative, and updates the pivot price accordingly. Choosing appropriate perturbation sizes and step sizes between subsequent pivot prices ensures convergence of prices to the maximizer of the revenue function.
In particular, from the perspective of firm j, the competitive policy divides the time horizon into consecutive cycles, each cycle consisting of two subsequent time periods: in the nth cycle, for
Switching between modules and achieving synchronization
In this section, we explain how the ColludeâorâCompete algorithm switches between the collusive policy and the competitive policy based on the opponent's price path. We speak of the collusive/competitive âmoduleâ instead of âpolicyâ when we view it as part of the overarching ColludeâorâCompete algorithm, and not as a standâalone policy.
The algorithm always starts by running the collusive module; but, as soon as the opponent charges a price that cannot be the output of the collusive module, the algorithm immediately switches to the competitive module. The first time that the competitive module is activated, it runs
If both firms use the ColludeâorâCompete algorithm, their collusive modules need to be activated simultaneously in order for prices to converge to the collusive solution
Formally, the mechanism is as follows. Suppose that there exist
The first equality in (24) indicates activation of the collusive module by firm
In the rest of the paper, we suppose that the synchronization mechanism is implemented in full generality in a function
Comprehensive formulation of the algorithm
In the preceding sections, we have described in detail the individual components of our algorithm. In the current section, we synthesize these into a compact description of the complete algorithm. We take the perspective of firm j. In the description we write
Part A initializes the hyperâparameters of the algorithm. It also introduces three state variables: Ď keeps track of when the algorithm switches between modules, k stores how many KieferâWolfowitz cycles have been executed so far, and Îł determines how many cycles the competitive module needs to run the next time it is activated.
Part B consists of the collusive module. During the initial two periods it charges subsequent prices with a fixed ratio of
Part C consists of the competitive module. It first introduces the state variable t, which keeps track of time, and n, which determines the competitive cycle that needs to be executed next. Part C1 consists of the competitive main loop: perform the nth KieferâWolfowitz cycle, and update the state variables (i.e., increase t by 2 and n by 1). Again, if the synchronization mechanism requires it, the loop is interrupted and part B is activated. If the required number of cycles (i.e., Îł cycles) have been completed without interruption, the state variables are updated accordingly (i.e., k is increased by Îł and Îł is increased by
POLICY PERFORMANCE
In this section, we analyze the performance of the ColludeâorâCompete algorithm against an opponent that uses the same algorithm (Subsection 4.1), and against an opponent whose prices are determined by an alternative policy within a class of âreasonableâ competitive policies (Subsection 4.2). In both cases, we define regret and prove optimal regret bounds. The main results of this section, Theorems 6 and 7, can be summarized as follows. If the ColludeâorâCompete algorithm is (eventually) used by both firms, prices converge to the collusive solution
Policy performance against a collusive opponent
Throughout Subsection 4.1, we consider the case that both firms use the ColludeâorâCompete algorithm, possibly activated at different starting times. In particular, firm
Recall that the collusive policy is designed to learn the collusive price pair
Recall that Let
To prove this theorem, we apply Brouwer's invariance of domain theorem to derive conditions which imply that
Our main result in this section is the following regret bound
4
. It holds that
To prove Theorem 6, we first show for all
The parameter Îş0 captures the tradeâoff between exploration and exploitation. On the one hand, increasing Îş0 leads to more price dispersion and hence lower estimation errors, captured by the term Kleinberg et al. (2008) show that the regret rate of
Policy performance against a competitive opponent
Throughout Subsection 4.2, we consider the case that firm
In Theorem 7, we prove an
If Ď is a twice continuously differentiable reaction function that satisfies Condition (F1), then there exists a unique maximizer
which implies that
where K
0 is a given positive constant. We assume that Ď is an element of
We now show for two reasonable reaction functions that they are elements of our reactionâfunction class. Define for all Let
Furthermore, we assume that
The main result of this section is that the ColludeâorâCompete algorithm learns the asymptotically optimal price Let
To prove Theorem 7, we first show that the expected absolute revenue loss per period, compared to the asymptotically optimal revenue
Broadie et al. (2011, Proposition 1) show that
Numerical illustrations
Collusive opponent
We now provide a numerical illustration of the theoretical results from Subsection 4.1. In particular, we show by means of simulation that the prices determined by the ColludeâorâCompete algorithm converge to the collusive solution
To illustrate convergence to the collusive price pair
Regarding the price policies applied by the firms, suppose that firm one applies the policy
Figure 1a shows the price paths resulting from our simulation. The isolated (red) dots are the result of firm one switching to the collusive module (charging

Convergence of prices (left panel) and regret on a logâlog scale (right panel) when both firms use the ColludeâorâCompete algorithm
We repeat this simulation 1000 times in order to illustrate the growth rate of the regret. Figure 1b shows that the logarithm of the sample average of
Competitive opponent
We now provide a numerical illustration of the theoretical results from Subsection 4.2. In particular, we show by means of simulation that the prices determined by the ColludeâorâCompete algorithm converge to the asymptotically optimal price when playing against an exploreâthenâcommit policy, and that the growth rate of the regret is in accordance with Theorem 7.
To illustrate convergence to the asymptotically optimal price we run a simulation consisting of 1 million time periods. We again set
Figure 2a shows the price paths resulting from our simulation. It illustrates that firm two initially explores five different prices and eventually commits to the price

Convergence of prices (left panel) and regret (right panel) on a logâlog scale when one firm uses the ColludeâorâCompete algorithm and the other firm uses an exploreâthenâcommit policy
We repeat this simulation 1000 times in order to illustrate the growth rate of the regret. It turns out that
DISCUSSION
Footnotes
ACKNOWLEDGMENTS
The authors are grateful to the reviewers and editors for constructive suggestions. This research was supported by NWO Klein, Grant OCENW.KLEIN.016.
1
2
Note that firm j requires the correct value of
3
Our description of the synchronization mechanism uses the property that the collusive module of firm
. Therefore, firm j can still distinguish two subsequent phases, infer
4
Here we use the notation
References
Supplementary Material
Please find the following supplemental material available below.
For Open Access articles published under a Creative Commons License, all supplemental material carries the same license as the article it is associated with.
For non-Open Access articles published, all supplemental material carries a non-exclusive license, and permission requests for re-use of supplemental material or any part of supplemental material shall be sent directly to the copyright owner as specified in the copyright notice associated with the article.
