Naive Algorithmic Collusion: When Do Bandit Learners Cooperate and When Do They Compete?

Connor Douglas, Foster Provost, Arun Sundararajan
2024-11-26
Abstract: Algorithmic agents are used in a variety of competitive decision settings, notably in making pricing decisions in contexts that range from online retail to residential home rentals. Business managers, algorithm designers, legal scholars, and regulators alike are all starting to consider the ramifications of "algorithmic collusion." We study the emergent behavior of multi-armed bandit machine learning algorithms used in situations where agents are competing, but they have no information about the strategic interaction they are engaged in. Using a general-form repeated Prisoner's Dilemma game, agents engage in online learning with no prior model of game structure and no knowledge of competitors' states or actions (e.g., no observation of competing prices). We show that these context-free bandits, with no knowledge of opponents' choices or outcomes, still will consistently learn collusive behavior - what we call "naive collusion." We primarily study this system through an analytical model and examine perturbations to the model through simulations. Our findings have several notable implications for regulators. First, calls to limit algorithms from conditioning on competitors' prices are insufficient to prevent algorithmic collusion. This is a direct result of collusion arising even in the naive setting. Second, symmetry in algorithms can increase collusion potential. This highlights a new, simple mechanism for "hub-and-spoke" algorithmic collusion. A central distributor need not imbue its algorithm with supra-competitive tendencies for apparent collusion to arise; it can simply arise by using certain (common) machine learning algorithms. Finally, we highlight that collusive outcomes depend starkly on the specific algorithm being used, and we highlight market and algorithmic conditions under which it will be unknown a priori whether collusion occurs.
General Economics, Artificial Intelligence, Computer Science and Game Theory, Multiagent Systems
What problem does this paper attempt to address?
The core question of this paper is whether competing multi-armed bandit algorithms, given no information about the strategic interaction they are engaged in, will converge to collusive behavior, the so-called "naive collusion." Specifically, the authors study whether these algorithms still learn cooperative rather than competitive behavior without any knowledge of opponents' choices or outcomes.

### Research Background and Problem

Autonomous pricing algorithms are now widely used, from product pricing on the Amazon marketplace to setting residential real-estate rents, and they may learn to collude. This algorithmic collusion has attracted the attention of regulatory agencies worldwide, because current antitrust laws usually require evidence of intentional coordination or an "exchange of wills." In algorithmic collusion, however, coordination may emerge from independent optimization rather than from explicit communication.

### Main Research Content

The paper studies this problem through a classic game-theory model, the Prisoner's Dilemma. In this model, agents learn online using a set of standard multi-armed bandit algorithms, with no prior knowledge of the game structure or of competitors' states or actions. Each algorithm therefore learns only from its own history of actions and rewards.

### Main Findings

1. **Collusion of Deterministic Algorithms**: When two symmetric agents use deterministic multi-armed bandit learning algorithms, they will almost always learn to collude.
2. **Competition of Non-Deterministic Algorithms**: When two agents use a specific type of widely studied non-deterministic multi-armed bandit learning algorithm (such as the epsilon-greedy algorithm without decay), they will not learn to collude in the long run.
3. **The Influence of Symmetry**: Symmetry between algorithms can increase the possibility of collusion, which provides a simple mechanism for "hub-and-spoke" collusion.
4. **The Influence of Market Conditions**: Whether collusion occurs depends on the specific algorithms and market conditions, so in some cases it is impossible to determine in advance whether collusion will occur.

### Experiments and Analysis

The authors support these findings with an analytical model and with simulation experiments. For example, for the deterministic UCB algorithm, even introducing a small amount of asymmetry or randomness does not prevent naive algorithmic collusion from persisting. The epsilon-greedy algorithm, by contrast, always shows competitive behavior.

### Conclusions and Implications

These results have important implications for regulatory agencies. First, policies that prevent algorithms from conditioning their own prices on competitors' prices are not sufficient to prevent algorithmic collusion. Second, even simple and common machine-learning algorithms may lead to collusive behavior. Finally, the authors emphasize that predicting an algorithm's behavior in the market requires a detailed understanding of how that specific algorithm is implemented.

### Formula Representation

The paper's value estimate for an action \( a \) given the history \( H_t \) is:

\[
v(a, H_t) := \begin{cases}
\frac{\alpha_a \cdot \rho}{\alpha_a \cdot \vec{1}} & \text{if } \alpha_a \cdot \vec{1} \neq 0 \\
0 & \text{otherwise}
\end{cases}
\]

where:

- \( \alpha_a \) is the binary vector of action \( a \).
- \( \rho \) is the reward vector.
- \( \vec{1} \) is the all-ones vector.
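As a rough illustration of the value estimate above, the following Python sketch computes \( v(a, H_t) \) and uses it in a textbook epsilon-greedy choice. It assumes that \( \alpha_a \) marks with a 1 each past round in which action \( a \) was played, so that \( \alpha_a \cdot \vec{1} \) counts the plays of \( a \); this reading, the function names, the action labels `"C"` and `"D"`, and the reward history are illustrative assumptions, not taken from the paper.

```python
import numpy as np


def value_estimate(alpha_a, rho):
    """Average reward over the rounds in which action a was played.

    alpha_a: binary vector, assumed to be 1 where action a was played.
    rho:     reward vector, one entry per past round.
    Returns 0.0 when the action has never been played.
    """
    alpha_a = np.asarray(alpha_a, dtype=float)
    rho = np.asarray(rho, dtype=float)
    plays = alpha_a @ np.ones_like(alpha_a)  # alpha_a . 1
    if plays == 0:
        return 0.0
    return float(alpha_a @ rho) / float(plays)


def epsilon_greedy_action(alphas, rho, epsilon, rng):
    """Standard epsilon-greedy without decay (an assumed textbook form).

    alphas: dict mapping each action label to its binary vector.
    With probability epsilon, pick an action uniformly at random;
    otherwise pick the action with the highest value estimate.
    """
    actions = list(alphas)
    if rng.random() < epsilon:
        return actions[rng.integers(len(actions))]
    values = [value_estimate(alphas[a], rho) for a in actions]
    return actions[int(np.argmax(values))]


# Hypothetical five-round history for one agent (made-up rewards).
alphas = {
    "C": [1, 0, 1, 1, 0],  # rounds in which the agent cooperated
    "D": [0, 1, 0, 0, 1],  # rounds in which the agent defected
}
rho = [3.0, 5.0, 3.0, 0.0, 1.0]

print(value_estimate(alphas["C"], rho))  # (3 + 3 + 0) / 3 = 2.0
print(value_estimate(alphas["D"], rho))  # (5 + 1) / 2 = 3.0
print(epsilon_greedy_action(alphas, rho, 0.0, np.random.default_rng(0)))  # D
```

Note that the agent sees only its own actions and rewards here, which matches the naive setting described above: nothing about the opponent's choices enters the estimate.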