Abstract: Most modern systems strive to learn from interactions with users, and many engage in exploration: making potentially suboptimal choices for the sake of acquiring new information. We initiate a study of the interplay between exploration and competition: how such systems balance the exploration for learning and the competition for users. Here the users play three distinct roles: they are customers that generate revenue, they are sources of data for learning, and they are self-interested agents who choose among the competing systems. In our model, we consider competition between two multi-armed bandit algorithms faced with the same bandit instance. Users arrive one by one and choose among the two algorithms, so that each algorithm makes progress if and only if it is chosen. We ask whether and to what extent competition incentivizes the adoption of better bandit algorithms. We investigate this issue for several models of user response, as we vary the degree of rationality and competitiveness in the model. Our findings are closely related to the "competition vs. innovation" relationship, a well-studied theme in economics.
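The key mechanic in the model is that a principal only acts, and only learns, in rounds where a user chooses it. A minimal simulation sketch of that loop follows. Everything in it is a hypothetical stand-in rather than the paper's construction: the `GreedyAgent` and `EpsilonGreedyAgent` learners, the Bernoulli arm probabilities, and especially the user rule, which here assumes each user picks the principal with the higher average realized reward so far (a HardMax-style choice on an assumed reputation score).

```python
import random


class GreedyAgent:
    """Hypothetical non-exploring learner: always plays the arm with the best empirical mean."""

    def __init__(self, n_arms):
        self.pulls = [0] * n_arms      # how many times each arm was played
        self.totals = [0.0] * n_arms   # summed rewards per arm

    def means(self):
        # Assumption: an arm that was never played is scored 0.5.
        return [t / p if p else 0.5 for t, p in zip(self.totals, self.pulls)]

    def choose_arm(self, rng):
        means = self.means()
        best = max(means)
        # Break ties uniformly at random among the best-looking arms.
        return rng.choice([a for a, m in enumerate(means) if m == best])

    def update(self, arm, reward):
        self.pulls[arm] += 1
        self.totals[arm] += reward


class EpsilonGreedyAgent(GreedyAgent):
    """Hypothetical exploring learner: random arm with probability epsilon, else greedy."""

    def __init__(self, n_arms, epsilon=0.1):
        super().__init__(n_arms)
        self.epsilon = epsilon

    def choose_arm(self, rng):
        if rng.random() < self.epsilon:
            return rng.randrange(len(self.pulls))
        return super().choose_arm(rng)


def simulate(agents, arm_probs, n_users, rng):
    """Two principals share one Bernoulli bandit; returns how many users each one served.

    Assumption: each user picks the principal with the higher average realized
    reward so far (0.5 before it has served anyone), breaking ties at random.
    """
    served = [0, 0]
    earned = [0.0, 0.0]
    for _ in range(n_users):
        reps = [earned[i] / served[i] if served[i] else 0.5 for i in range(2)]
        if reps[0] == reps[1]:
            chosen = rng.randrange(2)
        else:
            chosen = 0 if reps[0] > reps[1] else 1
        # Only the chosen principal acts, observes a reward, and learns from it.
        arm = agents[chosen].choose_arm(rng)
        reward = 1.0 if rng.random() < arm_probs[arm] else 0.0
        agents[chosen].update(arm, reward)
        served[chosen] += 1
        earned[chosen] += reward
    return served


rng = random.Random(0)
agents = [GreedyAgent(3), EpsilonGreedyAgent(3, epsilon=0.2)]
print(simulate(agents, [0.3, 0.5, 0.7], 1000, rng))
```

Because a principal that falls behind in reputation receives no users, it also receives no new data, which is the feedback loop between exploration and competition that the abstract describes.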

What problem does this paper attempt to address?

The core problem this paper addresses is the interplay between exploration and competition. Specifically:

1. **How to balance exploration and competition**: To learn from user interactions, modern systems often need to explore, that is, make potentially suboptimal choices to acquire new information. Exploration can lower the current quality of service and drive users away. When several systems compete for the same users, how do they balance exploration against competition?
2. **Does competition encourage the adoption of better exploration algorithms?** The authors study whether, and to what extent, competition incentivizes each system to adopt more effective exploration algorithms. Specific questions include:
   - Does better learning technology always bring higher utility?
   - Are better exploration algorithms used in equilibrium of the "competition game"?
   - Does competition improve social welfare compared with a monopoly?
3. **The role of users**: Users play three distinct roles:
   - as customers, they generate revenue for the systems;
   - as data sources, they provide the information the systems need to learn;
   - as self-interested agents, they choose which system to use.

### Model overview

The authors construct a multi-armed bandit model in which two systems (principals), facing the same bandit instance, explore and compete for users simultaneously. Users arrive one by one and choose between the two systems, and each system makes progress only when it is chosen. By varying the rationality of users and the intensity of competition, the authors examine which strategies the systems adopt and what outcomes result.

### Main findings

1. **HardMax model**: In this fully rational model, each system's best strategy is not to explore at all, but always to choose the action with the highest expected reward (DynamicGreedy). This maximizes short-term gains, but in the long run it may prevent the discovery of better actions.
2. **HardMax&Random model**: Once a fraction of users choose at random, better algorithms have a chance to win. If an algorithm is good enough, it can win all non-random users after an initial learning stage.
3. **SoftMax model**: Relaxing rationality further, the probability that a user picks a system depends smoothly on the difference between the two systems' expected rewards. Here better algorithms are more likely to win, although competition is less intense and the two systems typically each attract about half of the users.

### Economic interpretation

The authors relate these findings to the "competition vs. innovation" relationship studied in economics and point to an inverted-U relationship between the intensity of competition (how rational the users are) and the adoption of better algorithms: moderate competition encourages innovation, while excessive competition can inhibit it.

### Summary

Using a multi-armed bandit model, this paper studies how systems balance exploration and competition in a competitive environment and how competition affects the adoption of better exploration algorithms. The results show that moderate competition promotes the adoption of better algorithms, whereas either too much or too little competition can hinder innovation.
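The three user-response models differ only in how a user turns the two principals' reputations into a choice. The sketch below assumes each principal has a scalar reputation (its expected reward as perceived by users). The function names, the uniform-random fraction `eps` in HardMax&Random, and the logistic form with rationality parameter `alpha` in SoftMax are illustrative assumptions, not the paper's exact definitions.

```python
import math
import random


def hardmax(rep_a, rep_b):
    """P(user picks principal A): the strictly higher reputation always wins; ties split evenly."""
    if rep_a > rep_b:
        return 1.0
    if rep_a < rep_b:
        return 0.0
    return 0.5


def hardmax_random(rep_a, rep_b, eps):
    """Assumption: a fraction eps of users pick uniformly at random; the rest follow HardMax."""
    return eps * 0.5 + (1.0 - eps) * hardmax(rep_a, rep_b)


def softmax(rep_a, rep_b, alpha):
    """Assumption: logistic in the reputation gap; larger alpha means more rational users."""
    z = alpha * (rep_a - rep_b)
    # Two branches keep math.exp from overflowing for large |z|.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def pick_principal(p_a, rng):
    """Sample a user's choice: returns 0 for principal A (probability p_a), else 1 for B."""
    return 0 if rng.random() < p_a else 1


rng = random.Random(0)
p = softmax(0.6, 0.4, alpha=10.0)
print(round(p, 3), pick_principal(p, rng))
```

Setting `eps=0` recovers HardMax, and letting `alpha` grow pushes SoftMax toward HardMax, which mirrors the ordering of the three models from most to least rational users.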

