Abstract: With dynamic pricing on the rise, firms are using sophisticated algorithms for price determination. These algorithms are often non-interpretable and there has been a recent interest in their seemingly emergent ability to tacitly collude with each other without any prior communication whatsoever. Most of the previous works investigate algorithmic collusion on simple reinforcement learning (RL) based algorithms operating on a basic market model. Instead, we explore the collusive tendencies of Proximal Policy Optimization (PPO), a state-of-the-art continuous state/action space RL algorithm, on a complex double-sided hierarchical market model of rideshare. For this purpose, we extend a mathematical program network (MPN) based rideshare model to a temporal multi origin-destination setting and use PPO to solve for a repeated duopoly game. Our results indicate that PPO can either converge to a competitive or a collusive equilibrium depending upon the underlying market characteristics, even when the hyper-parameters are held constant.
What problem does this paper attempt to address?
The paper asks whether sophisticated pricing algorithms, in particular reinforcement learning (RL) algorithms, tacitly collude in two-sided markets such as ride-hailing as dynamic pricing becomes more widespread. Specifically, the researchers examine whether a modern RL algorithm, Proximal Policy Optimization (PPO), converges to a competitive or a collusive equilibrium in such a complex market environment, and how that outcome depends on market response speed.
### Background and Motivation
- **The Rise of Dynamic Pricing**: With the proliferation of smartphones and the internet, dynamic pricing has been widely adopted in industries such as gasoline, ticketing, and online retail. The ride-hailing industry uses dynamic pricing to raise prices during peak demand periods so that rides remain available for high-priority requests.
- **Black Box Nature of Algorithms**: Most complex machine learning models are black-box models, making it difficult to explain their internal mechanisms. This has raised concerns about the potential for tacit collusion by these algorithms.
- **Collusion Phenomenon**: Collusion refers to rational agents cooperating to obtain profits above competitive levels. Although most economies severely punish coordinated collusion between firms, pricing algorithms can still collude tacitly, without any communication, and the causes of this behavior remain unclear.
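
"Profits above competitive levels" can be made concrete with a normalized profit gain, a measure that appears in the algorithmic-collusion literature. The sketch below is illustrative only: the function name, its arguments, and the choice of this particular metric are assumptions, and the paper may quantify collusion differently.

```python
def profit_gain_index(observed, competitive, collusive):
    """Normalized profit gain (hypothetical helper, not from the paper).

    Returns 0.0 when observed profit equals the competitive benchmark
    and 1.0 when it equals the fully collusive (joint-monopoly) benchmark.
    """
    if collusive <= competitive:
        raise ValueError("collusive benchmark must exceed competitive benchmark")
    return (observed - competitive) / (collusive - competitive)


# Example: profit halfway between the two benchmarks.
index = profit_gain_index(observed=1.5, competitive=1.0, collusive=2.0)
```

Values near 0 would indicate competitive pricing and values near 1 would indicate near-full collusion; both benchmarks have to be computed separately for the market model in question.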
### Research Objectives
- **Explore Collusion Tendencies of Modern Algorithms in Complex Markets**: The researchers chose Proximal Policy Optimization (PPO), a state-of-the-art continuous state/action space reinforcement learning algorithm, to experiment in a complex two-sided market model.
- **Compare Collusion Behavior Under Different Market Response Speeds**: The researchers explored the impact of market response speed (i.e., the speed at which the market adjusts to price changes) on collusion behavior.
### Methods
- **Model Extension**: The researchers extended a mathematical program network (MPN) based rideshare model to a temporal multi origin-destination setting to capture the complexity of the ride-hailing market.
- **Experimental Design**: PPO was used to solve a repeated duopoly game, and collusive behavior was analyzed by comparing outcomes under different market response speeds.
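
For readers unfamiliar with PPO, its core is the clipped surrogate objective, which limits how far a single update can move the policy. The following is a minimal pure-Python sketch of that objective only; the function and parameter names are illustrative, the default `eps=0.2` is a commonly used value rather than one taken from the paper, and nothing here reproduces the authors' rideshare model or training setup.

```python
def ppo_clip_objective(ratios, advantages, eps=0.2):
    """Batch mean of PPO's clipped surrogate objective.

    ratios:     new-policy / old-policy probability ratios, one per sample
    advantages: advantage estimates, one per sample
    eps:        clipping range (assumed default, not from the paper)
    """
    if not ratios or len(ratios) != len(advantages):
        raise ValueError("ratios and advantages must be non-empty and equal length")
    total = 0.0
    for r, a in zip(ratios, advantages):
        # Restrict the ratio to [1 - eps, 1 + eps].
        clipped_r = min(max(r, 1.0 - eps), 1.0 + eps)
        # Take the more pessimistic of the unclipped and clipped terms.
        total += min(r * a, clipped_r * a)
    return total / len(ratios)


# A large ratio with a positive advantage is capped at (1 + eps) * advantage.
capped = ppo_clip_objective([1.5], [1.0])
```

In a duopoly setting, each platform's pricing policy would be trained by maximizing an objective of this form on its own profit signal; how the two learners are coupled through the market is specific to the paper's model and is not shown here.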
### Main Contributions
- **Using Modern Reinforcement Learning Algorithms to Study Potential Collusion Behavior in Non-Trivial Markets**.
- **Comparing Collusion Behavior Under Different Market Response Speeds**.
### Conclusion
- **Research Findings**: The PPO algorithm can converge to competitive equilibrium or collusive equilibrium under different market conditions, even with constant hyperparameters.
- **Impact of Market Response Speed**: When the market responds quickly to price changes, competitive pressure reduces platform profits; when it responds slowly, the algorithm learns to extract consistent profits through collusion.
Through this study, the authors provide new insights into the collusive behavior of modern algorithms in complex markets and offer a useful reference for regulators and enterprises.