Abstract: Behavioral experiments on the ultimatum game (UG) reveal that humans prefer fair acts, which contradicts the prediction of orthodox economics. Existing explanations, however, mostly attribute this to exogenous factors within the imitation learning framework. Here, we adopt the reinforcement learning paradigm, where individuals make their moves aiming to maximize their accumulated rewards. Specifically, we apply Q-learning to UG, where each player is assigned two Q-tables to guide decisions for the roles of proposer and responder. In a two-player scenario, fairness emerges prominently when both experiences and future rewards are appreciated. In particular, the probability of successful deals increases with higher offers, which aligns with observations in behavioral experiments. Our mechanism analysis reveals that the system undergoes two phases, eventually stabilizing into fair or rational strategies. These results are robust when the rotating role assignment is replaced by random or fixed assignment, or when the scenario is extended to a latticed population. Our findings thus indicate that the endogenous factor is sufficient to explain the emergence of fairness; exogenous factors are not needed.
Machine Learning, Disordered Systems and Neural Networks, Adaptation and Self-Organizing Systems, Physics and Society, Populations and Evolution
What problem does this paper attempt to address?
This paper seeks to explain the fair behavior that humans exhibit in the Ultimatum Game (UG). Traditional economics assumes that people are completely rational and selfish. It therefore predicts that in the UG the proposer will make the lowest possible offer to maximize his or her own gain, and the responder will accept any non-zero offer. Experimental results contradict this prediction: humans are inclined toward fair behavior.
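The payoff rule behind this prediction can be written down in a few lines. The sketch below is illustrative only: the function name `ultimatum_payoffs`, the `pie` parameter, and the modelling of the responder as an acceptance threshold are assumptions made here, not details taken from the paper.

```python
def ultimatum_payoffs(offer, threshold, pie=1.0):
    """Return (proposer_payoff, responder_payoff) for one round of the UG.

    Assumption: the responder accepts whenever the offer is at least
    their acceptance threshold; otherwise both players receive nothing.
    """
    if not 0 <= offer <= pie:
        raise ValueError("offer must lie between 0 and the total amount")
    if offer >= threshold:
        # Deal succeeds: the responder gets the offer, the proposer keeps the rest.
        return pie - offer, offer
    # Deal fails: the whole amount is lost.
    return 0.0, 0.0


# A low offer against a responder who accepts almost anything.
print(ultimatum_payoffs(10, 5, pie=100))
# The same low offer against a responder who demands at least 20.
print(ultimatum_payoffs(10, 20, pie=100))
```

Under this rule, a responder who accepts any non-zero offer corresponds to a very low threshold, and the proposer's payoff is then maximized by the smallest positive offer, which is the outcome traditional economics predicts.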
### Specific Problems and Methods
1. **Contradiction between Traditional Economic Predictions and Experimental Results**:
- The traditional economic assumption (Homo economicus) holds that people are completely rational and self-interested. Therefore, in the UG, the proposer will make a very low offer, and the responder will accept any non-zero offer.
- However, a large number of behavioral experiments show that most offers are concentrated between 40% and 50% of the total amount, and offers below 20% are often rejected. This indicates that humans have a strong preference for fair behavior, challenging the predictions of traditional economics.
2. **Limitations of Existing Explanations**:
- Most of the existing explanations are attributed to exogenous factors (such as social structure, role assignment, noise, etc. in the imitation-learning framework), but these explanations do not fully reveal the role of internal motivation.
3. **Research Purposes**:
- This paper adopts the Reinforcement Learning (RL) paradigm to study the emergence mechanism of fair behavior, especially by simulating the decision-making process in the UG through the Q-learning algorithm.
- The research aims to explore whether the emergence of fair behavior can be explained by endogenous factors (i.e., an individual's own learning and decision-making mechanisms) without relying on exogenous factors.
### Main Findings
- **Application of Q-learning**:
- Each player has two Q-tables, which are used for the roles of proposer and responder respectively. Players continuously update their Q-tables to optimize their strategies in order to maximize cumulative rewards.
- **Emergence of Fair Behavior**:
- When individuals value both historical experience and future returns (i.e., the learning rate α is small and the discount factor γ is large), fair behavior emerges prominently. Specifically, the probability of a successful transaction increases as the offer increases, which is consistent with the observations in behavioral experiments.
- **Two-stage Evolution Mechanism**:
- First stage: Individuals adjust their strategies to increase the probability of a successful transaction. Eventually, only three strategies (pm, qm), (pl, ql) and (pm, ql) survive.
- Second stage: The system undergoes a dominant conversion pattern (pm, ql) → (pl, ql) → (pm, qm), and finally stabilizes at the fair strategy (pm, qm) or the rational strategy (pl, ql).
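
The two-Q-table setup described in these findings can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the discretization `LEVELS`, the choice of state (a player's own previous action in the same role), the ε-greedy exploration, the zero-initialized tables, and all names (`Player`, `play`) are hypothetical. Only the standard Q-learning update, the two tables per player, and the rotating role assignment follow the description above.

```python
import random

# Hypothetical discretization of offers and acceptance thresholds.
LEVELS = [i / 10 for i in range(1, 10)]


class Player:
    """A player holding one Q-table per role (proposer and responder)."""

    def __init__(self, alpha, gamma, epsilon, rng):
        n = len(LEVELS)
        # Assumption: rows are states, columns are actions, all zero-initialized.
        self.q = {role: [[0.0] * n for _ in range(n)]
                  for role in ("proposer", "responder")}
        # Assumption: the state is the player's own last action in that role.
        self.state = {"proposer": 0, "responder": 0}
        self.alpha, self.gamma, self.epsilon, self.rng = alpha, gamma, epsilon, rng

    def act(self, role):
        """Pick an action index with epsilon-greedy exploration (an assumption)."""
        if self.rng.random() < self.epsilon:
            return self.rng.randrange(len(LEVELS))
        row = self.q[role][self.state[role]]
        return max(range(len(row)), key=row.__getitem__)

    def learn(self, role, action, reward):
        """Standard Q-learning update for the table of the given role."""
        table, s = self.q[role], self.state[role]
        next_state = action
        target = reward + self.gamma * max(table[next_state])
        table[s][action] += self.alpha * (target - table[s][action])
        self.state[role] = next_state


def play(rounds=10000, alpha=0.1, gamma=0.9, epsilon=0.01, seed=0):
    """Run a two-player UG with rotating roles; return players and deal rate."""
    rng = random.Random(seed)
    players = [Player(alpha, gamma, epsilon, rng) for _ in range(2)]
    deals = 0
    for t in range(rounds):
        # Rotating role assignment: the players swap roles every round.
        proposer, responder = players[t % 2], players[(t + 1) % 2]
        a_p, a_q = proposer.act("proposer"), responder.act("responder")
        offer, threshold = LEVELS[a_p], LEVELS[a_q]
        if offer >= threshold:
            r_p, r_q = 1.0 - offer, offer
            deals += 1
        else:
            r_p = r_q = 0.0
        proposer.learn("proposer", a_p, r_p)
        responder.learn("responder", a_q, r_q)
    return players, deals / rounds


if __name__ == "__main__":
    _, deal_rate = play()
    print(f"fraction of successful deals: {deal_rate:.3f}")
```

A small `alpha` makes each update retain most of the old Q-value (valuing experience), while a large `gamma` weights the estimated future return more heavily. Because the state definition and exploration scheme here are assumptions, this sketch should not be expected to reproduce the paper's reported outcomes.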
### Conclusion
This research shows that endogenous factors (such as an individual's learning and decision - making mechanisms) are sufficient to explain the emergence of fair behavior without relying on exogenous factors. This finding provides a new perspective for understanding human fair behavior and emphasizes the potential of reinforcement learning in explaining complex social behaviors.