Macheng Shen, Jonathan P. How
Abstract: This paper presents an algorithmic framework for learning robust policies in asymmetric imperfect-information games, where the joint reward could depend on the uncertain opponent type (private information known only to the opponent itself and its ally). In order to maximize the reward, the protagonist agent has to infer the opponent type through agent modeling. We use multiagent reinforcement learning (MARL) to learn opponent models through self-play, which captures the full strategy interaction and reasoning between agents. However, agent policies learned from self-play can suffer from mutual overfitting. Ensemble training methods can be used to improve the robustness of agent policy against different opponents, but they also significantly increase the computational overhead. In order to achieve a good trade-off between the robustness of the learned policy and the computational complexity, we propose to train a separate opponent policy against the protagonist agent for evaluation purposes. The reward achieved by this opponent is a noisy measure of the robustness of the protagonist agent policy due to the intrinsic stochastic nature of a reinforcement learner. To handle this stochasticity, we apply a stochastic optimization scheme to dynamically update the opponent ensemble to optimize an objective function that strikes a balance between robustness and computational complexity. We empirically show that, under the same limited computational budget, the proposed method results in more robust policy learning than standard ensemble training.
What problem does this paper attempt to address?
The paper addresses the problem of learning robust strategies in asymmetric imperfect-information games (AIIGs). Such a game is a special kind of Bayesian game in which information is asymmetric among agents (players): one side holds information that the other side does not. Specifically, the paper studies how an agent (called the protagonist agent) can use multi-agent reinforcement learning (MARL) to learn strategies that are robust across opponent types when it does not know the opponent's type. The opponent type is private information known only to the opponent itself and its ally, and hidden from the protagonist agent.
### Main Challenges
1. **Opponent Modeling**:
- In an AIIG, the protagonist agent's reward depends on the opponent's type, which is private information known only to the opponent itself and its ally. To act well, the protagonist agent must therefore infer the opponent's type from the opponent's observed behavior.
- The paper proposes using multi-agent reinforcement learning to learn the opponent model through self-play, which captures the full strategic interaction and reasoning between agents.
2. **Ensemble Training**:
- Policies learned through self-play can overfit to each other. To make the protagonist agent's strategy hold up against different opponents, the paper uses ensemble training: training against multiple strategies (an ensemble) makes the protagonist agent's strategy more robust.
- However, ensemble training significantly increases the computational cost, so the ensemble size must be chosen to balance robustness against computational complexity.
3. **Meta-Optimization**:
- To get the most out of ensemble training, the paper proposes a meta-optimization framework that optimizes a robustness measure by dynamically adjusting the number of strategies in the ensemble.
- Because that robustness measure is noisy, the paper uses a stochastic optimization scheme, the simulated annealing algorithm, to update the opponent strategy ensemble and balance robustness against computational complexity.
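The type inference described in item 1 can be illustrated with a plain Bayesian belief update over a discrete set of opponent types. This is a minimal sketch, not the paper's implementation: the type names, the `update_belief` helper, and the likelihood values are hypothetical, and in practice the likelihoods would come from learned opponent models evaluated on the observed action.

```python
def update_belief(belief, likelihoods):
    """Bayes rule: posterior(type) is proportional to prior(type) * P(action | type).

    belief      -- dict mapping opponent type -> prior probability
    likelihoods -- dict mapping opponent type -> probability that an opponent
                   of that type takes the observed action (assumed to come
                   from a learned opponent model)
    """
    unnormalized = {t: belief[t] * likelihoods[t] for t in belief}
    total = sum(unnormalized.values())
    if total == 0.0:
        # The observation is impossible under every type; keep the prior.
        return dict(belief)
    return {t: p / total for t, p in unnormalized.items()}


# Hypothetical two-type example: start from a uniform prior and observe
# an action that the "aggressive" type takes more often.
belief = {"aggressive": 0.5, "defensive": 0.5}
belief = update_belief(belief, {"aggressive": 0.8, "defensive": 0.2})
print(belief)
```

Repeating the update after each observed action concentrates the belief on the types whose models best explain the opponent's behavior.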
### Solutions
1. **Opponent Modeling**:
- Use multi-agent reinforcement learning to learn opponent models through self-play; these models capture the opponent's behavior patterns.
- By explicitly updating the belief state, the protagonist agent can better infer the opponent's type.
2. **Ensemble Training**:
- Train multiple opponent strategies (forming an ensemble) so that the protagonist agent's strategy can handle many different opponent types.
- Use the policy distillation technique to synthesize multiple strategies into a representative strategy to reduce the computational complexity.
3. **Meta-Optimization**:
- Propose a meta-optimization framework that optimizes a robustness measure by dynamically adjusting the number of strategies in the ensemble.
- Use the simulated annealing algorithm to update the opponent strategy ensemble, trading off robustness against computational complexity.
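The meta-optimization step in item 3 can be sketched as simulated annealing over the ensemble size with a noisy objective. This is an illustrative sketch under stated assumptions, not the paper's algorithm: `noisy_objective` is a hypothetical stand-in (diminishing robustness gains minus a linear compute cost, plus Gaussian noise) for the reward of a separately trained evaluation opponent, and all constants are made up.

```python
import math
import random


def noisy_objective(size, rng):
    """Hypothetical objective: robustness with diminishing returns,
    minus a compute cost linear in ensemble size, plus evaluation noise."""
    robustness = 1.0 - math.exp(-0.5 * size)
    cost = 0.05 * size
    return robustness - cost + rng.gauss(0.0, 0.02)


def anneal_ensemble_size(steps=200, t0=1.0, cooling=0.97, max_size=10, seed=0):
    """Simulated annealing over the integer ensemble size in [1, max_size]."""
    rng = random.Random(seed)
    size = 1
    score = noisy_objective(size, rng)
    temperature = t0
    for _ in range(steps):
        # Propose growing or shrinking the ensemble by one member.
        candidate = min(max_size, max(1, size + rng.choice([-1, 1])))
        cand_score = noisy_objective(candidate, rng)
        delta = cand_score - score
        # Always accept improvements; accept worse moves with
        # probability exp(delta / T), which shrinks as T cools.
        if delta >= 0 or rng.random() < math.exp(delta / temperature):
            size, score = candidate, cand_score
        temperature *= cooling
    return size


print(anneal_ensemble_size())
```

Accepting some worse moves early on keeps a single noisy evaluation from locking the search onto a poor ensemble size; as the temperature drops, the search settles.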
### Experimental Results
The experiments show that, compared with standard ensemble training as the baseline, the proposed meta-optimization framework yields a more robust protagonist policy under the same limited computational budget.
### Summary
The main contribution of this paper is a new multi-agent reinforcement learning framework for learning robust strategies in asymmetric imperfect-information games. By combining opponent modeling, ensemble training, and meta-optimization, the framework improves the robustness of the learned strategy while keeping the computational cost under control.