Abstract:<p>In Markov games, playing against non-stationary opponents that can learn remains challenging for reinforcement learning (RL) agents, because the opponents evolve their policies concurrently. This increases the complexity of the learning task and slows the learning of RL agents. This paper proposes an efficient use of rough heuristics to speed up policy learning when playing against concurrent learners. Specifically, we propose an algorithm for zero-sum Markov games that efficiently learns explainable and generalized action selection rules by combining representations of quantitative heuristics and an opponent model with an eXtended classifier system (XCS). A neural network models the opponent from its observed behavior, and the inferred opponent policy is used for action selection and classifier evolution. Owing to the condition representation and matching mechanism of XCS, the quantitative heuristics and the opponent model can guide action selection in states with similar feature representations. When multiple heuristic policies are available, we introduce the concept of Pareto optimality to take all of them into account during action selection. We also analyze the influence of heuristic policies on the convergence of the algorithm. Furthermore, we introduce accuracy-based eligibility traces to further speed up classifier evolution: in the reinforcement component, classifiers that match historical traces are reinforced according to their accuracy. We demonstrate the advantages of the proposed algorithm over several benchmark algorithms in a soccer scenario and a thief-and-hunter scenario.</p>
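The abstract does not spell out how Pareto optimality is applied across several heuristic policies, so the following is only a minimal sketch of the general idea, not the authors' procedure. It assumes each heuristic policy assigns a numeric score to every action (higher is better) and keeps the actions that no other action dominates. The function name `pareto_optimal_actions`, the action names, and the scores are all hypothetical.

```python
from typing import Dict, List, Sequence


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """True if a is at least as good as b on every heuristic and strictly better on one."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def pareto_optimal_actions(scores: Dict[str, Sequence[float]]) -> List[str]:
    """Return the actions whose score vectors are not dominated by any other action.

    `scores` maps each action to one score per heuristic policy (assumed format).
    """
    return [
        action
        for action, vec in scores.items()
        if not any(
            dominates(other_vec, vec)
            for other, other_vec in scores.items()
            if other != action
        )
    ]


# Hypothetical scores from two heuristic policies for three actions.
example = {"shoot": (0.9, 0.2), "pass": (0.5, 0.6), "hold": (0.4, 0.1)}
front = pareto_optimal_actions(example)
print(front)
```

Here "hold" is dropped because "shoot" scores at least as well under both heuristics, while "shoot" and "pass" both remain because each is preferred by one heuristic. How the final action is then chosen from the remaining set is not stated in the abstract.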

Mastering construction heuristics with self-play deep reinforcement learning

Learning to traverse over graphs with a Monte Carlo tree search-based self-play framework

Self-play Reinforcement Learning with Comprehensive Critic in Computer Games

Learning 2-Opt Heuristics for Routing Problems via Deep Reinforcement Learning

Hierarchical Deep Reinforcement Learning Agent with Counter Self-play on Competitive Games

Mastering Atari, Go, chess and shogi by planning with a learned model

Efficient Competitive Self-Play Policy Optimization

LightZero: A Unified Benchmark for Monte Carlo Tree Search in General Sequential Decision Scenarios

Routing optimization with Monte Carlo Tree Search-based multi-agent reinforcement learning

AlphaZero-Inspired Game Learning: Faster Training by Using MCTS Only at Test Time

MetroZero: Deep Reinforcement Learning and Monte Carlo Tree Search for Optimized Metro Network Expansion

Adaptive Warm-Start MCTS in AlphaZero-like Deep Reinforcement Learning

Mastering Chess and Shogi by Self-Play with a General Reinforcement Learning Algorithm

Attention, Learn to Solve Routing Problems!

Reinforcement Learning Driven Heuristic Optimization

Neural Auto-Curricula in Two-Player Zero-Sum Games

Efficient use of heuristics for accelerating XCS-based policy learning in Markov games

ReZero: Boosting MCTS-based Algorithms by Backward-view and Entire-buffer Reanalyze

Scalable Online Planning via Reinforcement Learning Fine-Tuning

Warm-Start AlphaZero Self-Play Search Enhancements

Neural Auto-Curricula