Abstract: Ensemble reinforcement learning, which combines the decisions of a set of base agents, has been proposed to improve decision making and reduce training time. Many studies indicate that an ensemble model may achieve better results than a single agent because the base agents complement one another: the error of one agent may be corrected by the others. However, the fusion method is a fundamental issue in ensemble learning. Existing studies mainly focus on static fusion, which either assumes that all agents have the same ability or ignores those with poor average performance. As a result, static fusion methods overlook base agents that perform poorly overall but excel in particular scenarios, so the abilities of some agents are not fully utilized. This study proposes a dynamic fusion method that uses each base agent according to its local competence on test states. The performance of a base agent on the validation set is measured by the rewards the agent achieves in the next n steps. The similarity between a validation state and a new state is quantified by the Euclidean distance in the latent space, and the weight of each base agent is updated according to its performance on the validation states and their similarity to the new state. The experimental studies confirm that the proposed dynamic fusion method outperforms both its base agents and the static fusion methods. This is the first dynamic fusion method proposed for deep reinforcement learning, extending the study of dynamic fusion from classification to reinforcement learning.
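
The weighting idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not give the exact formulas, so the undiscounted n-step reward sum, the k-nearest-neighbour selection, the inverse-distance similarity, the softmax normalisation, and all function and parameter names (`n_step_reward`, `dynamic_weights`, `fused_action`, `k`, `eps`) are assumptions made for illustration only.

```python
import math


def n_step_reward(rewards, t, n):
    """Sum of the rewards collected in the n steps starting at step t.

    Assumption: an undiscounted sum; the abstract does not specify discounting.
    """
    return sum(rewards[t:t + n])


def euclidean(a, b):
    """Euclidean distance between two latent vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def dynamic_weights(new_latent, val_latents, val_scores, k=3, eps=1e-8):
    """Weight each base agent by its local competence around a new state.

    val_latents: latent vectors of the validation states.
    val_scores:  val_scores[i][j] is the n-step reward of agent i on
                 validation state j.
    Assumptions: only the k nearest validation states are used, similarity
    is the inverse distance, and weights are normalised with a softmax.
    """
    dists = [euclidean(new_latent, z) for z in val_latents]
    nearest = sorted(range(len(dists)), key=dists.__getitem__)[:k]
    sims = [1.0 / (dists[j] + eps) for j in nearest]
    total = sum(sims)
    # Similarity-weighted average score of each agent on the neighbours.
    competence = [
        sum(s * scores[j] for s, j in zip(sims, nearest)) / total
        for scores in val_scores
    ]
    # Softmax keeps the weights positive even when rewards are negative.
    top = max(competence)
    exps = [math.exp(c - top) for c in competence]
    norm = sum(exps)
    return [e / norm for e in exps]


def fused_action(q_per_agent, weights):
    """Pick the action with the highest weighted sum of the agents' Q-values."""
    n_actions = len(q_per_agent[0])
    fused = [
        sum(w * q[a] for w, q in zip(weights, q_per_agent))
        for a in range(n_actions)
    ]
    return max(range(n_actions), key=fused.__getitem__)
```

In this sketch, an agent that scored well on validation states close to the new state in latent space receives a larger weight for that state, even if its average score over the whole validation set is low. That is the contrast with static fusion drawn in the abstract.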

Deep Reinforcement Learning of the Model Fusion with Double Q-learning

Deep Reinforcement Learning with Double Q-Learning

Performing Deep Recurrent Double Q-Learning for Atari Games

Dynamic fusion for ensemble of deep Q-network

Research on Deep Reinforcement Learning Algorithm Based on Dynamic Fusion Target

Deep Reinforcement Learning: from Q-Learning to Deep Q-Learning.

Dueling Network Architecture for Multi-Agent Deep Deterministic Policy Gradient

Deep Q Net Based on Advantage Learning

Reducing overestimation in value mixing for cooperative deep multi-agent reinforcement learning

State of the Art Control of Atari Games Using Shallow Reinforcement Learning

Adaptive Double Fuzzy Systems Based Q-Learning for Pursuit-Evasion Game

Behavior Fusion for Deep Reinforcement Learning

Ensemble Network Architecture for Deep Reinforcement Learning

Based on Doubly Decoupled Reinforced Network

Deep Q-Learning with Prioritized Sampling.

Self-correcting Q-learning.

M$^2$DQN: A Robust Method for Accelerating Deep Q-learning Network

Deep Reinforcement Learning with Sarsa and Q-Learning: A Hybrid Approach

Combinatorial Q-Learning for Dou Di Zhu.

Deep Q-Learning with Phased Experience Cooperation.