Abstract: Ensemble reinforcement learning, which combines the decisions of a set of base agents, has been proposed to improve decision making and shorten training time. Many studies indicate that an ensemble can outperform a single agent because the base agents complement one another: an error made by one agent may be corrected by the others. The fusion method, however, is a fundamental issue in ensemble design. Existing studies mainly focus on static fusion, which either assumes that all agents have the same ability or ignores agents with poor average performance. As a result, static fusion methods overlook base agents that perform poorly overall but excel in particular scenarios, so the abilities of some agents are not fully utilized. This study proposes a dynamic fusion method that uses each base agent according to its local competence on test states. The performance of a base agent on a validation state is measured by the rewards the agent achieves in the next n steps. The similarity between a validation state and a new state is quantified by the Euclidean distance in the latent space, and the weight of each base agent is updated according to its performance on the validation states and their similarity to the new state. Experimental studies confirm that the proposed dynamic fusion method outperforms both its base agents and the static fusion methods. This is the first dynamic fusion method proposed for deep reinforcement learning, extending the study of dynamic fusion from classification to reinforcement learning.
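
The weighting idea described in the abstract can be illustrated with a minimal sketch. The abstract does not give the exact update rule, so everything below is an assumption made for illustration: the function names (`n_step_return`, `dynamic_fusion_weights`, `fuse_action`), the use of the k nearest validation states, the inverse-distance similarity, the softmax over competence scores, and the optional discount `gamma` are hypothetical choices, not the authors' formulation.

```python
import numpy as np


def n_step_return(rewards, t, n, gamma=1.0):
    """Sum of the rewards collected in the n steps starting at step t.

    gamma is an assumed optional discount; gamma=1.0 gives a plain sum.
    """
    window = np.asarray(rewards[t:t + n], dtype=float)
    discounts = gamma ** np.arange(len(window))
    return float(window @ discounts)


def dynamic_fusion_weights(new_latent, val_latents, val_returns, k=5, eps=1e-8):
    """Weight each base agent by its local competence near a new state.

    new_latent:  (d,) latent representation of the new state
    val_latents: (m, d) latent representations of the validation states
    val_returns: (num_agents, m) n-step return of each agent on each
                 validation state
    """
    # Euclidean distance in the latent space, as stated in the abstract.
    dists = np.linalg.norm(val_latents - new_latent, axis=1)
    # Assumption: only the k closest validation states are consulted.
    nearest = np.argsort(dists)[:k]
    # Assumption: similarity is the normalized inverse distance.
    sim = 1.0 / (dists[nearest] + eps)
    sim = sim / sim.sum()
    # Local competence: similarity-weighted n-step return per agent.
    competence = val_returns[:, nearest] @ sim
    # Assumption: a softmax turns competence scores into fusion weights.
    scores = np.exp(competence - competence.max())
    return scores / scores.sum()


def fuse_action(q_values, weights):
    """Pick the action with the highest weighted sum of Q-values.

    q_values: (num_agents, num_actions) Q-values of each base agent
    """
    return int(np.argmax(weights @ q_values))


# Toy example: agent 0 is strong near the origin, agent 1 near (5, 5).
val_latents = np.array([[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0]])
val_returns = np.array([[10.0, 10.0, 0.0, 0.0],
                        [0.0, 0.0, 10.0, 10.0]])
q_values = np.array([[1.0, 0.0],   # agent 0 prefers action 0
                     [0.0, 1.0]])  # agent 1 prefers action 1

weights_near_origin = dynamic_fusion_weights(
    np.array([0.05, 0.0]), val_latents, val_returns, k=2)
action_near_origin = fuse_action(q_values, weights_near_origin)
```

In this toy setup the new state lies close to the validation states on which agent 0 earned high returns, so agent 0 receives most of the weight and its preferred action is selected. A static fusion with equal weights would leave the two agents tied.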

Ensemble Network Architecture for Deep Reinforcement Learning

Historical Best Q-Networks for Deep Reinforcement Learning

Dual Ensembled Multiagent Q-Learning with Hypernet Regularizer

Shared Learning : Enhancing Reinforcement in $Q$-Ensembles

Dynamic fusion for ensemble of deep Q-network

Deep Q Net Based on Advantage Learning

Iterated $Q$-Network: Beyond One-Step Bellman Updates in Deep Reinforcement Learning

Deep Reinforcement Learning of the Model Fusion with Double Q-learning

The Actor-Dueling-Critic Method for Reinforcement Learning

Deep Reinforcement Learning with Double Q-Learning

Deep Reinforcement Learning with Sarsa and Q-Learning: A Hybrid Approach

Dynamic sparse coding-based value estimation network for deep reinforcement learning

Data Efficient Deep Reinforcement Learning with Action-Ranked Temporal Difference Learning

Uncertainty-Aware Low-Rank Q-Matrix Estimation for Deep Reinforcement Learning

Handling Large-Scale Action Space In Deep Q Network

Windows Deep Transformer Q-networks: an Extended Variance Reduction Architecture for Partially Observable Reinforcement Learning

Based on Doubly Decoupled Reinforced Network

A Deep Reinforcement Learning Architecture for Multi-stage Optimal Control

Qatten: A General Framework for Cooperative Multiagent Reinforcement Learning

REValueD: Regularised Ensemble Value-Decomposition for Factorisable Markov Decision Processes

Ensemble Bootstrapping for Q-Learning