FWA-RL: Fireworks Algorithm with Policy Gradient for Reinforcement Learning

Maiyue Chen, Ying Tan
DOI: https://doi.org/10.1109/CEC53210.2023.10254081
2023-01-01
Abstract: Evolutionary and swarm intelligence-based black-box optimization algorithms have been widely used to solve an increasing number of machine learning problems in recent years. Despite their effectiveness, low sample efficiency, a consequence of their sampling-based nature, has become one of the main obstacles to their use. It is therefore of great interest to improve the sample efficiency of traditional black-box optimization algorithms while keeping their merits. In this paper, the population-based fireworks algorithm (FWA) is equipped with policy gradient (PG) information, leading to a novel and effective algorithm called FWA-RL. The main idea is to enhance the explosion operator with policy gradient-guided explosion and to carry out cooperation among fireworks through distillation-based cooperation. Experimental studies show that the proposed algorithm can outperform state-of-the-art pure reinforcement learning (RL) algorithms and other hybrid evolutionary reinforcement learning algorithms (EARL) on the standard MuJoCo benchmark suite for continuous control. With an efficient parallel implementation, FWA-RL may serve as a competitive algorithm for solving even more complex problems.