Overcoming Delayed Feedback in Reinforcement Learning Using Actor Ensembles

Jongsoo Lee, Jonghyeok Park, Soohee Han
DOI: https://doi.org/10.1007/s12555-024-0043-9
IF: 2.964
2024-11-08
International Journal of Control Automation and Systems
Abstract: Reinforcement learning (RL) has led to remarkable advances in several fields. However, delayed feedback can violate its fundamental assumption, the Markov property, which can cause significant problems. To address delayed feedback in RL, this study introduces the actor-ensemble twin-delayed deep deterministic policy gradient (AE-TD3), which is designed to use past state and action experiences effectively while mitigating the performance deterioration caused by the state-space explosion often encountered with traditional augmentation-based RL methods. AE-TD3 is itself an augmentation-based method: it establishes an actor-ensemble network and concurrently selects one of the multiple actions the network produces, so that the expanded state space is explored and represented stably and efficiently. In experiments on various continuous control tasks in MuJoCo, AE-TD3 achieved higher expected returns or improved learning stability compared with the traditional augmentation-based RL method. We believe that AE-TD3 has the potential to overcome problems associated with delayed feedback.
automation & control systems
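
The two ideas named in the abstract, state augmentation under delay and picking one action from an actor ensemble, can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes a fixed, known delay d, the common augmented state (s_{t-d}, a_{t-d}, ..., a_{t-1}), zero-filled actions before the first step, and a selection rule that takes the candidate with the highest critic value; the abstract does not state the selection rule, so that part is purely an assumption. The names DelayedObservationAugmenter and select_action are hypothetical.

```python
from collections import deque
from typing import Callable, List, Sequence

Vector = List[float]


class DelayedObservationAugmenter:
    """Builds the augmented state (s_{t-d}, a_{t-d}, ..., a_{t-1}) for a fixed delay d."""

    def __init__(self, delay: int, action_dim: int) -> None:
        # Assumption: before any action is taken, the buffer holds zero actions.
        self.actions = deque(
            [[0.0] * action_dim for _ in range(delay)], maxlen=delay
        )

    def augment(self, delayed_state: Sequence[float]) -> Vector:
        # Concatenate the delayed observation with the buffered actions,
        # oldest first. The result has state_dim + delay * action_dim entries,
        # which is the growth the abstract calls state-space explosion.
        flat_actions = [x for action in self.actions for x in action]
        return list(delayed_state) + flat_actions

    def record_action(self, action: Sequence[float]) -> None:
        # A deque with maxlen drops the oldest action automatically.
        self.actions.append(list(action))


def select_action(
    actors: Sequence[Callable[[Vector], Vector]],
    critic: Callable[[Vector, Vector], float],
    aug_state: Vector,
) -> Vector:
    """Each actor proposes one action; return the one the critic scores highest.

    Assumption: greedy selection by critic value. The source only says that
    one of the ensemble's actions is selected.
    """
    candidates = [actor(aug_state) for actor in actors]
    return max(candidates, key=lambda a: critic(aug_state, a))


# Toy usage with stand-in actors and critic (no neural networks involved).
augmenter = DelayedObservationAugmenter(delay=2, action_dim=1)
first_state = augmenter.augment([0.5, -0.5])

toy_actors = [lambda s: [-1.0], lambda s: [0.2], lambda s: [1.0]]
toy_critic = lambda s, a: -(a[0] - 0.25) ** 2  # prefers actions near 0.25

chosen = select_action(toy_actors, toy_critic, first_state)
augmenter.record_action(chosen)
second_state = augmenter.augment([0.6, -0.4])
```

With a delay of 2 and a one-dimensional action, each augmented state has two more entries than the raw observation. In a real agent the lambdas would be replaced by trained actor and critic networks.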