Phasic Parallel-Network Policy: a Deep Reinforcement Learning Framework Based on Action Correlation

Jiahao Li, Tianhan Gao, Qingwei Mi
DOI: https://doi.org/10.1007/s00607-024-01329-3
2024-01-01
Computing
Abstract: Reinforcement learning algorithms show significant variations in performance across different environments. Optimizing reinforcement learning has thus become a major research task, since the instability and unpredictability of these algorithms have consistently hindered their generalization capabilities. In this study, we address this issue by optimizing the algorithm itself rather than applying environment-specific optimizations. We start by tackling the uncertainty caused by mutual interference among the original actions, aiming to enhance overall performance. We propose the Phasic Parallel-Network Policy (PPP), a deep reinforcement learning framework. It diverges from the traditional actor-critic policy method by grouping the action space based on action correlations. PPP incorporates parallel network structures and combines network optimization strategies. With the assistance of the value network, the training process is divided into two specific stages, namely the Extra-group Policy Phase and the Inter-group Optimization Phase. PPP breaks away from the traditional unit learning structure. The experimental results indicate that it not only improves training effectiveness but also reduces training steps, enhances sample efficiency, and significantly improves stability and generalization.
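The abstract does not state how action correlations are measured or how groups are formed, so the following is only an illustrative sketch of the general idea of grouping an action space by correlation, not the authors' procedure. It assumes Pearson correlation computed over logged action vectors, a hypothetical threshold of 0.8, and a simple union-find merge; the names `pearson`, `group_actions`, and the sample data are all invented for illustration.

```python
from itertools import combinations
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)


def group_actions(samples, threshold=0.8):
    """Group action dimensions whose absolute correlation reaches the threshold.

    samples: list of action vectors, one per time step (assumed input format).
    Returns a sorted list of groups, each a sorted list of dimension indices.
    """
    dims = len(samples[0])
    columns = [[s[d] for s in samples] for d in range(dims)]
    parent = list(range(dims))  # union-find forest over action dimensions

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    # Merge every pair of dimensions that is strongly correlated.
    for a, b in combinations(range(dims), 2):
        if abs(pearson(columns[a], columns[b])) >= threshold:
            parent[find(a)] = find(b)

    groups = {}
    for d in range(dims):
        groups.setdefault(find(d), []).append(d)
    return sorted(groups.values())


# Hypothetical logged actions: dimensions 0 and 1 move together, dimension 2 does not.
samples = [
    [0.1, 0.2, 0.9],
    [0.4, 0.8, 0.1],
    [0.7, 1.4, 0.5],
    [1.0, 2.0, 0.3],
]
groups = group_actions(samples)
print(groups)
```

In a parallel-network design of the kind the abstract describes, each resulting group could plausibly be assigned its own policy head, but how PPP actually maps groups to networks is not specified here.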