An Incremental Optimization Approach to Address the Spatiotemporal Reward Coupling Effects in Deep Reinforcement Learning for Path Planning

Kexin Han, Lihan Chen, Wang Zhao, Ye Zhang, Zikang Xie, Hongyu Wu
DOI: https://doi.org/10.1109/CACRE62362.2024.10635059
2024-07-18
Abstract: Path planning plays an essential role in robotics. Deep Reinforcement Learning (DRL) has demonstrated significant potential in addressing the challenges of path planning in complex environments. Recently, researchers have improved training efficiency by introducing process rewards, which give the agent real-time feedback and allow it to adjust its policy immediately. However, this approach leads to spatiotemporal coupling of rewards, which adversely affects path quality and the global optimality of the policy. To address this issue, this study develops a system that jointly considers process and outcome rewards and introduces a reward transformation policy (RPP) model. Through multi-phase training, this model gradually diminishes the impact of process rewards. Starting from a baseline policy, this methodology progressively optimizes the agent's policy while efficiently pursuing the optimal policy. Experimental validation in environments with mixed dynamic and static obstacles demonstrates that the method effectively addresses spatiotemporal reward coupling and improves path quality in path planning. It also improves the stability and performance of the DRL policy throughout the path planning process.
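The abstract does not give the reward formulation, so the following is only a minimal sketch of one way a multi-phase schedule could diminish the influence of process rewards. The linear decay, the additive combination, and the names `process_weight` and `combined_reward` are illustrative assumptions, not the authors' RPP model.

```python
def process_weight(phase: int, num_phases: int) -> float:
    """Weight on the process reward for a given training phase.

    Assumption: the weight decays linearly from 1.0 in the first phase
    (phase 0) to 0.0 in the last phase (phase num_phases - 1).
    """
    if num_phases < 2:
        # With a single phase there is nothing to decay; use outcome reward only.
        return 0.0
    return 1.0 - phase / (num_phases - 1)


def combined_reward(process_r: float, outcome_r: float,
                    phase: int, num_phases: int) -> float:
    """Combine process and outcome rewards (hypothetical additive form).

    The outcome reward is always counted in full; the process reward is
    scaled down as training moves through the phases.
    """
    return process_weight(phase, num_phases) * process_r + outcome_r


if __name__ == "__main__":
    # Show how the same step reward shrinks across four training phases.
    for phase in range(4):
        print(phase, combined_reward(0.5, 0.0, phase, 4))
```

In this sketch, each phase would start from the policy learned in the previous one, so the final phase optimizes against the outcome reward alone.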
Engineering, Computer Science