SAR-PPO (Segmented Adaptive Reward): Robotic Arm Open Door Motion Control With Reinforcement Learning Based on Segmented Adaptive Reward

Daoxiong Gong, Jianjun Yu, Xinyue Feng, Yunai Gong
DOI: https://doi.org/10.23919/CCC63176.2024.10661640
2024-07-28
Abstract: Door opening, one of the most common actions in daily life, has become an important application area for robotic arms. Because different door handles open in different ways, the Proximal Policy Optimization (PPO) algorithm is used so that the robotic arm can perform the door-opening operation that matches each handle category. Opening a door is a multi-stage process: approaching the handle, operating the handle, and pushing the door open. A sparse reward that considers only whether the door ends up open prolongs training and may even prevent convergence. To address this problem, this paper proposes a segmented adaptive reward. First, taking the staged structure of the door-opening task into account, we design a segmented reward and formulate segmented training rules that guide the robotic arm step by step and improve the overall training result. Second, the reward includes an adaptive weight adjustment mechanism that adjusts the weights of the different sub-tasks according to which stage currently needs attention, complementing the segmented training and speeding up learning. Simulation experiments show that the door-opening success rate of our algorithm is $61.04\%$ higher than that of the original PPO algorithm, and that it can accomplish the round-handle door-opening task, which the original algorithm cannot solve.
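The abstract does not give the reward formulas, so the following is only a minimal Python sketch of how a segmented reward with adaptive stage weights could be structured. It assumes three stages (approach, operate, push), a state summarized by the gripper-to-handle distance, the handle rotation angle and the door hinge angle, and a hypothetical weight-update rule that shifts weight toward stages the agent completes less often. The class `SegmentedAdaptiveReward`, every threshold and bonus, and the update rule are illustrative assumptions, not the authors' method.

```python
from dataclasses import dataclass, field
from enum import IntEnum


class Stage(IntEnum):
    APPROACH = 0  # move the gripper to the handle
    OPERATE = 1   # turn or press the handle
    PUSH = 2      # push the door open


@dataclass
class SegmentedAdaptiveReward:
    # Hypothetical sketch: thresholds, bonuses and the update rule are
    # placeholders, not values or formulas taken from the paper.
    grasp_dist: float = 0.02    # m, gripper-to-handle distance counted as "reached"
    unlock_angle: float = 0.8   # rad, handle rotation counted as "unlocked"
    open_angle: float = 1.2     # rad, door hinge angle counted as "open"
    ema: float = 0.1            # smoothing factor for per-stage success estimates
    weights: list = field(default_factory=lambda: [1.0, 1.0, 1.0])
    success: list = field(default_factory=lambda: [0.0, 0.0, 0.0])

    def stage(self, dist: float, handle_angle: float) -> Stage:
        # Segment the task from the current state.
        if dist > self.grasp_dist:
            return Stage.APPROACH
        if handle_angle < self.unlock_angle:
            return Stage.OPERATE
        return Stage.PUSH

    def __call__(self, dist: float, handle_angle: float, door_angle: float):
        s = self.stage(dist, handle_angle)
        w = self.weights[s]
        if s == Stage.APPROACH:
            # Dense shaping: penalize distance to the handle.
            return -w * dist, s
        if s == Stage.OPERATE:
            # Bonus for reaching the handle plus weighted handle progress.
            return 1.0 + w * handle_angle / self.unlock_angle, s
        # PUSH: bonus for both earlier stages plus weighted door progress,
        # and a larger terminal bonus once the door counts as open.
        progress = min(door_angle / self.open_angle, 1.0)
        bonus = 10.0 if door_angle >= self.open_angle else 0.0
        return 2.0 + w * progress + bonus, s

    def update_weights(self, stage_done):
        # stage_done[i] is 1.0 if stage i was completed in the last episode.
        for i, done in enumerate(stage_done):
            self.success[i] += self.ema * (done - self.success[i])
        # Rarely completed stages get larger weights; weights sum to 3.
        raw = [1.0 - p + 1e-3 for p in self.success]
        total = sum(raw)
        self.weights = [len(raw) * r / total for r in raw]
```

In use, the value returned by the callable would serve as the per-step reward fed to a PPO learner, and `update_weights` would be called once per episode with flags recording which stages were completed. The stage bonuses (0, 1, 2) keep the reward increasing as the arm advances through the segments, which is one simple way to avoid the sparse, outcome-only signal the abstract describes.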
Engineering, Computer Science