Control of Space Flexible Manipulator Using Soft Actor-Critic and Random Network Distillation

Chen Yang, Jun Yang, Xueqian Wang, Bin Liang
DOI: https://doi.org/10.1109/robio49542.2019.8961852
2019-01-01
Abstract: Because the space flexible manipulator has a very large number of degrees of freedom, it is difficult to obtain an accurate dynamic model for its motion planning. In this work, we formulate the precise motion control of the free-floating space piecewise constant curvature (FSPCC) continuum manipulator (i.e., the space flexible manipulator) as a sparse-reward problem in reinforcement learning, and use the Soft Actor-Critic (SAC) algorithm together with the Random Network Distillation (RND) method to train the optimal policies. First, we use the RND method, in which a predictor network is trained to match the output of a fixed network. The discrepancy between the outputs of the two networks serves as an internal reward that is added to the reward from the environment. Second, the SAC algorithm aims to maximize both the expected return and the entropy of the policy. A high-entropy policy completes the task while acting as randomly as possible. Finally, the internal rewards encourage the agent to explore more widely, which speeds up convergence of the algorithm. We applied this method to a simulation model of the FSPCC continuum manipulator, and the results demonstrate that the SAC algorithm combined with the RND method can control the manipulator to catch the target quickly, even when the reward is sparse.
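The RND internal reward described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the class name `RND`, the single-layer networks, the feature sizes, the learning rate, and the weighting factor `beta` are all assumptions chosen only to keep the example small. It shows that the prediction error (and hence the exploration bonus) shrinks for an observation the agent has visited many times.

```python
import math
import random


class RND:
    """Toy Random Network Distillation bonus (illustrative assumptions only)."""

    def __init__(self, obs_dim, feat_dim, seed=0):
        rnd = random.Random(seed)
        # Fixed, randomly initialized target network: never updated.
        self.w_target = [[rnd.gauss(0.0, 1.0) for _ in range(feat_dim)]
                         for _ in range(obs_dim)]
        # Predictor network: trained to imitate the target's output.
        self.w_pred = [[0.0] * feat_dim for _ in range(obs_dim)]

    def _errors(self, obs):
        # Per-feature difference between predictor and target outputs.
        errs = []
        for j in range(len(self.w_pred[0])):
            pred = sum(o * row[j] for o, row in zip(obs, self.w_pred))
            targ = math.tanh(sum(o * row[j] for o, row in zip(obs, self.w_target)))
            errs.append(pred - targ)
        return errs

    def intrinsic_reward(self, obs):
        # Mean squared prediction error serves as the internal reward.
        errs = self._errors(obs)
        return sum(e * e for e in errs) / len(errs)

    def update(self, obs, lr=0.05):
        # One gradient-descent step on the mean squared error (predictor only).
        errs = self._errors(obs)
        scale = 2.0 * lr / len(errs)
        for i, o in enumerate(obs):
            for j, e in enumerate(errs):
                self.w_pred[i][j] -= scale * o * e


def shaped_reward(extrinsic, bonus, beta=0.5):
    # beta is a hypothetical weight on the internal reward.
    return extrinsic + beta * bonus


model = RND(obs_dim=4, feat_dim=8)
familiar = [0.5, -1.0, 0.25, 2.0]  # hypothetical observation vector

before = model.intrinsic_reward(familiar)
for _ in range(200):
    model.update(familiar)  # repeated visits to the same observation
after = model.intrinsic_reward(familiar)

# With a sparse task reward (0.0 here), only the exploration bonus remains.
print(shaped_reward(0.0, before), shaped_reward(0.0, after))
```

In a full agent, the shaped reward would be passed to the SAC update in place of the raw environment reward; the entropy term in SAC's objective is a separate mechanism and is not modeled in this sketch.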