Behavior Imitation for Manipulator Control and Grasping with Deep Reinforcement Learning

Liu Qiyuan
2024-05-02
Abstract: Existing Motion Imitation models typically require expert data obtained through MoCap devices, but the vast amount of training data they need is difficult to acquire and demands substantial investments of money, manpower, and time. This project combines 3D human pose estimation with reinforcement learning, proposing a novel model that simplifies Motion Imitation into a reinforcement learning problem of predicting joint angle values. This significantly reduces the reliance on large amounts of training data, enabling the agent to learn an imitation policy from just a few seconds of video and to generalize strongly: it can quickly apply the learned policy to imitate human arm motions in unfamiliar videos. The model first extracts the skeletal motions of human arms from a given video using 3D human pose estimation. These extracted arm motions are then morphologically retargeted onto a robotic manipulator. Next, the retargeted motions are used to generate reference motions. Finally, these reference motions are used to formulate a reinforcement learning problem, enabling the agent to learn a policy for imitating human arm motions. The model performs well on imitation tasks and transfers robustly, accurately imitating human arm motions from other unfamiliar videos. This project provides a lightweight, convenient, efficient, and accurate Motion Imitation model that simplifies the complex Motion Imitation process while achieving strong performance.
Robotics, Machine Learning
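The abstract says the retargeted motions are used to generate reference motions but does not say how. As a minimal sketch, assuming the reference motion is simply the retargeted per-frame joint angles resampled from the video frame rate to the simulator's control rate by linear interpolation, the step could look like this. The function `resample_reference` and its parameters are hypothetical, not taken from the paper.

```python
import math
from typing import List


def resample_reference(
    angles: List[List[float]], video_fps: float, control_hz: float
) -> List[List[float]]:
    """Resample per-frame joint angles (one row per video frame) onto the
    control time grid using linear interpolation.

    Hypothetical helper: the paper does not specify how reference motions
    are generated from retargeted motions.
    """
    n = len(angles)
    if n < 2:
        raise ValueError("need at least two frames to interpolate")
    # Number of control steps that fit inside the clip; the small epsilon
    # guards against floating-point round-off just below an integer.
    steps = math.floor((n - 1) * control_hz / video_fps + 1e-9) + 1
    reference = []
    for k in range(steps):
        x = k * video_fps / control_hz   # fractional video-frame index
        i = min(int(x), n - 2)           # index of the earlier frame
        a = x - i                        # interpolation weight in [0, 1]
        reference.append(
            [(1 - a) * p + a * q for p, q in zip(angles[i], angles[i + 1])]
        )
    return reference


# Three video frames at 30 fps resampled to a 60 Hz control loop.
ref = resample_reference([[0.0, 0.0], [1.0, 2.0], [2.0, 4.0]], 30.0, 60.0)
print(len(ref), ref[1], ref[-1])  # 5 [0.5, 1.0] [2.0, 4.0]
```

Linear interpolation is the simplest choice; a smoother spline would reduce velocity discontinuities at the original frame boundaries, at the cost of possible overshoot.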
What problem does this paper attempt to address?
The paper addresses how to simplify robot motion imitation so that a robot can efficiently imitate human arm motions from a small amount of easily obtainable data. Existing motion imitation models usually rely on expert data captured with motion capture (MoCap) devices, and collecting the large amount of training data they need costs considerable money, manpower, and time. To solve this, the paper proposes a model that combines 3D human pose estimation with reinforcement learning and recasts the complex motion imitation problem as the problem of predicting joint angle values. This greatly reduces the dependence on large training sets, lets the robot learn an imitation policy from a video only a few seconds long, and generalizes well to new videos.

### Main Contributions

1. **Reducing Data Dependence**: Using 3D human pose estimation, the model extracts human arm motions from a small number of videos, greatly reducing the need for large-scale expert data.
2. **Efficient Motion Imitation**: The model learns to imitate human arm motions in a short time and can apply the learned policy to unfamiliar videos.
3. **Lightweight and Efficient**: The whole pipeline is lightweight, convenient, and efficient while maintaining high accuracy.

### Method Overview

1. **Original Arm Motion Extraction**: Use 3D human pose estimation to extract the original human arm motion from the input video:
   - Use the YOLOv3 and HRNet models to detect the person and locate keypoints in each frame, producing the 2D keypoint coordinates.
   - Use the Strided Transformer network to lift the 2D keypoint coordinates into 3D space, producing the final 3D human motion data.
2. **Motion Retargeting**: Retarget the extracted arm motion onto the morphology of the robotic manipulator. This step combines geometric transformation and inverse kinematics so that the manipulator can reproduce motions similar to those of the human arm.
3. **Motion Imitation**: Use the retargeted reference motion to formulate a reinforcement learning problem and train a control policy that imitates it:
   - Take the retargeted reference motion as the target and design a reward function that encourages the policy to stay as close as possible to the target motion.
   - Train the control policy with the Proximal Policy Optimization (PPO) algorithm so that the manipulator can effectively imitate the human arm motions.

### Experimental Results

The experiments show that the model imitates human arm motions accurately and also generalizes well to unfamiliar videos. Training is reported to be stable and efficient.

### Conclusion

The paper proposes a motion imitation model that combines 3D human pose estimation with deep reinforcement learning, addressing existing models' dependence on large amounts of expert data and providing a lightweight, efficient, and accurate motion imitation method. Beyond its value for robot control and grasping tasks, the work offers new ideas for future humanoid robot research.
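The geometric part of the motion retargeting step is described only at a high level. As a minimal sketch, assuming the 3D pose estimator outputs shoulder, elbow, and wrist keypoints and that the manipulator has an elbow joint whose zero position is a straight arm, one joint angle could be read off the human skeleton as follows. The helper names and the zero-angle convention are hypothetical, not the paper's.

```python
import math
from typing import List, Sequence

Vec3 = Sequence[float]


def _sub(a: Vec3, b: Vec3) -> List[float]:
    return [a[i] - b[i] for i in range(3)]


def _dot(a: Vec3, b: Vec3) -> float:
    return sum(a[i] * b[i] for i in range(3))


def elbow_angle(shoulder: Vec3, elbow: Vec3, wrist: Vec3) -> float:
    """Interior angle at the elbow in radians (pi means a fully extended arm)."""
    upper = _sub(shoulder, elbow)   # vector from elbow to shoulder
    fore = _sub(wrist, elbow)       # vector from elbow to wrist
    cos_t = _dot(upper, fore) / math.sqrt(_dot(upper, upper) * _dot(fore, fore))
    return math.acos(max(-1.0, min(1.0, cos_t)))  # clamp against round-off


def elbow_joint_command(shoulder: Vec3, elbow: Vec3, wrist: Vec3) -> float:
    """Hypothetical mapping to a robot elbow where 0 rad is a straight arm."""
    return math.pi - elbow_angle(shoulder, elbow, wrist)


bent = elbow_joint_command((0, 0, 0), (0, -0.25, 0), (0.25, -0.25, 0))
straight = elbow_joint_command((0, 0, 0), (0, -0.25, 0), (0, -0.5, 0))
print(round(bent, 4), round(straight, 4))  # 1.5708 0.0
```

Because an interior angle is invariant to the human's limb lengths, it transfers directly to a manipulator with different link lengths; positions, by contrast, would first need scaling.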
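The retargeting step also uses inverse kinematics, but the manipulator model and IK solver are not specified. For illustration only, assuming a planar two-link arm with hypothetical link lengths `l1` and `l2`, the closed-form IK below converts a target wrist position (for example, a scaled human wrist position) into shoulder and elbow angles. A real manipulator with more degrees of freedom would typically need a numerical solver instead.

```python
import math
from typing import Tuple


def two_link_ik(x: float, y: float, l1: float, l2: float,
                negative_elbow: bool = False) -> Tuple[float, float]:
    """Closed-form IK for a planar 2-link arm; returns (q1, q2) in radians."""
    # Law of cosines gives the elbow angle from the distance to the target.
    c2 = (x * x + y * y - l1 * l1 - l2 * l2) / (2 * l1 * l2)
    if abs(c2) > 1.0:
        raise ValueError("target is out of reach")
    q2 = math.acos(c2)
    if negative_elbow:
        q2 = -q2  # pick the other of the two IK solutions
    q1 = math.atan2(y, x) - math.atan2(l2 * math.sin(q2), l1 + l2 * math.cos(q2))
    return q1, q2


def two_link_fk(q1: float, q2: float, l1: float, l2: float) -> Tuple[float, float]:
    """Forward kinematics, used here to check the IK solution."""
    return (l1 * math.cos(q1) + l2 * math.cos(q1 + q2),
            l1 * math.sin(q1) + l2 * math.sin(q1 + q2))


q1, q2 = two_link_ik(1.0, 1.0, 1.0, 1.0)
print(round(q1, 4), round(q2, 4))  # 0.0 1.5708
print(two_link_fk(q1, q2, 1.0, 1.0))
```

Running forward kinematics on the IK output is a cheap sanity check that the joint angles actually reach the target.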
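The Motion Imitation step uses a reward that keeps the policy close to the retargeted reference motion, but its exact form is not given here. A common choice in motion imitation work, assumed here rather than taken from the paper, is an exponentiated negative squared joint-angle error: it equals 1 for a perfect match and decays smoothly as the tracking error grows. The scale factor `k` is a hypothetical tuning parameter.

```python
import math
from typing import Sequence


def imitation_reward(q: Sequence[float], q_ref: Sequence[float], k: float = 2.0) -> float:
    """exp(-k * sum_j (q_ref_j - q_j)^2): 1.0 for a perfect match, toward 0 otherwise."""
    if len(q) != len(q_ref):
        raise ValueError("joint vectors must have the same length")
    err = sum((r - a) ** 2 for a, r in zip(q, q_ref))
    return math.exp(-k * err)


print(imitation_reward([0.1, 0.5], [0.1, 0.5]))                 # 1.0
print(round(imitation_reward([0.0, 0.0], [0.1, 0.0]), 4))       # 0.9802
```

The exponential keeps the reward bounded in (0, 1], so a single badly tracked frame cannot dominate the return the way an unbounded negative error term could.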
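PPO is named as the training algorithm. Its core is the clipped surrogate objective, the batch mean of min(r * A, clip(r, 1 - eps, 1 + eps) * A), where r is the probability ratio between the new and old policies and A is the advantage. The plain-Python sketch below computes that loss (the negated objective) from log-probabilities and advantages; `eps = 0.2` is a commonly used default, not a setting reported for this model. A full trainer would also need a value-function loss, advantage estimation, and a framework that differentiates this expression.

```python
import math
from typing import Sequence


def ppo_clip_loss(logp_new: Sequence[float], logp_old: Sequence[float],
                  advantages: Sequence[float], eps: float = 0.2) -> float:
    """Negative mean of PPO's clipped surrogate objective over a batch."""
    total = 0.0
    for ln, lo, adv in zip(logp_new, logp_old, advantages):
        ratio = math.exp(ln - lo)                        # pi_new / pi_old
        clipped = min(max(ratio, 1.0 - eps), 1.0 + eps)  # clip to [1-eps, 1+eps]
        total += min(ratio * adv, clipped * adv)         # pessimistic bound
    return -total / len(advantages)


# Unchanged policy: ratio = 1, so the loss is just -mean(advantage).
print(ppo_clip_loss([0.0, 0.0], [0.0, 0.0], [1.0, 3.0]))        # -2.0
# Ratio 2 with a positive advantage is clipped to 1.2.
print(round(ppo_clip_loss([math.log(2.0)], [0.0], [1.0]), 6))   # -1.2
```

Taking the minimum means the update gains nothing from pushing the ratio beyond the clip range when the advantage is positive, while large ratios with negative advantages are still fully penalized; this is what keeps each policy update conservative.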