Reinforcement Learning from Wild Animal Videos

Elliot Chane-Sane, Constant Roux, Olivier Stasse, Nicolas Mansard
2024-12-05
Abstract: We propose to learn legged robot locomotion skills by watching thousands of wild animal videos from the internet, such as those featured in nature documentaries. Indeed, such videos offer a rich and diverse collection of plausible motion examples, which could inform how robots should move. To achieve this, we introduce Reinforcement Learning from Wild Animal Videos (RLWAV), a method to ground these motions into physical robots. We first train a video classifier on a large-scale animal video dataset to recognize actions from RGB clips of animals in their natural habitats. We then train a multi-skill policy to control a robot in a physics simulator, using the classification score of a third-person camera capturing videos of the robot's movements as a reward for reinforcement learning. Finally, we directly transfer the learned policy to a real quadruped Solo. Remarkably, despite the extreme gap in both domain and embodiment between animals in the wild and robots, our approach enables the policy to learn diverse skills such as walking, jumping, and keeping still, without relying on reference trajectories or skill-specific rewards.
Robotics, Computer Vision and Pattern Recognition, Machine Learning
What problem does this paper attempt to address?
The paper asks how a quadruped robot can learn diverse motion skills (such as remaining stationary, walking, and jumping) by watching a large number of wildlife videos, without relying on reference trajectories or on reward functions designed for each skill. The authors propose "Reinforcement Learning from Wild Animal Videos (RLWAV)", which uses reinforcement learning to transfer action concepts learned from wildlife videos to a physical robot.

### Main Problem Decomposition

1. **Learning across Morphological Differences**:
   - Wild animals and robots differ significantly in morphology, so actions cannot be copied directly from animal videos. Both the movement patterns of animals and the physical structure of the robot have to be taken into account when transferring actions.
2. **Physical Realization of Visual Imitation**:
   - The visual information extracted from videos has to be turned into actual robot behaviors that are physically feasible. This means combining the action concepts seen in the videos with the physical form of the robot to produce reasonable control strategies.
3. **No Need for Manual Design of Reward Functions**:
   - Traditional methods usually require a specific reward function for each skill, which is both time-consuming and complex. This paper instead uses a video classifier to score how well the robot performs the requested skill, and uses that score as the reward.

### Method Overview

- **Video Classifier Training**:
  - A video classifier is trained on a large-scale wildlife video dataset (such as Animal Kingdom) to recognize action categories (such as remaining stationary, walking, running, jumping). This classifier then guides the robot's learning process.
- **Multi-skill Policy Training**:
  - In a physics simulation environment, reinforcement learning is used to train a multi-skill policy, so that the robot optimizes its behavior according to feedback from the video classifier. The reward is based on the scores the video classifier assigns to the robot's movements.
- **Physical Constraints**:
  - To keep the robot's behavior physically plausible, a series of physical constraints (such as joint angle limits and torque limits) are imposed, so that the learned skills can be executed in the real world.

### Experimental Verification

- **Simulation Experiments**:
  - The method was evaluated in simulation, showing that the robot can learn multiple motion skills and that these skills are visually consistent with expectations.
- **Real-Robot Experiments**:
  - The learned policy was deployed on a real Solo-12 quadruped robot to test its performance in an outdoor environment. The results show that the robot can perform multiple skills such as remaining stationary, walking, running, and jumping.

Overall, the paper addresses the problem of learning robot motion skills from wildlife videos, and demonstrates that visual imitation and physical realization are possible across morphological differences.
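
The classifier-based reward and the physical constraints described in the method overview can be illustrated with a minimal Python sketch. It is not the authors' implementation: the label order in `SKILLS`, the choice of a softmax probability as the "score", and the helper names `classifier_reward` and `within_limits` are all assumptions made for illustration. The text only states that the reward is based on the classifier's score and that joint-angle and torque limits are imposed, not how either enters training.

```python
import math
from typing import List, Sequence

# Hypothetical label order; the real classifier's class indices are not given in the text.
SKILLS = ("still", "walk", "run", "jump")


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over raw classifier outputs."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def classifier_reward(logits: Sequence[float], commanded_skill: str) -> float:
    """Return the probability the classifier assigns to the commanded skill.

    Assumption: the score is a softmax probability over the skill classes.
    """
    return softmax(logits)[SKILLS.index(commanded_skill)]


def within_limits(
    joint_angles: Sequence[float],
    torques: Sequence[float],
    angle_limit: float,
    torque_limit: float,
) -> bool:
    """Check symmetric joint-angle and torque limits (illustrative values only)."""
    angles_ok = all(abs(q) <= angle_limit for q in joint_angles)
    torques_ok = all(abs(t) <= torque_limit for t in torques)
    return angles_ok and torques_ok


if __name__ == "__main__":
    # Stand-in logits for one clip rendered from the third-person camera.
    clip_logits = [0.2, 2.5, 0.1, -1.0]
    print("reward for 'walk':", classifier_reward(clip_logits, "walk"))
    print("limits respected:", within_limits([0.1, -0.2], [1.0, -2.0], 1.0, 3.0))
```

In this sketch the same clip yields a different reward for each commanded skill, which is how a single classifier could supervise a multi-skill policy without skill-specific reward terms.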