Abstract: We propose to learn legged robot locomotion skills by watching thousands of wild animal videos from the internet, such as those featured in nature documentaries. Indeed, such videos offer a rich and diverse collection of plausible motion examples, which could inform how robots should move. To achieve this, we introduce Reinforcement Learning from Wild Animal Videos (RLWAV), a method to ground these motions into physical robots. We first train a video classifier on a large-scale animal video dataset to recognize actions from RGB clips of animals in their natural habitats. We then train a multi-skill policy to control a robot in a physics simulator, using as the reinforcement learning reward the classifier's score on videos of the robot's movements captured by a third-person camera. Finally, we directly transfer the learned policy to a real Solo quadruped. Remarkably, despite the extreme gap in both domain and embodiment between animals in the wild and robots, our approach enables the policy to learn diverse skills such as walking, jumping, and keeping still, without relying on reference trajectories or skill-specific rewards.
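The reward idea in the abstract (scoring rendered robot clips with an action classifier) can be sketched as follows. This is a minimal illustration under stated assumptions, not the RLWAV implementation: it assumes the classifier returns one logit per action class for a short clip of third-person frames, and that each simulated robot's reward is the softmax probability of the skill it was commanded to perform. The label order in `SKILLS`, the clip length `CLIP_LEN`, and the helpers `ClipBuffer` and `classifier_reward` are hypothetical names introduced here; the paper's exact reward shaping may differ.

```python
from collections import deque

import numpy as np

# Hypothetical label order for the classifier's output logits.
SKILLS = ["keep_still", "walk", "run", "jump"]
# Hypothetical number of rendered frames fed to the classifier per clip.
CLIP_LEN = 8


class ClipBuffer:
    """Rolling window of the most recent third-person frames for one robot."""

    def __init__(self, clip_len: int = CLIP_LEN):
        self.frames = deque(maxlen=clip_len)

    def push(self, frame: np.ndarray) -> None:
        # Once the window is full, the oldest frame is dropped automatically.
        self.frames.append(frame)

    def ready(self) -> bool:
        return len(self.frames) == self.frames.maxlen

    def clip(self) -> np.ndarray:
        # Shape (T, H, W, C): the layout assumed here for the classifier input.
        return np.stack(list(self.frames))


def softmax(logits: np.ndarray) -> np.ndarray:
    """Numerically stable softmax over the last axis."""
    shifted = logits - logits.max(axis=-1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=-1, keepdims=True)


def classifier_reward(logits: np.ndarray, commanded: np.ndarray) -> np.ndarray:
    """Per-robot reward: classifier probability of the commanded skill.

    logits:    (num_envs, num_skills) classifier outputs for each robot's clip
    commanded: (num_envs,) integer index of the skill each robot should perform
    """
    probs = softmax(logits)
    return probs[np.arange(len(commanded)), commanded]


if __name__ == "__main__":
    buf = ClipBuffer()
    for _ in range(10):
        buf.push(np.zeros((64, 64, 3), dtype=np.uint8))
    print(buf.ready(), buf.clip().shape)  # True (8, 64, 64, 3)

    logits = np.array([[2.0, 0.0, 0.0, 0.0],
                       [0.0, 0.0, 0.0, 0.0]])
    commanded = np.array([SKILLS.index("keep_still"), SKILLS.index("walk")])
    print(classifier_reward(logits, commanded))
```

Using a probability rather than a raw logit keeps the reward bounded in [0, 1] and makes it comparable across skills; a log-probability would be an equally plausible choice under these assumptions.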

What problem does this paper attempt to address?

The paper addresses the following question: how can a quadruped robot acquire diverse locomotion skills (such as remaining stationary, walking, and jumping) by watching a large number of wildlife videos, without relying on reference trajectories or reward functions hand-designed for each skill? The authors propose a method called "Reinforcement Learning from Wild Animal Videos (RLWAV)", which transfers action concepts learned from wildlife videos to a physical robot through reinforcement learning.

### Main Problem Decomposition

1. **Learning across Morphological Differences**:
   - Wild animals and robots differ substantially in morphology, so actions cannot be copied directly from animal videos. Animal movement patterns do not map one-to-one onto the robot's physical structure, and the transfer has to account for this gap.
2. **Physical Realization of Visual Imitation**:
   - Visual information extracted from videos must be turned into robot behaviors that are physically feasible. This requires grounding the action concepts seen in the videos in the robot's own body to obtain sensible control policies.
3. **No Need for Manual Design of Reward Functions**:
   - Conventional approaches typically require a dedicated reward function for each skill, which is time-consuming and complex to engineer. This paper instead derives the reward automatically from a learned video classifier that scores how well the robot performs each skill.

### Method Overview

- **Video Classifier Training**:
  - A video classifier is trained on a large-scale wildlife video dataset (such as Animal Kingdom) to recognize action categories (such as remaining stationary, walking, running, and jumping). This classifier then guides the robot's learning process.
- **Multi-skill Policy Training**:
  - In a physics simulator, a multi-skill policy is trained with reinforcement learning so that the robot optimizes its behavior according to feedback from the video classifier. The reward is based on the video classifier's scores of the robot's motion.
- **Physical Constraints**:
  - To keep the robot's behavior physically plausible, a set of physical constraints (such as joint angle limits and torque limits) is imposed so that the learned skills can be executed effectively in the real world.

### Experimental Verification

- **Simulation Experiments**:
  - Simulation experiments show that the robot successfully learns multiple locomotion skills and that these skills are visually consistent with the intended behaviors.
- **Real-Robot Experiments**:
  - The learned policy was deployed on a real Solo-12 quadruped robot and evaluated in an outdoor environment. The robot performed multiple skills, including remaining stationary, walking, running, and jumping.

Overall, the paper tackles the problem of learning robot locomotion skills from wildlife videos and demonstrates that visual imitation can be physically realized despite large morphological differences.
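The physical constraints mentioned in the summary (joint angle limits, torque limits) can be illustrated with a plain reward-penalty formulation. This is a hedged sketch, not the paper's method: the summary does not specify how the constraints are enforced, so here the violations are simply subtracted from the classifier-based reward. All limit values, the `weight` parameter, and the helper names `constraint_violations` and `penalized_reward` are hypothetical; a constrained reinforcement learning method could enforce the same limits differently.

```python
import numpy as np

NUM_JOINTS = 12  # Solo-12 has 12 actuated joints (3 per leg)

# Hypothetical limits, for illustration only (not taken from the paper).
Q_LOW = np.full(NUM_JOINTS, -1.5)   # rad
Q_HIGH = np.full(NUM_JOINTS, 1.5)   # rad
TAU_MAX = np.full(NUM_JOINTS, 2.5)  # Nm


def constraint_violations(q, tau, q_low=Q_LOW, q_high=Q_HIGH, tau_max=TAU_MAX):
    """Non-negative violation magnitudes per constraint (0 when satisfied).

    q, tau: (num_envs, num_joints) joint positions and commanded torques.
    Returns (num_envs, 3 * num_joints): lower-limit, upper-limit, torque terms.
    """
    below = np.clip(q_low - q, 0.0, None)
    above = np.clip(q - q_high, 0.0, None)
    over_torque = np.clip(np.abs(tau) - tau_max, 0.0, None)
    return np.concatenate([below, above, over_torque], axis=-1)


def penalized_reward(task_reward, violations, weight=1.0):
    """Subtract a weighted sum of constraint violations from the task reward."""
    return task_reward - weight * violations.sum(axis=-1)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    q = rng.uniform(-2.0, 2.0, size=(4, NUM_JOINTS))
    tau = rng.uniform(-3.0, 3.0, size=(4, NUM_JOINTS))
    task_reward = np.full(4, 0.8)  # e.g. classifier probabilities
    v = constraint_violations(q, tau)
    print(penalized_reward(task_reward, v, weight=0.1))
```

Measuring violations as distances past each limit (rather than as binary flags) gives the policy a graded signal that shrinks as it moves back inside the feasible region.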

Reinforcement Learning from Wild Animal Videos

Learning Robust, Agile, Natural Legged Locomotion Skills in the Wild

Video2Reward: Generating Reward Function from Videos for Legged Robot Behavior Learning

Generalized Animal Imitator: Agile Locomotion with Versatile Motion Prior

Learning Agile Locomotion on Risky Terrains

DeepWalk: Omnidirectional Bipedal Gait by Deep Reinforcement Learning

SLR: Learning Quadruped Locomotion without Privileged Information

Learning and Adapting Agile Locomotion Skills by Transferring Experience

SoloParkour: Constrained Reinforcement Learning for Visual Locomotion from Privileged Experience

Reinforcement Learning for Versatile, Dynamic, and Robust Bipedal Locomotion Control

Lifelike Agility and Play in Quadrupedal Robots using Reinforcement Learning and Generative Pre-trained Models

Learning Bipedal Walking On Planned Footsteps For Humanoid Robots

Emergent Real-World Robotic Skills via Unsupervised Off-Policy Reinforcement Learning

Perception-Driven Learning of High-Dynamic Jumping Motions for Single-Legged Robots

Dynamic Bipedal Maneuvers through Sim-to-Real Reinforcement Learning

Reinforcement Learning for Blind Stair Climbing with Legged and Wheeled-Legged Robots

Advanced Skills through Multiple Adversarial Motion Priors in Reinforcement Learning

Deep Reinforcement Learning to Acquire Navigation Skills for Wheel-Legged Robots in Complex Environments

Imitation and Adaptation Based on Consistency: A Quadruped Robot Imitates Animals from Videos Using Deep Reinforcement Learning

Learning Agile Soccer Skills for a Bipedal Robot with Deep Reinforcement Learning

Learning and Reusing Quadruped Robot Movement Skills from Biological Dogs for Higher-Level Tasks