Constrained Skill Discovery: Quadruped Locomotion with Unsupervised Reinforcement Learning

Vassil Atanassov, Wanming Yu, Alexander Luis Mitchell, Mark Nicholas Finean, Ioannis Havoutis
2024-10-10
Abstract: Representation learning and unsupervised skill discovery can allow robots to acquire diverse and reusable behaviors without the need for task-specific rewards. In this work, we use unsupervised reinforcement learning to learn a latent representation by maximizing the mutual information between skills and states subject to a distance constraint. Our method improves upon prior constrained skill discovery methods by replacing the latent transition maximization with a norm-matching objective. This not only results in a much richer state space coverage compared to baseline methods, but also allows the robot to learn more stable and easily controllable locomotive behaviors. We successfully deploy the learned policy on a real ANYmal quadruped robot and demonstrate that the robot can accurately reach arbitrary points of the Cartesian state space in a zero-shot manner, using only intrinsic skill discovery and standard regularization rewards.
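To make the contrast between latent transition maximization and norm matching concrete, the sketch below compares an LSD/METRA-style inner-product reward, `(phi(s') - phi(s))^T z`, with a norm-matching reward. The exact loss used in the paper is not reproduced here: the form `-||(phi(s') - phi(s)) - z||^2` is an assumption chosen only to illustrate matching both the direction and the magnitude of the skill. The function names, the 2-D latent space, and the step-length cap standing in for the distance constraint are likewise hypothetical.

```python
import math
from typing import Callable, List, Sequence

Vector = Sequence[float]


def latent_step(phi_s: Vector, phi_next: Vector) -> List[float]:
    """Latent transition phi(s') - phi(s)."""
    return [b - a for a, b in zip(phi_s, phi_next)]


def inner_product_reward(phi_s: Vector, phi_next: Vector, z: Vector) -> float:
    """LSD/METRA-style reward: (phi(s') - phi(s))^T z."""
    return sum(d * zi for d, zi in zip(latent_step(phi_s, phi_next), z))


def norm_matching_reward(phi_s: Vector, phi_next: Vector, z: Vector) -> float:
    """Assumed norm-matching reward: -||(phi(s') - phi(s)) - z||^2."""
    return -sum((d - zi) ** 2 for d, zi in zip(latent_step(phi_s, phi_next), z))


def best_step_length(reward_fn: Callable[[Vector, Vector, Vector], float],
                     z: Vector, candidate_lengths: Sequence[float]) -> float:
    """Return the latent step length along z's direction that maximizes reward_fn."""
    norm = math.hypot(*z)
    unit = [zi / norm for zi in z]
    origin = [0.0] * len(z)
    return max(candidate_lengths,
               key=lambda length: reward_fn(origin, [length * u for u in unit], z))


# Hypothetical setup: a "slow" skill of norm 0.2, with latent step lengths
# capped at 1.0 as a stand-in for the distance constraint.
z_slow = [0.2, 0.0]
lengths = [0.1 * k for k in range(11)]  # 0.0, 0.1, ..., 1.0
print(best_step_length(inner_product_reward, z_slow, lengths))  # largest allowed step
print(best_step_length(norm_matching_reward, z_slow, lengths))  # step matching ||z||
```

For a slow skill, the inner-product reward still prefers the largest step the cap allows, while the assumed norm-matching reward prefers a step whose length equals the skill's norm. This is one way to see why always maximizing latent transitions can bias skills toward high-speed motion.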
Robotics
What problem does this paper attempt to address?
The paper addresses how to enable a quadruped robot to autonomously explore and learn diverse, reusable behavioral skills through unsupervised reinforcement learning, without the need for task-specific rewards. Specifically, it proposes an improved constrained skill discovery method that overcomes limitations of existing methods, such as an excessive bias toward high-speed movements and insufficient state space coverage, thereby allowing the robot to learn more stable and controllable locomotion skills.

### Main Contributions:

1. **An Unsupervised Skill Discovery Method**: Used to pre-train quadruped robots so that they acquire diverse locomotion skills solely through skill discovery and regularization rewards.
2. **A New Skill Matching Objective**: Introduced into constrained skill discovery, allowing the robot to learn a broader range of behaviors and reliably cover a larger portion of the state space.
3. **Zero-Shot Goal Tracking**: Using the learned skill-conditioned policy, the robot accurately performs zero-shot goal tracking in the real world without additional training.

### Method Overview:

- **Skill Discovery**: Learns latent representations by maximizing the mutual information between skills and states; unlike previous methods, it does not always maximize the magnitude of latent transitions.
- **Skill Matching Objective**: A new loss function that accounts not only for alignment with the skill direction but also for matching the magnitude of latent transitions.
- **Reinforcement Learning Setup**: The policy is optimized with intrinsic rewards and extrinsic regularization rewards, so that the learned behaviors are both diverse and practical.

### Experimental Results:

- **State Space Coverage**: Compared to baseline methods (such as LSD and METRA), this method achieves more uniform state space coverage, including both low-speed and high-speed movement behaviors.
- **Zero-Shot Goal Tracking**: The method was validated on a real robot, which accurately tracked target positions and stopped once it reached them.

In summary, the paper addresses key issues in learning locomotion skills for quadruped robots with an improved unsupervised reinforcement learning method, enhancing the robot's autonomy and adaptability.
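One way a skill-conditioned policy could be used for zero-shot goal tracking is to command, at every step, the latent displacement from the current state to the goal, clipped to a maximum skill norm. The sketch below assumes, purely for illustration, that the latent map `phi` is the identity on the robot's (x, y) base position and that the policy realizes each commanded skill exactly; `goal_to_skill`, `simulate`, and `max_norm` are hypothetical names, not the paper's interface. Under a norm-matching objective a small skill means a small motion, so the commanded skill shrinking to zero at the goal would correspond to the robot stopping.

```python
import math
from typing import List, Sequence


def goal_to_skill(phi_current: Sequence[float], phi_goal: Sequence[float],
                  max_norm: float = 1.0) -> List[float]:
    """Hypothetical: skill = latent displacement to the goal, clipped to max_norm."""
    delta = [g - c for c, g in zip(phi_current, phi_goal)]
    norm = math.hypot(*delta)
    if norm > max_norm:
        # Keep the direction but cap the magnitude (i.e. the commanded speed).
        delta = [d * max_norm / norm for d in delta]
    return delta


def simulate(start: Sequence[float], goal: Sequence[float],
             steps: int = 20, max_norm: float = 1.0) -> List[float]:
    """Toy rollout assuming phi is the identity on (x, y) and each skill is executed exactly."""
    pos = list(start)
    for _ in range(steps):
        z = goal_to_skill(pos, goal, max_norm)
        pos = [p + zi for p, zi in zip(pos, z)]
    return pos


final = simulate(start=(0.0, 0.0), goal=(3.0, 4.0), steps=10)
print(final)                              # approximately [3.0, 4.0]
print(goal_to_skill(final, (3.0, 4.0)))   # approximately [0.0, 0.0]: "stop"
```

Starting 5 units from the goal with `max_norm=1.0`, the toy rollout moves at the capped speed for about five steps and then commands a near-zero skill. A real policy would only approximate each commanded latent step, so in practice the skill would be recomputed from the measured state at every control interval, as done here.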