Abstract: Representation learning and unsupervised skill discovery can allow robots to acquire diverse and reusable behaviors without the need for task-specific rewards. In this work, we use unsupervised reinforcement learning to learn a latent representation by maximizing the mutual information between skills and states subject to a distance constraint. Our method improves upon prior constrained skill discovery methods by replacing the latent transition maximization with a norm-matching objective. This not only results in much richer state space coverage compared to baseline methods, but also allows the robot to learn more stable and easily controllable locomotive behaviors. We successfully deploy the learned policy on a real ANYmal quadruped robot and demonstrate that the robot can accurately reach arbitrary points of the Cartesian state space in a zero-shot manner, using only intrinsic skill discovery and standard regularization rewards.

What problem does this paper attempt to address?

The paper addresses how to enable a quadruped robot to autonomously explore and learn diverse, reusable behavioral skills through unsupervised reinforcement learning, without the need for task-specific rewards. Specifically, it proposes an improved constrained skill discovery method that aims to overcome limitations of existing methods, such as excessive bias towards high-speed movements and insufficient state space coverage, thereby allowing the robot to learn more stable and controllable movement skills.

### Main Contributions:

1. **Introduction of an Unsupervised Skill Discovery Method**: Used for pre-training quadruped robots, enabling them to acquire diverse movement skills solely through skill discovery and regularization rewards.
2. **Proposal of a New Skill Matching Objective**: An objective for constrained skill discovery that allows the robot to learn a broader range of behaviors and reliably cover a larger state space.
3. **Zero-Shot Goal Tracking**: Through a skill-conditioned policy, the robot accurately performs goal tracking in the real world without additional training.

### Method Overview:

- **Skill Discovery**: Learning latent representations by maximizing the mutual information between skills and states; unlike previous methods, this method avoids always maximizing the magnitude of latent transitions.
- **Skill Matching Objective**: Introducing a new loss function that considers not only the alignment of skill directions but also the magnitude matching of latent transitions.
- **Reinforcement Learning Setup**: Using intrinsic rewards and extrinsic regularization rewards to optimize policies, so that the learned behaviors are both diverse and practical.

### Experimental Results:

- **State Space Coverage**: Compared to baseline methods (such as LSD and METRA), this method achieves more uniform state space coverage, including both low-speed and high-speed movement behaviors.
- **Zero-Shot Goal Tracking**: The method was validated on a real robot, demonstrating that the robot can accurately track target positions and stop.

In summary, the paper addresses key issues in learning movement skills for quadruped robots through an improved unsupervised reinforcement learning method, enhancing the robot's autonomy and adaptability.
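The difference between maximizing latent transitions and matching their magnitude can be illustrated with a small sketch. The first function below is the inner-product reward used by LSD- and METRA-style methods. The second is only an assumed squared-error form of a norm-matching reward, since the summary does not give the paper's exact loss. The function names, the 2-D latent vectors, and the omission of the distance constraint on the latent representation are all illustrative assumptions.

```python
import numpy as np

def directional_reward(phi_s, phi_next, z):
    """LSD/METRA-style intrinsic reward: latent transition projected onto skill z.

    Grows without bound as the transition gets longer along z, which
    favors fast movements.
    """
    return float(np.dot(phi_next - phi_s, z))

def norm_matching_reward(phi_s, phi_next, z):
    """Hypothetical norm-matching reward (assumed form, not the paper's exact loss).

    Negative squared error between the latent transition and z, so the
    reward peaks when both direction and magnitude match the skill.
    """
    delta = phi_next - phi_s
    return -float(np.sum((delta - z) ** 2))

# Illustrative values: a skill asking for a moderate step along the first latent axis.
z = np.array([0.5, 0.0])
phi_s = np.zeros(2)
slow = np.array([0.5, 0.0])  # transition equal to z
fast = np.array([1.0, 0.0])  # same direction, twice the magnitude

print("directional:  ", directional_reward(phi_s, slow, z), directional_reward(phi_s, fast, z))
print("norm matching:", norm_matching_reward(phi_s, slow, z), norm_matching_reward(phi_s, fast, z))
```

Under these assumptions the directional reward prefers the longer transition, while the norm-matching reward prefers the transition whose magnitude equals that of the skill. This is consistent with the summary's claim that the method covers both low-speed and high-speed behaviors.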

Constrained Skill Discovery: Quadruped Locomotion with Unsupervised Reinforcement Learning

Unsupervised Discovery of Transitional Skills for Deep Reinforcement Learning

Emergent Real-World Robotic Skills via Unsupervised Off-Policy Reinforcement Learning

Lipschitz-constrained Unsupervised Skill Discovery

SLR: Learning Quadruped Locomotion without Privileged Information

Learning Diverse Skills for Local Navigation under Multi-constraint Optimality

Skill Transfer and Discovery for Sim-to-Real Learning: A Representation-Based Viewpoint

Controllability-Aware Unsupervised Skill Discovery

Unsupervised Skill Discovery via Recurrent Skill Training

Learning Semantics-Aware Locomotion Skills from Human Demonstration

Learning Agile Locomotion on Risky Terrains

An Efficient Model-Based Approach on Learning Agile Motor Skills without Reinforcement

Unsupervised Skill Discovery for Robotic Manipulation through Automatic Task Generation

Learning Multiple Gaits within Latent Space for Quadruped Robots

A learning-based control pipeline for generic motor skills for quadruped robots

Obstacle-Aware Quadrupedal Locomotion With Resilient Multi-Modal Reinforcement Learning

MOVE: Multi-skill Omnidirectional Legged Locomotion with Limited View in 3D Environments

Learning to walk in confined spaces using 3D representation

Experience-Learning Inspired Two-Step Reward Method for Efficient Legged Locomotion Learning Towards Natural and Robust Gaits

Learning Whole-body Motor Skills for Humanoids

Lifelike Agility and Play in Quadrupedal Robots using Reinforcement Learning and Generative Pre-trained Models