Abstract: Integrating robots in complex everyday environments requires a multitude of problems to be solved. One crucial feature among those is to equip robots with a mechanism for teaching them a new task in an easy and natural way. When teaching tasks that involve sequences of different skills, with varying order and number of these skills, it is desirable to only demonstrate full task executions instead of all individual skills. For this purpose, we propose a novel approach that simultaneously learns to segment trajectories into reoccurring patterns and the skills to reconstruct these patterns from unlabelled demonstrations without further supervision. Moreover, the approach learns a skill conditioning that can be used to understand possible sequences of skills, a practical mechanism to be used in, for example, human-robot-interactions for a more intelligent and adaptive robot behaviour. The Bayesian and variational inference based approach is evaluated on synthetic and real human demonstrations with varying complexities and dimensionality, showing the successful learning of segmentations and skill libraries from unlabelled data.

What problem does this paper attempt to address?

The core problem this paper addresses is enabling robots to learn skill segmentation and skill libraries autonomously from unlabeled trajectory data, which simplifies the teaching of complex tasks. Specifically, the authors propose a new method (SKID) that simultaneously learns to segment trajectories into recurring patterns and the skills required to reconstruct these patterns from the original trajectories, without additional supervision. The method aims to:

1. **Automatically segment and learn skills**: Learn the segmentation into subtasks (skills) from complete task demonstrations, without manual labeling or decomposition into individual skills.
2. **Skill conditioning**: Learn the conditional relationships between skills, that is, which skills may follow which other skills. This helps in understanding the task structure and supports more intelligent human-robot interaction.
3. **Adapt to complex environments**: Handle tasks in which the order and number of skills vary, allowing robots to perform complex tasks more naturally.

### Method overview

SKID builds on the variational autoencoder (VAE) framework and combines it with an iterative scheme for explaining sub-parts of the given data. Its main components are:

- **Model structure**:
  - Use an RNN to handle the temporal dependence of trajectories.
  - Use a spatial transformer (ST) to extract sub-trajectories.
  - Use a discrete β-VAE to model the skill type \( z_s \), and learn the discrete variable through a continuous Gumbel-Softmax approximation.
- **Optimization objective**:
  - Maximize the evidence lower bound (ELBO):
    \[ L(\tau; \theta, \phi) = \mathbb{E}_{q_\phi(z|\tau)}[\log p_\theta(\tau|z)] - \text{KL}(q_\phi(z|\tau) \| p(z)) \]
  - Introduce capacity terms \( C_d \) and \( C_s \) to control the decoupling of the latent variables \( z_d \) and \( z_s \), for example:
    \[ L(\tau; \theta, \phi) = \mathbb{E}_{q_\phi(z|\tau)}[\log p_\theta(\tau|z)] - \gamma_d \left| \text{KL}(q_\phi(z_d|\tau) \| p(z_d)) - C_d \right| - \gamma_s \left| \text{KL}(q_\phi(z_s|\tau) \| p(z_s)) - C_s \right| \]

### Experimental verification

The authors evaluated SKID on multiple datasets: 1D synthetic data, 3D synthetic data, 2D human-machine interaction data, and 2D teaching data. The results show that SKID can learn skill segmentations and skill libraries across these settings, and that it can also learn the conditional relationships between skills, which is useful for understanding and predicting human behavior.

### Application prospects

SKID is mainly relevant to human-robot interaction and robot learning. With it, users can teach robots complex sequential tasks by demonstrating complete tasks directly, without specifying each subtask in detail. SKID can also be used for planning new tasks, reducing the search space, and predicting human behavior so that the robot can adapt more intelligently.

### Limitations

SKID does not always recover every skill. The main challenge is that the discrete VAE sometimes misses certain skills, especially in real-world datasets with high noise and high variability. In addition, training uses a continuous approximation of the discrete skill variable, so performance at test time may differ from performance during training.
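The two technical ingredients named in the method overview, the Gumbel-Softmax relaxation of the discrete skill variable \( z_s \) and the capacity-controlled objective, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the uniform prior over skill types, and all numeric values are assumptions chosen for illustration, and the KL term for \( z_d \) is simply passed in as a number.

```python
import math
import random


def gumbel_softmax(logits, temperature, rng):
    """Draw a relaxed (soft) one-hot sample from a categorical distribution."""
    noisy = []
    for logit in logits:
        # Gumbel(0, 1) noise via inverse transform; clamp u away from 0 and 1.
        u = min(max(rng.random(), 1e-12), 1.0 - 1e-12)
        gumbel = -math.log(-math.log(u))
        noisy.append((logit + gumbel) / temperature)
    # Numerically stable softmax over the perturbed, temperature-scaled logits.
    top = max(noisy)
    exps = [math.exp(v - top) for v in noisy]
    total = sum(exps)
    return [e / total for e in exps]


def kl_categorical_uniform(probs):
    """KL(q || p) for a categorical q and an (assumed) uniform prior p."""
    k = len(probs)
    return sum(p * math.log(p * k) for p in probs if p > 0.0)


def capacity_loss(recon_log_lik, kl_d, kl_s, gamma_d, gamma_s, c_d, c_s):
    """Negative of the capacity-controlled objective, i.e. a loss to minimise."""
    return -recon_log_lik + gamma_d * abs(kl_d - c_d) + gamma_s * abs(kl_s - c_s)


# Hypothetical skill logits for three skill types.
rng = random.Random(0)
soft = gumbel_softmax([2.0, 0.5, -1.0], temperature=0.5, rng=rng)
hard = soft.index(max(soft))  # discrete skill index a test-time argmax would pick

# Hypothetical values for the reconstruction term, KL terms, weights, capacities.
loss = capacity_loss(
    recon_log_lik=-10.0,
    kl_d=1.5,
    kl_s=kl_categorical_uniform(soft),
    gamma_d=30.0,
    gamma_s=30.0,
    c_d=1.0,
    c_s=0.7,
)
```

The difference between `soft` (used while training) and `hard` (a discrete choice) is the training/test gap mentioned under Limitations: lowering `temperature` pushes `soft` closer to a one-hot vector, at the cost of noisier gradients.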

SKID RAW: Skill Discovery from Raw Trajectories

Learning Robot Manipulation Skills from Human Demonstration Videos Using Two-Stream 2-D/3-D Residual Networks with Self-Attention

Unsupervised Discovery of Transitional Skills for Deep Reinforcement Learning

GSC: A Graph-Based Skill Composition Framework for Robot Learning

Grounding Language for Robotic Manipulation via Skill Library

Learning Novel Skills from Language-Generated Demonstrations

Visuospatial Skill Learning for Robots

Skill Transfer and Discovery for Sim-to-Real Learning: A Representation-Based Viewpoint

Unsupervised Skill Discovery for Robotic Manipulation through Automatic Task Generation

Skill Preferences: Learning to Extract and Execute Robotic Skills from Human Feedback

A Framework for Learning and Reusing Robotic Skills

Language-guided Skill Learning with Temporal Variational Inference

Adversarial Skill Networks: Unsupervised Robot Skill Learning from Video

Learning Skills from Demonstrations: A Trend from Motion Primitives to Experience Abstraction

DexSkills: Skill Segmentation Using Haptic Data for Learning Autonomous Long-Horizon Robotic Manipulation Tasks

Learning Multimodal Contact-Rich Skills from Demonstrations Without Reward Engineering

Agentic Skill Discovery

Modeling Long-horizon Tasks as Sequential Interaction Landscapes

RSG: Fast Learning Adaptive Skills for Quadruped Robots by Skill Graph

Learning and generalization of task-parameterized skills through few human demonstrations

A Robotic Skill Learning System Built Upon Diffusion Policies and Foundation Models