SKID RAW: Skill Discovery from Raw Trajectories

Daniel Tanneberg, Kai Ploeger, Elmar Rueckert, Jan Peters
DOI: https://doi.org/10.1109/LRA.2021.3068891
2021-03-27
Abstract: Integrating robots in complex everyday environments requires a multitude of problems to be solved. One crucial feature among those is to equip robots with a mechanism for teaching them a new task in an easy and natural way. When teaching tasks that involve sequences of different skills, with varying order and number of these skills, it is desirable to only demonstrate full task executions instead of all individual skills. For this purpose, we propose a novel approach that simultaneously learns to segment trajectories into reoccurring patterns and the skills to reconstruct these patterns from unlabelled demonstrations without further supervision. Moreover, the approach learns a skill conditioning that can be used to understand possible sequences of skills, a practical mechanism to be used in, for example, human-robot-interactions for a more intelligent and adaptive robot behaviour. The Bayesian and variational inference based approach is evaluated on synthetic and real human demonstrations with varying complexities and dimensionality, showing the successful learning of segmentations and skill libraries from unlabelled data.
Machine Learning, Artificial Intelligence, Robotics
What problem does this paper attempt to address?
The paper aims to let robots learn skill segmentations and skill libraries autonomously from unlabelled trajectory data, which simplifies the teaching of complex tasks. The authors propose a method, SKID, that simultaneously learns to segment trajectories into recurring patterns and the skills needed to reconstruct those patterns from raw trajectories, without additional supervision. The method aims to:

1. **Automatically segment and learn skills**: Learn the segmentation into subtasks (skills) from complete task demonstrations, without manual labelling or a separate demonstration of each individual skill.
2. **Skill conditioning**: Learn the conditional relationships between skills, that is, which skills may follow which other skills. This helps in understanding the task structure and supports more intelligent human-robot interaction.
3. **Adapt to complex environments**: Handle tasks in which the order and number of skills vary, so that robots can perform complex tasks more naturally.

### Method overview

SKID builds on the variational autoencoder (VAE) framework and combines it with an iterative scheme that explains sub-parts of the given data. Its main components are:

- **Model structure**:
  - An RNN handles the temporal dependence of trajectories.
  - A spatial transformer (ST) extracts sub-trajectories.
  - A discrete β-VAE models the skill type \( z_s \); the discrete variable is learned through the continuous Gumbel-Softmax approximation.
- **Optimization objective**:
  - Maximize the evidence lower bound (ELBO):
    \[ L(\tau; \theta, \phi) = \mathbb{E}_{q_\phi(z|\tau)}[\log p_\theta(\tau|z)] - \text{KL}(q_\phi(z|\tau) \| p(z)) \]
  - Introduce capacity terms \( C_d \) and \( C_s \), weighted by \( \gamma_d \) and \( \gamma_s \), to control the disentanglement of the latent variables \( z_d \) and \( z_s \), for example:
    \[ L(\tau; \theta, \phi) = \mathbb{E}_{q_\phi(z|\tau)}[\log p_\theta(\tau|z)] - \gamma_d | \text{KL}(q_\phi(z_d|\tau) \| p(z_d)) - C_d | - \gamma_s | \text{KL}(q_\phi(z_s|\tau) \| p(z_s)) - C_s | \]

### Experimental verification

The authors evaluate SKID on several datasets: 1D synthetic data, 3D synthetic data, 2D human-machine interaction data, and 2D teaching data. The results show that SKID learns skill segmentations and skill libraries across these settings. It also learns the conditional relationships between skills, which helps in understanding and predicting human behavior.

### Application prospects

SKID is most relevant to human-robot interaction and robot learning. Users can teach a robot a complex sequential task by demonstrating the complete task, without specifying each subtask in detail. The learned skills and their conditioning can also be used to plan new tasks, to reduce the search space, and to predict human behavior so that the robot can adapt more intelligently.

### Limitations

SKID does not perform perfectly. The main challenge is that the discrete VAE sometimes misses certain skills, especially on real-world datasets with high noise and high variability. In addition, training relies on a continuous approximation of the discrete variables, which may lead to differences in performance at test time.
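
The gap between training and testing mentioned under Limitations can be illustrated with a small NumPy sketch. It is not the authors' implementation. It contrasts a relaxed Gumbel-Softmax sample, as one would use during training, with a hard one-hot selection, as one would use at test time, and it evaluates a capacity penalty of the form \( \gamma_s | \text{KL} - C_s | \) for the skill variable. The function names, the uniform prior over skills, and all numeric values are assumptions chosen for illustration.

```python
import numpy as np


def gumbel_softmax(logits, temperature, rng):
    """Draw a relaxed (continuous) sample from a categorical distribution."""
    logits = np.asarray(logits, dtype=float)
    # Gumbel(0, 1) noise; the lower bound keeps log() finite.
    u = rng.uniform(low=1e-10, high=1.0, size=logits.shape)
    gumbel_noise = -np.log(-np.log(u))
    y = (logits + gumbel_noise) / temperature
    y = y - y.max(axis=-1, keepdims=True)  # numerical stability
    e = np.exp(y)
    return e / e.sum(axis=-1, keepdims=True)


def hard_one_hot(logits):
    """Pick the most likely category as a one-hot vector (1D input)."""
    logits = np.asarray(logits, dtype=float)
    out = np.zeros_like(logits)
    out[np.argmax(logits)] = 1.0
    return out


def kl_to_uniform(probs):
    """KL(q || p) for categorical q and an assumed uniform prior p (1D input)."""
    p = np.clip(np.asarray(probs, dtype=float), 1e-12, 1.0)
    k = p.shape[-1]
    return float(np.sum(p * np.log(p * k)))


def capacity_penalty(kl, capacity, gamma):
    """Capacity-constrained term gamma * |KL - C|."""
    return gamma * abs(kl - capacity)


# Hypothetical encoder output for three skill types.
rng = np.random.default_rng(0)
skill_logits = np.array([2.0, 0.5, -1.0])

soft_skill = gumbel_softmax(skill_logits, temperature=0.5, rng=rng)  # training
hard_skill = hard_one_hot(skill_logits)                              # testing

# Posterior over skills implied by the logits (plain softmax).
shifted = np.exp(skill_logits - skill_logits.max())
skill_probs = shifted / shifted.sum()
penalty = capacity_penalty(kl_to_uniform(skill_probs), capacity=0.5, gamma=10.0)
```

During training the decoder would receive `soft_skill`, a mixture over skill types that keeps the model differentiable. At test time it would receive `hard_skill`, a single skill. The decoder never sees exact one-hot inputs during training, which is one plausible source of the performance difference noted above.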