Abstract: We present a new technique to enhance the robustness of imitation learning methods by generating corrective data to account for compounding errors and disturbances. While existing methods rely on interactive expert labeling, additional offline datasets, or domain-specific invariances, our approach requires minimal additional assumptions beyond access to expert data. The key insight is to leverage local continuity in the environment dynamics to generate corrective labels. Our method first constructs a dynamics model from the expert demonstration, encouraging local Lipschitz continuity in the learned model. In locally continuous regions, this model allows us to generate corrective labels within the neighborhood of the demonstrations but beyond the actual set of states and actions in the dataset. Training on this augmented data enhances the agent's ability to recover from perturbations and deal with compounding errors. We demonstrate the effectiveness of our generated labels through experiments in a variety of robotics domains in simulation that have distinct forms of continuity and discontinuity, including classic control problems, drone flying, navigation with high-dimensional sensor observations, legged locomotion, and tabletop manipulation.

What problem does this paper attempt to address?

### Problems the Paper Aims to Solve

This paper aims to enhance the robustness of imitation learning methods by generating corrective data that addresses compounding errors and disturbances. Existing approaches often rely on interactive expert labeling, additional offline datasets, or domain-specific invariances. The method proposed in this paper requires only minimal assumptions beyond access to expert data.

**Main Objectives:**

- **Enhance the robustness of imitation learning**: Expand the dataset with generated corrective labels, enabling imitation learning algorithms to better handle states not seen in the demonstrations.
- **Utilize local continuity**: Leverage local continuity in the environment dynamics to generate corrective labels, thereby extending the model's capabilities beyond the support of the expert data.
- **Theoretical guarantees**: Provide theoretical guarantees on the quality of the generated labels and show how to impose the required local smoothness when training the dynamics function.
- **Extensive validation**: Conduct experiments in various robotic simulation environments, covering tasks such as classic control problems, drone flight, navigation with high-dimensional sensor observations, legged locomotion, and tabletop manipulation.

### Main Contributions

1. **Problem Definition**: Formally define the corrective labels used to enhance the robustness of imitation learning.
2. **Practical Algorithm**: Propose a framework named CCIL (Continuity-based data augmentation for Corrective Imitation Learning), which generates corrective labels by learning a dynamics model with local continuity.
3. **Theoretical Guarantees**: Demonstrate how to leverage local continuity in dynamical systems to extend the capabilities of learned models, and provide theoretical bounds on the quality of the generated labels.
4. **Extensive Empirical Validation**: Conduct extensive experiments in four different robotic simulation environments to validate the effectiveness and robustness of the CCIL method.

Through these contributions, the paper demonstrates how leveraging local continuity can improve the robustness and generalization ability of imitation learning methods without requiring a large number of expert demonstrations.
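
To make the idea of corrective labels more concrete, here is a minimal Python sketch. It is not the authors' implementation. It assumes a toy 1-D system `s' = s + a`, and a least-squares linear model stands in for a learned dynamics model with enforced local Lipschitz continuity (a linear map is Lipschitz everywhere). The labeling rule is also an assumption: perturb an expert action slightly, then search for a nearby state from which the perturbed action reaches the expert's next state under the learned model. All names (`rollout_expert`, `fit_linear_dynamics`, `generate_corrective_labels`) and parameters (`action_noise`, `max_dist`) are hypothetical.

```python
import numpy as np

def rollout_expert(n_traj=20, horizon=15, seed=0):
    """Toy 1-D system s' = s + a with a slightly noisy expert a = -0.5 * s."""
    rng = np.random.default_rng(seed)
    S, A, S_next = [], [], []
    for _ in range(n_traj):
        s = rng.uniform(-1.0, 1.0, size=1)
        for _ in range(horizon):
            a = -0.5 * s + 0.05 * rng.standard_normal(1)
            s_next = s + a
            S.append(s); A.append(a); S_next.append(s_next)
            s = s_next
    return np.array(S), np.array(A), np.array(S_next)

def fit_linear_dynamics(S, A, S_next):
    """Least-squares fit of s' ~ [s, a, 1] @ W (stand-in for a smooth learned model)."""
    X = np.hstack([S, A, np.ones((len(S), 1))])
    W, *_ = np.linalg.lstsq(X, S_next, rcond=None)
    return W

def predict(W, S, A):
    return np.hstack([S, A, np.ones((len(S), 1))]) @ W

def generate_corrective_labels(W, S, A, S_next, action_noise=0.1,
                               max_dist=0.5, iters=50, lr=0.5, seed=1):
    """Assumed labeling rule: for a perturbed action, find a state near the
    expert state from which the model predicts the expert's next state."""
    rng = np.random.default_rng(seed)
    A_noisy = A + action_noise * rng.standard_normal(A.shape)
    S_g = S.copy()
    J = W[: S.shape[1]]  # model Jacobian d s' / d s (row-vector convention)
    for _ in range(iters):
        residual = predict(W, S_g, A_noisy) - S_next
        S_g = S_g - lr * residual @ J.T  # gradient step on 0.5 * ||residual||^2
    residual = predict(W, S_g, A_noisy) - S_next
    # Keep only labels that stay near the demonstrations and solve the equation.
    keep = (np.linalg.norm(S_g - S, axis=1) <= max_dist) & \
           (np.linalg.norm(residual, axis=1) <= 1e-6)
    return S_g[keep], A_noisy[keep], keep

S, A, S_next = rollout_expert()
W = fit_linear_dynamics(S, A, S_next)
S_aug, A_aug, keep = generate_corrective_labels(W, S, A, S_next)

# Augmented behavior-cloning dataset: expert pairs plus generated pairs.
S_train = np.vstack([S, S_aug])
A_train = np.vstack([A, A_aug])
```

Each generated pair `(S_aug[i], A_aug[i])` lies off the expert trajectory but, under the learned model, leads back to a state the expert visited. The `max_dist` filter reflects the point that such labels are only trusted in a neighborhood of the demonstrations where the model is assumed to be locally continuous.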

CCIL: Continuity-based Data Augmentation for Corrective Imitation Learning

Data Efficient Behavior Cloning for Fine Manipulation via Continuity-based Corrective Labels

Off-Dynamics Inverse Reinforcement Learning

Active Learning with Controllable Augmentation Induced Acquisition

Towards Controlled Data Augmentations for Active Learning

Expert Data Augmentation in Imitation Learning (Student Abstract)

Imitation Learning from Imperfection: Theoretical Justifications and Algorithms

Implicit Counterfactual Data Augmentation for Robust Learning

Continuous Value Assignment: A Doubly Robust Data Augmentation for Off-Policy Learning

Distributional Cloning for Stabilized Imitation Learning via ADMM

Adaptive Retention & Correction for Continual Learning

Conformalized Interactive Imitation Learning: Handling Expert Shift and Intermittent Feedback

RoCoDA: Counterfactual Data Augmentation for Data-Efficient Robot Learning from Demonstrations

DABI: Evaluation of Data Augmentation Methods Using Downsampling in Bilateral Control-Based Imitation Learning with Images

Improving Plasticity in Online Continual Learning via Collaborative Learning

ACAMDA: Improving Data Efficiency in Reinforcement Learning Through Guided Counterfactual Data Augmentation

Data Quality in Imitation Learning

Offline Imitation Learning with Variational Counterfactual Reasoning