Continual Policy Distillation of Reinforcement Learning-based Controllers for Soft Robotic In-Hand Manipulation

Lanpei Li, Enrico Donato, Vincenzo Lomonaco, Egidio Falotico
2024-04-06
Abstract: Dexterous manipulation, often facilitated by multi-fingered robotic hands, has substantial impact on real-world applications. Soft robotic hands, due to their compliant nature, offer flexibility and adaptability during object grasping and manipulation. Yet these benefits come with challenges, particularly in developing controllers for finger coordination. Reinforcement Learning (RL) can be employed to train object-specific in-hand manipulation policies, but such policies have limited adaptability and generalizability. We introduce a Continual Policy Distillation (CPD) framework to acquire a versatile controller for in-hand manipulation that rotates objects of different shapes and sizes within a four-fingered soft gripper. The framework leverages Policy Distillation (PD) to transfer knowledge from expert policies to a continually evolving student policy network. Exemplar-based rehearsal methods are then integrated to mitigate catastrophic forgetting and enhance generalization. The performance of the CPD framework across various replay strategies demonstrates its effectiveness in consolidating knowledge from multiple experts and achieving versatile and adaptive behaviours for in-hand manipulation tasks.
Robotics, Artificial Intelligence, Machine Learning
What problem does this paper attempt to address?
The paper addresses the problem of achieving versatile in-hand object manipulation with a soft robotic hand. Reinforcement Learning (RL) can train in-hand manipulation policies, but each resulting policy is specific to the object it was trained on, so it adapts and generalizes poorly to objects of other shapes and sizes. To tackle this, the authors propose a Continual Policy Distillation (CPD) framework, which distills knowledge from multiple expert policies into a continually evolving student policy network, yielding a single general-purpose controller. The framework also incorporates exemplar-based rehearsal (replay) to mitigate catastrophic forgetting and improve generalization.

The main contributions of the paper include:

1. **Proposing the CPD framework**: The framework uses policy distillation to transfer knowledge from multiple expert policies to a student policy network, so the student can keep learning and improving without access to the original training data.
2. **Mitigating catastrophic forgetting**: By integrating exemplar-based rehearsal, the CPD framework retains previously learned knowledge while learning new tasks.
3. **Experimental validation**: Experiments on a four-fingered soft gripper evaluate the CPD framework under various replay strategies on rotation tasks with objects of different shapes and sizes.

Overall, the paper aims to overcome the object-specific nature of RL-trained policies in soft robotic in-hand manipulation, using the CPD framework to obtain a more general and adaptive controller.
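To make the combination of policy distillation and exemplar-based rehearsal more concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not the authors' implementation: the names `ReservoirBuffer`, `distillation_loss`, and `mixed_batch` are hypothetical, reservoir sampling stands in for whichever replay strategies the paper actually compares, and a mean squared error between student and expert actions is assumed as the distillation objective.

```python
import random


class ReservoirBuffer:
    """Fixed-size exemplar memory filled by reservoir sampling (assumed strategy)."""

    def __init__(self, capacity, seed=0):
        self.capacity = capacity
        self.items = []   # stored (state, expert_action) pairs
        self.seen = 0     # total number of pairs offered so far
        self.rng = random.Random(seed)

    def add(self, state, expert_action):
        self.seen += 1
        if len(self.items) < self.capacity:
            self.items.append((state, expert_action))
        else:
            # Each pair seen so far stays in memory with equal probability.
            j = self.rng.randrange(self.seen)
            if j < self.capacity:
                self.items[j] = (state, expert_action)

    def sample(self, k):
        return self.rng.sample(self.items, min(k, len(self.items)))


def distillation_loss(student, batch):
    """Mean squared error between student and expert actions (assumed objective)."""
    total, count = 0.0, 0
    for state, expert_action in batch:
        predicted = student(state)
        for p, a in zip(predicted, expert_action):
            total += (p - a) ** 2
            count += 1
    return total / count if count else 0.0


def mixed_batch(new_pairs, buffer, batch_size, replay_fraction=0.5, rng=None):
    """Combine pairs from the current expert with exemplars from earlier experts."""
    rng = rng or random.Random(0)
    n_replay = min(int(batch_size * replay_fraction), len(buffer.items))
    n_new = min(batch_size - n_replay, len(new_pairs))
    return rng.sample(new_pairs, n_new) + buffer.sample(n_replay)
```

In a continual setting, the student would be trained on `mixed_batch` outputs while distilling each new expert, and a subset of that expert's state-action pairs would then be pushed into the buffer before moving on to the next object.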