TraKDis: A Transformer-based Knowledge Distillation Approach for Visual Reinforcement Learning with Application to Cloth Manipulation

Wei Chen, Nicolas Rojas
2024-01-24
Abstract: Approaching robotic cloth manipulation using reinforcement learning based on visual feedback is appealing, as robot perception and control can be learned simultaneously. However, major challenges arise from the intricate dynamics of cloth and the high dimensionality of the corresponding states, which overshadow the practicality of the idea. To tackle these issues, we propose TraKDis, a novel Transformer-based Knowledge Distillation approach that decomposes the visual reinforcement learning problem into two distinct stages. In the first stage, a privileged agent is trained, which possesses complete knowledge of the cloth state information. This privileged agent acts as a teacher, providing valuable guidance and training signals for subsequent stages. The second stage involves a knowledge distillation procedure, where the knowledge acquired by the privileged agent is transferred to a vision-based agent by leveraging pre-trained state estimation and weight initialization. TraKDis outperforms state-of-the-art RL techniques, achieving performance gains of 21.9%, 13.8%, and 8.3% in cloth folding tasks in simulation. Furthermore, to validate robustness, we evaluate the agent in a noisy environment; the results indicate its ability to handle and adapt to environmental uncertainties effectively. Real robot experiments are also conducted to showcase the efficiency of our method in real-world scenarios.
Robotics
What problem does this paper attempt to address?
The main problem this paper addresses is: **how to use reinforcement learning (RL) based on visual feedback to manipulate cloth effectively, despite the complexity of cloth dynamics and the high-dimensional state space**.

### Problem Background

In robotics, manipulating deformable objects such as cloth and other fabrics is a major challenge with broad application prospects in domestic, medical, and industrial scenarios. While the manipulation of rigid-body objects is relatively mature, cloth manipulation is still in its infancy. Traditional state-based reinforcement learning methods can achieve satisfactory results in simulated environments, but this precise state information is very difficult to obtain in practical applications. Manipulating directly from visual inputs (such as RGB images) is therefore more feasible, but it is also harder, especially given the heavy self-occlusion of cloth and its lack of trackable features.

### Solution

To solve these problems, the paper proposes **TraKDis**, a Transformer-based knowledge distillation (KD) method that aims to improve the performance of visual reinforcement learning in cloth-manipulation tasks through two-stage learning:

1. **First stage: Training the Privileged Agent**
   - Train a privileged agent, which serves as the teacher model, using complete cloth-state information (such as cloth-particle positions).
   - The privileged agent provides guidance and training signals that assist learning in the subsequent stage.
2. **Second stage: Knowledge Distillation**
   - Through a pre-trained state-estimation encoder and weight initialization, transfer the knowledge of the privileged agent to the vision-based agent (student model).
   - The student model relies only on partial observations (such as RGB images) and learns cloth-manipulation tasks by imitating the behavior of the privileged agent.

### Main Contributions

- Propose **TraKDis**, a Transformer-based knowledge-distillation framework for learning visual cloth-manipulation tasks.
- Design a new knowledge-distillation method that combines state-estimation encoders and pre-trained weights, significantly improving the model's performance and training efficiency.
- Experimental results show that TraKDis outperforms existing state-of-the-art methods in multiple benchmark tests, especially in cloth-folding tasks, with performance improvements of 21.9%, 13.8%, and 8.3%.

Through this method, the paper addresses the challenges of visual feedback and high-dimensional state space in cloth manipulation and demonstrates its potential in practical applications.
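
The two-stage idea summarized above (a privileged teacher, then a student built from a pre-trained state estimator plus teacher-initialized weights) can be illustrated with a toy sketch. This is not the paper's implementation: TraKDis uses Transformers and a cloth simulator, whereas the example below assumes tiny linear models, a hypothetical `RENDER` matrix standing in for image formation, and a fixed teacher weight vector in place of a trained privileged agent. All function names are illustrative.

```python
import random

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def matvec(m, v):
    return [dot(row, v) for row in m]

# Hypothetical stand-in for rendering: maps a 2-D "cloth state" to a 3-D observation.
RENDER = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]

def observe(state):
    return matvec(RENDER, state)

def teacher_action(w_teacher, state):
    # Stage 1 result (assumed given): a privileged policy acting on the full state.
    return dot(w_teacher, state)

def student_action(encoder, w_student, obs):
    # The student sees only the observation; the encoder estimates the state.
    return dot(w_student, matvec(encoder, obs))

def pretrain_encoder(samples, lr=0.05, epochs=1):
    """Fit a linear state estimator (observation -> state) by SGD on squared error."""
    encoder = [[0.0] * 3 for _ in range(2)]
    for _ in range(epochs):
        for state in samples:
            obs = observe(state)
            est = matvec(encoder, obs)
            for i in range(2):
                err = est[i] - state[i]
                for j in range(3):
                    encoder[i][j] -= lr * err * obs[j]
    return encoder

def distill(w_teacher, encoder, samples, lr=0.05, epochs=200):
    """Stage 2: train the student to imitate the teacher's actions."""
    w_student = list(w_teacher)            # weight initialization from the teacher
    encoder = [row[:] for row in encoder]  # start from the pre-trained estimator
    for _ in range(epochs):
        for state in samples:
            obs = observe(state)
            est = matvec(encoder, obs)
            # Distillation error: student action minus teacher action.
            err = dot(w_student, est) - teacher_action(w_teacher, state)
            for i in range(2):
                w_i = w_student[i]  # keep the pre-update value for the encoder gradient
                w_student[i] -= lr * err * est[i]
                for j in range(3):
                    encoder[i][j] -= lr * err * w_i * obs[j]
    return encoder, w_student

def imitation_error(w_teacher, encoder, w_student, samples):
    """Mean squared gap between student and teacher actions."""
    total = 0.0
    for state in samples:
        gap = student_action(encoder, w_student, observe(state)) - teacher_action(w_teacher, state)
        total += gap ** 2
    return total / len(samples)

rng = random.Random(0)
samples = [[rng.uniform(-1, 1), rng.uniform(-1, 1)] for _ in range(50)]
w_teacher = [0.8, -0.5]  # assumed teacher weights, not learned here

encoder0 = pretrain_encoder(samples)
error_before = imitation_error(w_teacher, encoder0, w_teacher, samples)
encoder1, w_student = distill(w_teacher, encoder0, samples)
error_after = imitation_error(w_teacher, encoder1, w_student, samples)
print("imitation error before distillation:", error_before)
print("imitation error after distillation: ", error_after)
```

In this sketch the encoder is deliberately under-trained (one epoch), so the teacher-initialized student starts with a nonzero imitation error that the distillation step then reduces. The point is only to show how the three ingredients fit together; nothing here reflects the paper's architectures, tasks, or reported numbers.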