Abstract: Approaching robotic cloth manipulation using reinforcement learning based on visual feedback is appealing, as robot perception and control can be learned simultaneously. However, major challenges arise from the intricate dynamics of cloth and the high dimensionality of the corresponding states, which undermine the practicality of the idea. To tackle these issues, we propose TraKDis, a novel Transformer-based Knowledge Distillation approach that decomposes the visual reinforcement learning problem into two distinct stages. In the first stage, a privileged agent is trained with complete knowledge of the cloth state. This privileged agent acts as a teacher, providing guidance and training signals for the subsequent stage. The second stage is a knowledge distillation procedure, in which the knowledge acquired by the privileged agent is transferred to a vision-based agent by leveraging pre-trained state estimation and weight initialization. TraKDis outperforms state-of-the-art RL techniques, with performance gains of 21.9%, 13.8%, and 8.3% on cloth folding tasks in simulation. Furthermore, to validate robustness, we evaluate the agent in a noisy environment; the results indicate that it can handle and adapt to environmental uncertainties effectively. Real robot experiments are also conducted to showcase the efficiency of our method in real-world scenarios.

What problem does this paper attempt to address?

The main problem this paper attempts to solve is: **how to use reinforcement learning (RL) based on visual feedback to manipulate cloth effectively, in particular how to overcome the challenges posed by complex cloth dynamics and a high-dimensional state space**.

### Problem Background

In robotics, manipulating deformable objects such as cloth and other fabrics is a major challenge with broad application prospects in domestic, medical, and industrial scenarios. Although rigid-body manipulation is relatively mature, cloth manipulation is still in its infancy. Traditional state-based reinforcement learning methods can achieve satisfactory results in simulation, but this precise state information is very difficult to obtain in practice. Manipulation directly from visual inputs (such as RGB images) is therefore more feasible, but it is also harder, especially given the heavy self-occlusion of cloth and its lack of trackable features.

### Solution

To address these problems, the paper proposes **TraKDis**, a Transformer-based knowledge distillation (KD) method that aims to improve the performance of visual reinforcement learning in cloth-manipulation tasks through two-stage learning:

1. **First stage: Training the Privileged Agent**
   - Train a privileged agent on complete cloth-state information (such as cloth-particle positions) to serve as a teacher model.
   - The privileged agent provides guidance and training signals that assist learning in the subsequent stage.
2. **Second stage: Knowledge Distillation**
   - Using a pre-trained state-estimation encoder and weight initialization, transfer the knowledge of the privileged agent to the vision-based agent (the student model).
   - The student model relies only on partial observations (such as RGB images) and learns cloth-manipulation tasks by imitating the behavior of the privileged agent.

### Main Contributions

- Propose **TraKDis**, a Transformer-based knowledge-distillation framework for learning visual cloth-manipulation tasks.
- Design a new knowledge-distillation method that combines a state-estimation encoder with pre-trained weights, improving the model's performance and training efficiency.
- Show experimentally that TraKDis outperforms existing state-of-the-art RL methods on cloth-folding tasks in simulation, with performance improvements of 21.9%, 13.8%, and 8.3%.

In this way, the paper addresses the challenges posed by visual feedback and a high-dimensional state space in cloth manipulation and demonstrates the method's potential in practical applications.
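The two-stage scheme described above can be illustrated with a deliberately tiny sketch. It is not the paper's implementation: small linear maps replace the Transformer, the teacher weights are simply assumed to be the result of stage 1, and every name in it (`M`, `W_TEACHER`, `E`, `H`, the dimensions, the learning rate) is a hypothetical choice made for illustration. It only shows the order of operations: pre-train a state-estimation encoder, initialise the student's policy head from the teacher, then fine-tune the student to imitate the teacher's actions from observations alone.

```python
import random

rng = random.Random(0)

# Hypothetical toy setup (not from the paper): a 2-D "cloth state" s and a
# 3-D observation o = M s standing in for image features.
M = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
# Stage 1 is assumed to be done already: the privileged teacher maps the
# full state to an action with these weights.
W_TEACHER = [[0.5, -0.3], [0.2, 0.8]]


def matvec(A, x):
    """Matrix-vector product for nested lists."""
    return [sum(a * b for a, b in zip(row, x)) for row in A]


def sample():
    """Draw a random state and the observation the student would see."""
    s = [rng.uniform(-1, 1), rng.uniform(-1, 1)]
    return s, matvec(M, s)


def sgd_outer(A, err, x, lr):
    """In-place SGD step A -= lr * 2 * err x^T (gradient of ||A x - y||^2)."""
    for i, e in enumerate(err):
        for j, xj in enumerate(x):
            A[i][j] -= lr * 2.0 * e * xj


def distill_loss(E, H, batch):
    """Mean squared gap between student and teacher actions."""
    total = 0.0
    for s, o in batch:
        a_student = matvec(H, matvec(E, o))
        a_teacher = matvec(W_TEACHER, s)
        total += sum((p - q) ** 2 for p, q in zip(a_student, a_teacher))
    return total / len(batch)


eval_batch = [sample() for _ in range(200)]

# Stage 2a: pre-train the student's encoder E as a state estimator.
E = [[0.0] * 3 for _ in range(2)]
for _ in range(50):
    s, o = sample()
    err = [p - q for p, q in zip(matvec(E, o), s)]
    sgd_outer(E, err, o, lr=0.05)

# Stage 2b: initialise the student's policy head from the teacher's weights.
H = [row[:] for row in W_TEACHER]
loss_before = distill_loss(E, H, eval_batch)

# Stage 2c: distillation, where the student imitates the teacher's actions.
for _ in range(2000):
    s, o = sample()
    z = matvec(E, o)  # estimated state
    err = [p - q for p, q in zip(matvec(H, z), matvec(W_TEACHER, s))]
    # Back-propagate through the head before the head itself is updated.
    grad_z = [sum(H[i][k] * err[i] for i in range(2)) for k in range(2)]
    sgd_outer(H, err, z, lr=0.05)
    sgd_outer(E, grad_z, o, lr=0.05)

loss_after = distill_loss(E, H, eval_batch)
print(f"distillation loss: {loss_before:.2e} -> {loss_after:.2e}")
```

In this toy the encoder is pre-trained for only a few steps on purpose, so the distillation phase still has a gap to close; the student never reads the true state at action time, only the observation `o`.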

TraKDis: A Transformer-based Knowledge Distillation Approach for Visual Reinforcement Learning with Application to Cloth Manipulation
