Changshi Zhou, Haichuan Xu, Jiarui Hu, Feng Luan, Zhipeng Wang, Yanchao Dong, Yanmin Zhou, Bin He
Abstract: Robotic cloth manipulation faces challenges due to the fabric's complex dynamics and the high dimensionality of configuration spaces. Previous methods have largely focused on isolated smoothing or folding tasks and have been overly reliant on simulations, often failing to bridge the significant sim-to-real gap in deformable object manipulation. To overcome these challenges, we propose a two-stream architecture with sequential and spatial pathways, unifying smoothing and folding tasks into a single adaptable policy model that accommodates various cloth types and states. The sequential stream determines the pick and place positions for the cloth, while the spatial stream, using a connectivity dynamics model, constructs a visibility graph from partial point cloud data of the self-occluded cloth, allowing the robot to infer the cloth's full configuration from incomplete observations. To bridge the sim-to-real gap, we utilize a hand tracking detection algorithm to gather and integrate human demonstration data into our novel end-to-end neural network, improving real-world adaptability. Our method, validated on a UR5 robot across four distinct cloth folding tasks with different goal shapes, consistently achieves folded states from arbitrary crumpled initial configurations, with success rates of 99%, 99%, 83%, and 67%. It outperforms existing state-of-the-art cloth manipulation techniques and demonstrates strong generalization to unseen cloth with diverse colors, shapes, and stiffness in real-world experiments. Videos and source code are available at: https://zcswdt.github.io/SSFold/
### What problems does this paper attempt to solve?
This paper addresses the challenges robots face when manipulating cloth. The complex dynamics and high-dimensional configuration space of cloth make it difficult for robots to smooth and fold it reliably. Previous methods usually focus on isolated smoothing or folding tasks and rely heavily on simulation, which leaves a large sim-to-real gap and limits how well they transfer to deformable object manipulation in the real world.
To solve these problems, the authors propose a new framework named SSFold. This framework unifies the smoothing and folding tasks by combining human demonstration data with advanced learning techniques and can handle various types of cloth and their initial states. Specific contributions include:
1. **Two-stream architecture**: A two-stream architecture with sequential and spatial pathways is proposed, unifying the smoothing and folding tasks into a single adaptable policy model.
2. **Visibility graph construction**: A visibility graph is constructed from partial point cloud data to overcome cloth self-occlusion, enabling the robot to infer the complete configuration of the cloth from incomplete observations.
3. **Human demonstration data integration**: Hand-tracking detection algorithms are used to collect and integrate human demonstration data, thereby improving the model's adaptability and generalization ability in the real world.
4. **Efficient data collection**: Hand tracking and keypoint detection are achieved through a low-cost monocular camera system, avoiding the need for complex and expensive traditional equipment.
Finally, SSFold was validated on a UR5 robot across four cloth-folding tasks with different goal shapes, reaching the target folded state from arbitrary crumpled initial configurations with success rates of 99%, 99%, 83%, and 67%, respectively. According to the abstract, the method also generalizes to unseen cloth with diverse colors, shapes, and stiffness in real-world experiments.
### Formula summary
- **Definition of the edges of the visibility graph**:
\[
E_C=\{e_{ij}\mid \|v_i - v_j\|_2 < R\}
\]
where \(e_{ij}\) represents the connection between nodes \(v_i\) and \(v_j\), and \(R\) is the distance threshold.
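
The edge definition above can be sketched in a few lines of NumPy. This is a minimal illustration of a radius-threshold graph, not the authors' implementation: the function name `build_radius_edges`, the `(N, 3)` point layout, and the sample radius are assumptions made for the example.

```python
import numpy as np


def build_radius_edges(points: np.ndarray, radius: float) -> np.ndarray:
    """Return an (M, 2) array of index pairs (i, j), i < j, with ||v_i - v_j||_2 < radius.

    `points` is assumed to be an (N, 3) array of point cloud nodes (hypothetical layout).
    """
    points = np.asarray(points, dtype=float)
    # Pairwise Euclidean distances via broadcasting; result has shape (N, N).
    diff = points[:, None, :] - points[None, :, :]
    dist = np.linalg.norm(diff, axis=-1)
    # Use the strict upper triangle so each undirected edge is listed once
    # and self-loops (i == j) are excluded.
    i_idx, j_idx = np.triu_indices(len(points), k=1)
    mask = dist[i_idx, j_idx] < radius
    return np.stack([i_idx[mask], j_idx[mask]], axis=1)


# Three nodes on a line; only the first two are closer than the threshold.
sample_points = np.array([[0.0, 0.0, 0.0], [0.03, 0.0, 0.0], [0.2, 0.0, 0.0]])
sample_edges = build_radius_edges(sample_points, radius=0.05)
```

The dense distance matrix costs O(N²) memory, which is acceptable for a downsampled cloud; a spatial index would be the usual choice for larger inputs.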
- **Optimal placement position selection**:
\[
i^*=\arg\max_i\left(\max_{(u,v)}P_i(u, v)\mid T_{\text{pick}}^i\right)
\]
\[
T_{\text{place}}=\arg\max_{(u,v)}P_{i^*}(u, v)\mid T_{\text{pick}}^{i^*}
\]
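
One way to read the two equations above is as a nested argmax: first pick the candidate whose conditioned map has the highest peak, then take the location of that peak. The sketch below assumes that \(P_i\) is a 2D score map over pixel coordinates \((u, v)\) conditioned on the \(i\)-th pick candidate, stacked into a hypothetical `(K, H, W)` array; the summary does not spell out these details, so treat the names and layout as illustrative.

```python
import numpy as np


def select_pick_and_place(place_maps: np.ndarray):
    """Return (i_star, (u, v)) for a stack of place score maps.

    `place_maps[i]` is assumed to hold P_i(u, v) given pick candidate i,
    with (u, v) indexing (row, column) of the map (an assumption).
    """
    place_maps = np.asarray(place_maps, dtype=float)
    num_candidates = place_maps.shape[0]
    # Inner max over (u, v): best place score for each pick candidate.
    peak_per_candidate = place_maps.reshape(num_candidates, -1).max(axis=1)
    # Outer argmax over i: the pick candidate with the highest peak.
    i_star = int(np.argmax(peak_per_candidate))
    # T_place: location of that peak in the selected map.
    best_map = place_maps[i_star]
    u, v = np.unravel_index(np.argmax(best_map), best_map.shape)
    return i_star, (int(u), int(v))


# Two candidates on a 3x4 grid; the second has the higher peak.
demo_maps = np.zeros((2, 3, 4))
demo_maps[0, 1, 2] = 0.6
demo_maps[1, 2, 3] = 0.9
demo_choice = select_pick_and_place(demo_maps)
```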
- **Grasping direction optimization**:
\[
x = \frac{T_{\text{pick}}-T_{\text{place}}}{\|T_{\text{pick}}-T_{\text{place}}\|}
\]
\[
y=\frac{[0,0,-1]\times x}{\|[0,0,-1]\times x\|}
\]
\[
x = y\times[0,0,-1]
\]
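\[
R=[x\quad y\quad[0,0,-1]]
\]
Here the second assignment to \(x\) recomputes it so that it is orthogonal to both \(y\) and the downward axis \([0,0,-1]\), and \(R\) is the matrix whose columns are these three axes (it is distinct from the distance threshold \(R\) used in the visibility graph).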
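
The four steps above amount to building a right-handed orthonormal frame whose third axis points straight down. A minimal NumPy sketch follows, assuming `t_pick` and `t_place` are 3D positions in a common frame; the function name `grasp_rotation` and the degenerate-case checks are additions for the example, not part of the summarized method.

```python
import numpy as np


def grasp_rotation(t_pick, t_place) -> np.ndarray:
    """Build the 3x3 matrix R = [x y z] with z = [0, 0, -1] from pick and place points."""
    z = np.array([0.0, 0.0, -1.0])
    direction = np.asarray(t_pick, dtype=float) - np.asarray(t_place, dtype=float)
    length = np.linalg.norm(direction)
    if length < 1e-9:
        raise ValueError("pick and place positions coincide")
    # Step 1: unit vector from place to pick.
    x = direction / length
    # Step 2: y is perpendicular to both the downward axis and x.
    y = np.cross(z, x)
    y_length = np.linalg.norm(y)
    if y_length < 1e-9:
        raise ValueError("pick-place direction is vertical; y is undefined")
    y = y / y_length
    # Step 3: recompute x so the frame is exactly orthonormal.
    x = np.cross(y, z)
    # Step 4: stack the axes as columns.
    return np.column_stack([x, y, z])


# A pick-place direction with a vertical component still yields a valid rotation.
demo_rotation = grasp_rotation([1.0, 1.0, 0.5], [0.0, 0.0, 0.0])
```

Because \(y\) is horizontal and unit length, the recomputed \(x = y \times z\) is also horizontal and unit length, so the result is a proper rotation even when the pick and place heights differ.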
Together, these formulas describe how SSFold connects nodes in the cloth graph, selects a pick-and-place action, and orients the gripper for the grasp.