Transformer Based Feature Pyramid Network for Transparent Objects Grasp

Jiawei Zhang, Houde Liu, Chongkun Xia
DOI: https://doi.org/10.1007/978-3-031-13822-5_37
2022-01-01
Abstract: Transparent objects such as glass bottles and plastic cups are common in daily life, yet few works perform well at grasping them because of their unique optical properties. Beyond the difficulty of the task itself, there is no dataset for transparent object grasping. To address this problem, we propose an efficient dataset construction pipeline that labels grasp poses for transparent objects. Using the Blender physics engine, our pipeline can generate numerous photo-realistic images and label grasp poses in a short time. We also propose TTG-Net, a transformer-based feature pyramid network for generating planar grasp poses, which uses a feature pyramid network with a residual module to extract features and a transformer encoder to refine them with better global information. TTG-Net is trained entirely on the virtual dataset generated by our pipeline and achieves 80.4% validation accuracy on that dataset. To demonstrate the effectiveness of TTG-Net on real-world data, we also test it on photos randomly captured in our lab. TTG-Net achieves 73.4% accuracy on this real-world benchmark, which indicates remarkable sim2real generalization. We also evaluate other mainstream methods on our dataset, and TTG-Net shows better generalization ability.
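The abstract does not say how a planar grasp pose is parameterized. As a rough illustration only, the sketch below assumes the oriented-rectangle convention that is common in planar grasp detection: a centre, a gripper angle, an opening width, and a jaw height. The class name `PlanarGrasp` and all of its fields are hypothetical and are not taken from the paper.

```python
import math
from dataclasses import dataclass


@dataclass
class PlanarGrasp:
    """Hypothetical planar grasp (assumed convention, not from the paper).

    x, y   : grasp centre in image pixels
    theta  : gripper rotation in radians, counter-clockwise in image axes
    width  : gripper opening in pixels
    height : jaw size in pixels
    """
    x: float
    y: float
    theta: float
    width: float
    height: float

    def corners(self):
        """Return the four corners of the grasp rectangle as (x, y) tuples."""
        c, s = math.cos(self.theta), math.sin(self.theta)
        half_w, half_h = self.width / 2.0, self.height / 2.0
        # Corners in the gripper's local frame, before rotation.
        local = [(-half_w, -half_h), (half_w, -half_h),
                 (half_w, half_h), (-half_w, half_h)]
        # Rotate each corner by theta, then translate to the grasp centre.
        return [(self.x + c * dx - s * dy, self.y + s * dx + c * dy)
                for dx, dy in local]


# An axis-aligned grasp centred at (100, 50).
grasp = PlanarGrasp(x=100.0, y=50.0, theta=0.0, width=40.0, height=20.0)
corners = grasp.corners()
```

Under this assumed convention, a network that predicts planar grasps outputs these five numbers per candidate, and the rectangle corners are what would typically be drawn on the image or compared against a labelled grasp.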