Component Selection for Craft Assembly Tasks

Vitor Hideyo Isume, Takuya Kiyokawa, Natsuki Yamanobe, Yukiyasu Domae, Weiwei Wan, Kensuke Harada
DOI: https://doi.org/10.1109/LRA.2024.3440847
2024-08-16
Abstract: Inspired by traditional handmade crafts, where a person improvises assemblies based on the available objects, we formally introduce the Craft Assembly Task. It is a robotic assembly task that involves building an accurate representation of a given target object using the available objects, which do not directly correspond to its parts. In this work, we focus on selecting the subset of available objects for the final craft, when the given input is an RGB image of the target in the wild. We use a mask segmentation neural network to identify visible parts, followed by retrieving labelled template meshes. These meshes undergo pose optimization to determine the most suitable template. Then, we propose to simplify the parts of the transformed template mesh to primitive shapes like cuboids or cylinders. Finally, we design a search algorithm to find correspondences in the scene based on local and global proportions. We develop baselines for comparison that consider all possible combinations, and choose the highest-scoring combination for common metrics used in foreground maps and mask accuracy. Our approach achieves comparable results to the baselines for two different scenes, and we show qualitative results for an implementation in a real-world scenario.
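The idea of matching parts to scene objects by "local and global proportions" can be illustrated with a minimal sketch. It assumes that every target part and every scene object is already summarized by its bounding-box extents; the functions `local_ratios`, `global_ratios`, `match_score`, and `select_components` are hypothetical, and so is the scoring rule (absolute differences of aspect ratios for the local term, plus differences of relative size for the global term). The exhaustive enumeration over assignments is used only for brevity and is practical only for small scenes; it is not presented as the paper's actual search procedure.

```python
from itertools import permutations
from math import inf
from typing import Optional

Box = tuple[float, float, float]  # bounding-box extents of a part or object (assumed input)


def local_ratios(box: Box) -> tuple[float, float]:
    """Shape-only descriptor: the two shorter sides divided by the longest side."""
    longest, middle, shortest = sorted(box, reverse=True)
    return middle / longest, shortest / longest


def global_ratios(boxes: list[Box]) -> list[float]:
    """Each box's longest side relative to the longest side in the whole set."""
    longest = [max(b) for b in boxes]
    reference = max(longest)
    return [edge / reference for edge in longest]


def match_score(parts: list[Box], objects: list[Box]) -> float:
    """Hypothetical score (lower is better): local shape error plus global size error."""
    local = sum(
        abs(p - o)
        for part, obj in zip(parts, objects)
        for p, o in zip(local_ratios(part), local_ratios(obj))
    )
    glob = sum(abs(p - o) for p, o in zip(global_ratios(parts), global_ratios(objects)))
    return local + glob


def select_components(parts: list[Box], scene: list[Box]) -> tuple[Optional[tuple[int, ...]], float]:
    """Exhaustively try every assignment of distinct scene objects to the target parts."""
    best, best_score = None, inf
    for combo in permutations(range(len(scene)), len(parts)):
        score = match_score(parts, [scene[i] for i in combo])
        if score < best_score:
            best, best_score = combo, score
    return best, best_score  # (None, inf) if the scene has fewer objects than parts


# Toy example: a tabletop-like board and a post-like stick versus three scene objects.
parts = [(1.0, 0.6, 0.05), (0.05, 0.05, 0.7)]
scene = [(0.3, 0.2, 0.02), (0.02, 0.02, 0.25), (0.1, 0.1, 0.1)]
print(select_components(parts, scene)[0])  # (0, 1): flat object -> part 0, stick -> part 1
```

In this toy example the flat object is assigned to the board-like part and the stick to the post-like part, because both their shapes and their relative sizes agree even though the absolute sizes differ; the cube is rejected because its aspect ratios match neither part.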
Computer Vision and Pattern Recognition, Robotics
What problem does this paper attempt to address?
The problem this paper addresses is how to select suitable objects for assembling a craft that resembles a target object in both appearance and function, given a single RGB image of the target and a set of available objects. The proposed method identifies the visible parts of the target in the RGB image, retrieves and optimizes template meshes, simplifies the parts of these meshes into primitive shapes (cuboids or cylinders), and uses a search algorithm to find the scene objects that best match those parts.

### Main contributions of the paper:

1. **Formal introduction of the Craft Assembly Task**: a novel, open-ended assembly task inspired by DIY crafts, whose goal is to construct an accurate and functional representation of a target object from a set of available objects that do not directly correspond to the target's parts.
2. **A framework for the task**: the framework solves the Craft Assembly Task through template mesh retrieval, reducing the need for a large collection of part-segmented 3D models.
3. **A search algorithm**: when no exact match exists, the algorithm finds the most similar primitive-shape counterparts based on size ratios.

### Method overview:

1. **Part segmentation**: a fine-tuned vision transformer (such as EVA02) produces segmentation masks for the visible parts of the target object in the RGB input image.
2. **Template mesh retrieval and pose optimization**: based on the segmentation results, template meshes of the corresponding object category are retrieved from a database, and their camera parameters are optimized with a differentiable renderer so that the rendered image aligns with the segmentation mask; the best-aligned template is then selected.
3. **Generation of missing parts**: occluded parts are generated by assuming left-right symmetry of the target object, and internal parts are added to some objects to keep them functionally consistent.
4. **Primitive-shape simplification**: each part of the adjusted model is simplified to a primitive shape (cuboid or cylinder), and the candidate with the smallest Chamfer distance to the original part is selected.
5. **Scene matching**: a search algorithm matches the simplified model parts to objects in the scene according to local and global proportions.

### Evaluation metrics:

Because no ground-truth solutions exist for this task, the paper evaluates success rates under several metrics, including 3D pose accuracy, correctness of the number of parts, and contour matching. The method is also compared against baseline methods in different scenes.

### Conclusion:

The proposed method achieves results comparable to the baselines in two different scenes and performs particularly well on the mean part intersection-over-union (IoU) metric. This suggests that the method is effective for the Craft Assembly Task, and the paper additionally shows qualitative results from a real-world implementation.
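The primitive-shape simplification step (choosing between a cuboid and a cylinder by Chamfer distance) can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's implementation: each part is assumed to be given as a 3D point cloud, both candidate primitives are fitted from the part's axis-aligned bounding box with the cylinder running along the longest side (a hypothetical fitting rule), and the Chamfer distance is the common symmetric variant built from mean squared nearest-neighbour distances. All function names are illustrative.

```python
import numpy as np


def chamfer_distance(a: np.ndarray, b: np.ndarray) -> float:
    """Symmetric Chamfer distance: mean squared nearest-neighbour distance, both directions."""
    d2 = ((a[:, None, :] - b[None, :, :]) ** 2).sum(axis=-1)  # (len(a), len(b)) squared distances
    return float(d2.min(axis=1).mean() + d2.min(axis=0).mean())


def sample_cuboid(center, size, n, rng):
    """Area-weighted samples on the surface of an axis-aligned cuboid."""
    size = np.asarray(size, dtype=float)
    areas = np.array([size[1] * size[2], size[0] * size[2], size[0] * size[1]])
    pts = rng.uniform(-0.5, 0.5, size=(n, 3))
    face_axis = rng.choice(3, size=n, p=areas / areas.sum())  # which pair of faces each sample lies on
    pts[np.arange(n), face_axis] = rng.choice([-0.5, 0.5], size=n)  # snap onto that face
    return center + pts * size


def sample_cylinder(center, radius, height, along, n, rng):
    """Area-weighted samples on a cylinder's side and caps; its axis is coordinate `along`."""
    side, caps = 2 * np.pi * radius * height, 2 * np.pi * radius**2
    on_side = rng.uniform(size=n) < side / (side + caps)
    theta = rng.uniform(0, 2 * np.pi, size=n)
    r = np.where(on_side, radius, radius * np.sqrt(rng.uniform(size=n)))  # uniform over cap disks
    z = np.where(on_side, rng.uniform(-height / 2, height / 2, size=n),
                 rng.choice([-height / 2, height / 2], size=n))
    local = np.stack([r * np.cos(theta), r * np.sin(theta), z], axis=1)
    return center + np.roll(local, along + 1, axis=1)  # move the local z column to `along`


def simplify_part(points, n=1000, seed=0):
    """Fit both primitives to the part's bounding box and keep the lower-Chamfer one.

    Hypothetical fitting rule: the cylinder runs along the longest bounding-box
    side, with a diameter equal to the larger of the two remaining sides.
    """
    rng = np.random.default_rng(seed)
    lo, hi = points.min(axis=0), points.max(axis=0)
    center, size = (lo + hi) / 2, hi - lo
    long_axis = int(np.argmax(size))
    radius = np.delete(size, long_axis).max() / 2
    candidates = {
        "cuboid": sample_cuboid(center, size, n, rng),
        "cylinder": sample_cylinder(center, radius, size[long_axis], long_axis, n, rng),
    }
    scores = {name: chamfer_distance(points, c) for name, c in candidates.items()}
    return min(scores, key=scores.get), scores


rng = np.random.default_rng(1)
rod = sample_cylinder(np.zeros(3), 0.5, 2.0, 2, 1000, rng)
board = sample_cuboid(np.zeros(3), [2.0, 1.0, 0.2], 1000, rng)
print(simplify_part(rod)[0], simplify_part(board)[0])  # expected: cylinder cuboid
```

A rod-shaped point cloud is classified as a cylinder and a flat board as a cuboid. The dense pairwise distance matrix keeps the sketch short but costs O(N·M) memory, so a KD-tree nearest-neighbour query would be the more practical choice for large point clouds.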