Abstract: Inspired by traditional handmade crafts, where a person improvises assemblies based on the available objects, we formally introduce the Craft Assembly Task. It is a robotic assembly task that involves building an accurate representation of a given target object using the available objects, which do not directly correspond to its parts. In this work, we focus on selecting the subset of available objects for the final craft, when the given input is an RGB image of the target in the wild. We use a mask segmentation neural network to identify visible parts, followed by retrieving labelled template meshes. These meshes undergo pose optimization to determine the most suitable template. Then, we propose to simplify the parts of the transformed template mesh to primitive shapes like cuboids or cylinders. Finally, we design a search algorithm to find correspondences in the scene based on local and global proportions. We develop baselines for comparison that consider all possible combinations, and choose the highest scoring combination for common metrics used in foreground maps and mask accuracy. Our approach achieves comparable results to the baselines for two different scenes, and we show qualitative results for an implementation in a real-world scenario.

What problem does this paper attempt to address?

The paper addresses how to select appropriate objects for assembling a handicraft that resembles a target object in both appearance and function, given a single RGB image of the target and a set of available objects. The proposed method identifies the visible parts of the target in the RGB image, retrieves and optimizes template meshes, simplifies the parts of these meshes into basic shapes (such as cuboids or cylinders), and uses a search algorithm to find the objects in the scene that best match these parts.

### Main contributions of the paper:

1. **Formal introduction of Craft Assembly Task**: This is a novel and open-ended assembly task inspired by DIY handicrafts. It aims to construct an accurate and functional representation of the target object using a set of available objects that do not directly correspond to the parts of the target object.
2. **Development of a framework**: This framework solves the Craft Assembly Task through template mesh retrieval, reducing the need for a large number of segmented 3D models.
3. **Proposing a search algorithm**: When no exact match exists, this algorithm finds the most similar basic-shape counterparts based on size ratios.

### Method overview:

1. **Part segmentation**: Use a fine-tuned vision transformer (such as EVA02) to obtain segmentation masks of the visible parts of the target object from the RGB input image.
2. **Template mesh retrieval and pose optimization**: According to the segmentation results, retrieve the template meshes of the corresponding object category from the database, and optimize the camera parameters through a differentiable renderer to align the rendered image with the segmentation mask.
3. **Generate missing components**: Assume left-right symmetry of the target object to generate occluded parts, and add internal components to some objects to maintain functional consistency.
4. **Basic-shape simplification**: Simplify each part of the adjusted model into a basic shape (cuboid or cylinder), and select the best candidate shape by calculating the Chamfer distance.
5. **Scene matching**: Design a search algorithm to match the simplified model parts with the objects in the scene according to local and global proportions.

### Evaluation metrics:

Since there are no ready-made ground-truth solutions, the paper proposes a success-rate evaluation based on several metrics, including 3D pose accuracy, correctness of the number of parts, and contour matching. Performance in different scenes is also compared against baseline methods.

### Conclusion:

The proposed method achieves results comparable to the baseline methods in two different scenes, and performs especially well on the average part intersection-over-union (IoU) metric. This indicates that the method is reasonably effective and robust in handling the Craft Assembly Task.
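The basic-shape simplification step in the method overview can be illustrated with a minimal sketch. It is not the paper's implementation. It assumes each part is available as a list of 3D surface points, that both candidate primitives are fitted to the part's axis-aligned bounding box, and that the cylinder axis is the z axis. All function names here are hypothetical, and the cuboid sampler picks faces uniformly rather than by area, which is a simplification.

```python
import math
import random


def chamfer_distance(a, b):
    """Symmetric Chamfer distance: mean squared nearest-neighbour
    distance from a to b plus the same from b to a."""
    def one_way(src, dst):
        return sum(
            min(sum((p - q) ** 2 for p, q in zip(s, d)) for d in dst)
            for s in src
        ) / len(src)
    return one_way(a, b) + one_way(b, a)


def sample_cuboid(size, n, rng):
    """Sample n points on the surface of an origin-centred cuboid.
    Faces are chosen uniformly, not by area (simplification)."""
    pts = []
    for _ in range(n):
        p = [rng.uniform(-s / 2, s / 2) for s in size]
        axis = rng.randrange(3)
        p[axis] = rng.choice((-1, 1)) * size[axis] / 2  # snap to a face
        pts.append(tuple(p))
    return pts


def sample_cylinder(radius, height, n, rng):
    """Sample n points on the surface of an origin-centred, z-aligned
    cylinder; about half land on the side, half on the two caps."""
    pts = []
    for _ in range(n):
        theta = rng.uniform(0, 2 * math.pi)
        if rng.random() < 0.5:  # side wall
            r, z = radius, rng.uniform(-height / 2, height / 2)
        else:  # top or bottom cap
            r = radius * math.sqrt(rng.random())
            z = rng.choice((-1, 1)) * height / 2
        pts.append((r * math.cos(theta), r * math.sin(theta), z))
    return pts


def best_primitive(part_points, rng):
    """Fit a cuboid and a cylinder to the part's bounding box and
    return the name with the lower Chamfer distance, plus all scores."""
    mins = [min(p[i] for p in part_points) for i in range(3)]
    maxs = [max(p[i] for p in part_points) for i in range(3)]
    size = [maxs[i] - mins[i] for i in range(3)]
    centre = [(maxs[i] + mins[i]) / 2 for i in range(3)]
    centred = [tuple(p[i] - centre[i] for i in range(3)) for p in part_points]
    n = len(part_points)
    candidates = {
        "cuboid": sample_cuboid(size, n, rng),
        "cylinder": sample_cylinder(max(size[0], size[1]) / 2, size[2], n, rng),
    }
    scores = {k: chamfer_distance(centred, v) for k, v in candidates.items()}
    return min(scores, key=scores.get), scores


# Usage: a synthetic cylindrical point set stands in for a mesh part.
rng = random.Random(0)
part = sample_cylinder(radius=1.0, height=2.0, n=150, rng=rng)
name, scores = best_primitive(part, rng)
print(name, scores)
```

The nested loops are quadratic in the number of points, which is acceptable for a few hundred samples per part. A real pipeline would more likely use a vectorised or GPU Chamfer implementation.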
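The scene matching step compares proportions rather than absolute sizes, which a small example can make concrete. The sketch below is an assumption-laden illustration, not the paper's search algorithm. It treats every part and scene object as a triple of bounding dimensions, defines a hypothetical "local" cost on each shape's own aspect ratios and a hypothetical "global" cost on size relative to the first part, and enumerates all assignments by brute force for clarity. The weight `w_global` is likewise invented for illustration.

```python
import math
from itertools import permutations


def local_ratios(dims):
    """Scale-free shape descriptor: sorted dimensions over the largest."""
    d = sorted(dims)
    return [x / d[-1] for x in d]


def local_cost(part, obj):
    """How different the two shapes' own aspect ratios are."""
    return sum(abs(a - b) for a, b in zip(local_ratios(part), local_ratios(obj)))


def global_cost(parts, objs):
    """How well relative sizes across the assembly are preserved,
    measured against the first part/object pair as the reference."""
    ref_p, ref_o = max(parts[0]), max(objs[0])
    return sum(
        abs(math.log((max(p) / ref_p) / (max(o) / ref_o)))
        for p, o in zip(parts, objs)
    )


def match_parts(parts, scene, w_global=1.0):
    """Return (indices into scene, cost) for the cheapest assignment
    of distinct scene objects to parts. Exhaustive, so only suitable
    for small scenes."""
    best, best_cost = None, math.inf
    for combo in permutations(range(len(scene)), len(parts)):
        chosen = [scene[i] for i in combo]
        cost = sum(local_cost(p, o) for p, o in zip(parts, chosen))
        cost += w_global * global_cost(parts, chosen)
        if cost < best_cost:
            best, best_cost = combo, cost
    return best, best_cost


# Usage: two target parts, three candidate objects at a different scale.
parts = [(4, 2, 1), (1, 1, 1)]
scene = [(2, 2, 2), (8, 4, 2), (3, 3, 9)]
print(match_parts(parts, scene))
```

Because both costs use ratios only, a scene whose objects are uniformly twice the size of the target parts can still match with zero cost. That scale independence is the point of matching on proportions when no exact counterpart exists.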

Component Selection for Craft Assembly Tasks
