Ruida Zhang, Chenyangguang Zhang, Yan Di, Fabian Manhardt, Xingyu Liu, Federico Tombari, Xiangyang Ji
Abstract: In this paper, we present KP-RED, a unified KeyPoint-driven REtrieval and Deformation framework that takes object scans as input and jointly retrieves and deforms the most geometrically similar CAD models from a pre-processed database to tightly match the target. Unlike existing dense matching based methods that typically struggle with noisy partial scans, we propose to leverage category-consistent sparse keypoints to naturally handle both full and partial object scans. Specifically, we first employ a lightweight retrieval module to establish a keypoint-based embedding space, measuring the similarity among objects by dynamically aggregating deformation-aware local-global features around extracted keypoints. Objects that are close in the embedding space are considered similar in geometry. Then we introduce the neural cage-based deformation module that estimates the influence vector of each keypoint upon cage vertices inside its local support region to control the deformation of the retrieved shape. Extensive experiments on the synthetic dataset PartNet and the real-world dataset Scan2CAD demonstrate that KP-RED surpasses existing state-of-the-art approaches by a large margin. Codes and trained models will be released in
### Problems the Paper Aims to Solve
The paper aims to address the problem of generating high-quality 3D models from noisy object scans. Specifically, the paper proposes a method called KP-RED, which utilizes semantic keypoints to jointly retrieve and deform 3D models to match the target object. Existing methods often perform poorly when dealing with partial scans and noisy data, whereas KP-RED introduces a keypoint-driven framework that can handle these issues more effectively.
### Background and Challenges
1. **Limitations of Existing Methods**:
- **Dense Matching Methods**: These methods usually struggle with partial scans and noisy data because they rely on the global features of the point cloud, which can lead to the loss of local geometric information.
- **Deformation Modules**: Existing deformation methods typically depend on dense point matching, but random outliers can significantly mislead the matching process, resulting in suboptimal deformation outcomes.
2. **Proposed Solutions in the Paper**:
- **Keypoint-Driven Framework**: KP-RED uses sparse keypoints as the intermediate representation instead of dense point matching. Because these keypoints are semantically consistent within each category, the framework can handle occlusions and noise more reliably.
- **Local-Global Feature Aggregation**: Through a self-attention mechanism, KP-RED can capture both local and global information, thereby preserving more geometric details during the retrieval and deformation process.
- **Neural Cage Deformation**: KP-RED introduces a neural cage-based deformation scheme, using keypoints to control local deformations, ensuring that the deformed model closely matches the target object.
### Method Overview
1. **Retrieval Module**:
- **Keypoint Detection**: A keypoint detector is first used to predict keypoints in the target point cloud.
- **Local Feature Aggregation**: Through point cloud feature extraction and pooling operations, local features within the support region of each keypoint are obtained.
- **Self-Attention Mechanism**: The self-attention mechanism is used to discover associations between regions and predict local retrieval tokens for each keypoint region.
- **Global Embedding Space**: All local tokens are concatenated in a unified order to generate a global deformation-aware token, which is used to retrieve the most similar model from the database.
2. **Deformation Module**:
- **Keypoint Detection**: Keypoint detection is performed on the retrieved source model.
- **Influence Vector Prediction**: The self-attention mechanism is used to predict the influence vectors of each keypoint on its support cage vertices.
- **Deformation Calculation**: The influence vectors determine the displaced cage vertices, and mean value coordinate interpolation then transfers the cage displacement to the retrieved shape, yielding the deformed model.
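The two modules above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the learned keypoint detector, feature extractor, and self-attention layers are left out, and every function name, the radius-based support region, the max pooling, the zero token for empty regions, and the `(n_cage, K)` influence matrix are assumptions made for illustration. The cage is a 2D polygon for brevity, whereas the paper operates on 3D shapes.

```python
import numpy as np

# --- Retrieval module (sketch) ---------------------------------------------

def region_tokens(points, feats, keypoints, radius):
    """Max-pool per-point features inside each keypoint's support region."""
    tokens = []
    for kp in keypoints:
        mask = np.linalg.norm(points - kp, axis=1) <= radius
        # Assumption: an empty region (e.g. an occluded part) gets a zero token.
        pooled = feats[mask].max(axis=0) if mask.any() else np.zeros(feats.shape[1])
        tokens.append(pooled)
    return np.stack(tokens)  # (K, C): one local token per keypoint


def global_token(tokens):
    """Concatenate the local tokens in a fixed keypoint order."""
    return tokens.reshape(-1)


def retrieve(query_token, database_tokens):
    """Index of the database entry closest to the query in embedding space."""
    dists = np.linalg.norm(database_tokens - query_token, axis=1)
    return int(np.argmin(dists))


# --- Deformation module (sketch, 2D for brevity) ---------------------------

def mean_value_coords_2d(p, cage):
    """Mean value coordinates of p w.r.t. a closed polygon cage of shape (n, 2).

    p must lie strictly inside the cage (not on a vertex or an edge).
    """
    d = cage - p                          # vectors from p to each cage vertex
    r = np.linalg.norm(d, axis=1)
    d_next = np.roll(d, -1, axis=0)       # vectors to the following vertex
    cross = d[:, 0] * d_next[:, 1] - d[:, 1] * d_next[:, 0]
    dot = (d * d_next).sum(axis=1)
    alpha = np.arctan2(cross, dot)        # signed angle between v_i and v_{i+1}
    t = np.tan(alpha / 2.0)
    w = (np.roll(t, 1) + t) / r           # (tan(a_{i-1}/2) + tan(a_i/2)) / r_i
    return w / w.sum()


def deform(shape, cage, src_kp, tgt_kp, influence):
    """Move cage vertices by keypoint displacements, then re-interpolate shape.

    influence: (n_cage, K) weights, assumed to be zero outside each
    keypoint's local support region.
    """
    cage_new = cage + influence @ (tgt_kp - src_kp)
    weights = np.stack([mean_value_coords_2d(p, cage) for p in shape])
    return weights @ cage_new
```

In this sketch, retrieval reduces to a nearest-neighbour lookup over concatenated tokens, and deformation reduces to a linear map from keypoint displacements to cage offsets followed by interpolation with fixed cage weights. Identical source and target keypoints leave the shape unchanged.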
### Experimental Results
1. **Datasets**:
- **PartNet**: A synthetic dataset containing 1419 models, 11433 training instances, and 2861 test instances.
- **Scan2CAD**: A real-world dataset based on ScanNet, providing 14225 real object models, poses, and CAD models.
2. **Evaluation Metrics**:
- **Chamfer Distance (CD)**: Used for evaluating complete shapes.
- **Unilateral Chamfer Distance (UCD)**: Used for evaluating partial shapes.
3. **Experimental Results**:
- **Complete Shapes**: On the PartNet dataset, KP-RED outperforms existing methods in every category on the Chamfer Distance metric.
- **Partial Shapes**: KP-RED improves robustness to partial scans by introducing a density-based dynamic feature extraction method.
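For reference, the two evaluation metrics listed above can be sketched in NumPy as follows. Chamfer Distance conventions vary between papers (squared or unsquared distances, sum or mean over points); this sketch assumes squared distances averaged over points, which may differ from the exact variant used in the paper. The function names are illustrative.

```python
import numpy as np


def _nearest_sq_dists(a, b):
    """For each point in a, the squared distance to its nearest neighbour in b."""
    diff = a[:, None, :] - b[None, :, :]          # (len(a), len(b), dim)
    return (diff ** 2).sum(axis=2).min(axis=1)


def chamfer_distance(a, b):
    """Symmetric Chamfer Distance: both directions contribute."""
    return _nearest_sq_dists(a, b).mean() + _nearest_sq_dists(b, a).mean()


def unilateral_chamfer_distance(partial, full):
    """One-sided variant: only partial -> full is measured.

    Regions of the full shape that the partial scan never observed
    therefore add no penalty.
    """
    return _nearest_sq_dists(partial, full).mean()
```

The one-sided form suits partial scans because a deformed CAD model should not be penalised for geometry that the scan simply does not cover.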
### Conclusion
KP-RED addresses the problem of generating high-quality 3D models from noisy object scans with a keypoint-driven joint retrieval and deformation framework. It handles both partial scans and noisy data, and the reported experiments on PartNet and Scan2CAD show it surpassing existing state-of-the-art approaches.