SNAKE: Shape-aware Neural 3D Keypoint Field

Chengliang Zhong, Peixing You, Xiaoxue Chen, Hao Zhao, Fuchun Sun, Guyue Zhou, Xiaodong Mu, Chuang Gan, Wenbing Huang
DOI: https://doi.org/10.48550/arXiv.2206.01724
2022-10-17
Abstract: Detecting 3D keypoints from point clouds is important for shape reconstruction, while this work investigates the dual question: can shape reconstruction benefit 3D keypoint detection? Existing methods either seek salient features according to statistics of different orders or learn to predict keypoints that are invariant to transformation. Nevertheless, the idea of incorporating shape reconstruction into 3D keypoint detection is under-explored. We argue that this is restricted by former problem formulations. To this end, a novel unsupervised paradigm named SNAKE is proposed, which is short for shape-aware neural 3D keypoint field. Similar to recent coordinate-based radiance or distance fields, our network takes 3D coordinates as inputs and predicts implicit shape indicators and keypoint saliency simultaneously, thus naturally entangling 3D keypoint detection and shape reconstruction. We achieve superior performance on various public benchmarks, including the standalone object datasets ModelNet40, KeypointNet, and SMPL meshes, and the scene-level datasets 3DMatch and Redwood. Intrinsic shape awareness brings several advantages as follows. (1) SNAKE generates 3D keypoints consistent with human semantic annotation, even without such supervision. (2) SNAKE outperforms counterparts in terms of repeatability, especially when the input point clouds are down-sampled. (3) The generated keypoints allow accurate geometric registration, notably in a zero-shot setting. Code is available at https://github.com/zhongcl-thu/SNAKE
Computer Vision and Pattern Recognition, Artificial Intelligence
What problem does this paper attempt to address?
### Problems Addressed by the Paper

The paper "SNAKE: Shape-aware Neural 3D Keypoint Field" addresses the detection of 3D keypoints from point cloud data. Specifically, the authors explore a dual question: can shape reconstruction improve 3D keypoint detection? Existing methods either find salient features through statistics of different orders or learn to predict transformation-invariant keypoints. However, combining shape reconstruction with 3D keypoint detection has not been fully explored, which the authors attribute to the limitations of previous problem formulations. To this end, they propose a new unsupervised paradigm, SNAKE (Shape-aware Neural 3D Keypoint Field), which takes 3D coordinates as input and predicts implicit shape indicators and keypoint saliency simultaneously, thereby naturally combining 3D keypoint detection and shape reconstruction.

SNAKE achieves superior performance on multiple public benchmarks, including object-level datasets (ModelNet40, KeypointNet, SMPL meshes) and scene-level datasets (3DMatch and Redwood). Shape awareness brings the following advantages:

1. **Semantic Consistency**: The 3D keypoints generated by SNAKE are consistent with human semantic annotations, even without such supervision.
2. **Repeatability**: SNAKE outperforms other methods in terms of repeatability, especially when the input point cloud is downsampled.
3. **Geometric Registration**: The generated keypoints allow for accurate geometric registration, particularly in zero-shot settings.

### Main Contributions

1. **Proposed a New Network**: The network is based on implicit neural representation and jointly performs surface reconstruction and 3D keypoint detection. For training, several self-supervised loss functions were developed that leverage the interrelationship between the two decoders. For testing, a gradient-based optimization strategy was designed to maximize keypoint saliency.
2. **Extensive Quantitative and Qualitative Evaluation**: SNAKE was evaluated on object-level datasets (ModelNet40, KeypointNet, SMPL meshes) and scene-level datasets (3DMatch and Redwood), demonstrating state-of-the-art performance in semantic consistency, repeatability, and geometric registration.

### Related Work

1. **3D Keypoint Detectors**: 3D keypoint detection methods are mainly divided into handcrafted and learning-based methods. Traditional handcrafted methods are usually based on local geometric statistics but struggle to detect consistent keypoints under real-world disturbances. Modern learning-based methods rely on consistency under geometric transformations, but the generated keypoints lack semantic saliency. The recent UKPGAN method uses reconstruction to find semantically aware 3D keypoints, but it recovers explicit coordinates rather than implicit shape indicators.
2. **Implicit Neural Representation**: SNAKE leverages implicit neural representation to parameterize continuous 3D keypoint fields, inspired by research on neural radiance fields and neural distance fields. Unlike explicit 3D representations (such as point clouds, voxels, or meshes), implicit neural functions can decode shapes continuously and learn complex shape topologies.

### Method

SNAKE is a shape-aware implicit network for 3D keypoint detection. It conditions two implicit decoders (one for shape, one for keypoint saliency) on shared volumetric feature embeddings. To encourage repeatable, uniformly distributed, and sparse keypoints, several self-supervised loss functions are employed that combine the predicted surface occupancy and keypoint saliency. During inference, a gradient-based optimization method further refines query points with high saliency.

### Experiments

1. **Semantic Consistency**: Keypoint semantic consistency between different instances was evaluated on the KeypointNet and SMPL datasets. SNAKE achieved higher mIoU at most thresholds, and the generated keypoints were well aligned with human annotations.
2. **Repeatability**: Keypoint repeatability across point clouds captured from different viewpoints was evaluated on the ModelNet40 and Redwood datasets. SNAKE exhibited the highest repeatability in most cases, as the shape-aware strategy helps the model infer the underlying shape of objects and scenes, making keypoints robust to input variations.
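
The inference step described under Method, where query points are refined by gradient-based optimization toward high keypoint saliency, can be illustrated with a small sketch. This is not the authors' implementation. The saliency field here is a hypothetical analytic function (Gaussian bumps around two made-up centers) standing in for the learned saliency decoder, gradients are estimated by finite differences rather than automatic differentiation, and the names `saliency`, `refine`, `detect`, `SIGMA`, and `SALIENCY_MIN` are assumptions introduced only for illustration.

```python
import math

# Hypothetical stand-in for the learned saliency decoder: Gaussian bumps
# centred on two made-up keypoint locations. A real system would query a
# neural network here and obtain gradients by automatic differentiation.
CENTERS = [(0.0, 0.0, 0.0), (1.0, 1.0, 0.0)]
SIGMA = 0.3          # width of each toy saliency bump (assumed)
SALIENCY_MIN = 0.7   # assumed threshold for keeping a refined point


def saliency(q):
    """Toy saliency in (0, 1]: the strongest Gaussian response at point q."""
    return max(
        math.exp(-sum((qi - ci) ** 2 for qi, ci in zip(q, c)) / (2 * SIGMA ** 2))
        for c in CENTERS
    )


def numerical_gradient(f, q, eps=1e-5):
    """Central-difference estimate of the gradient of f at q."""
    grad = []
    for i in range(len(q)):
        hi, lo = list(q), list(q)
        hi[i] += eps
        lo[i] -= eps
        grad.append((f(hi) - f(lo)) / (2 * eps))
    return grad


def refine(q, step=0.05, iters=100):
    """Move one query point uphill on the saliency field by gradient ascent."""
    q = list(q)
    for _ in range(iters):
        g = numerical_gradient(saliency, q)
        q = [qi + step * gi for qi, gi in zip(q, g)]
    return q


def detect(queries):
    """Refine every query point, then keep only those with high saliency."""
    refined = [refine(q) for q in queries]
    return [q for q in refined if saliency(q) > SALIENCY_MIN]


# Two queries start near a saliency peak; the third lies far from both.
QUERIES = [(0.2, -0.1, 0.1), (0.8, 1.2, 0.1), (3.0, 3.0, 3.0)]
keypoints = detect(QUERIES)
for kp in keypoints:
    print([round(x, 3) for x in kp])
```

In this toy setting the two nearby queries drift onto the saliency peaks, while the distant query sees an almost flat field, barely moves, and is discarded by the threshold. The summary above does not specify the step size, iteration count, or filtering rule that SNAKE uses, so those values are placeholders.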