Abstract: Detecting 3D keypoints from point clouds is important for shape reconstruction, while this work investigates the dual question: can shape reconstruction benefit 3D keypoint detection? Existing methods either seek salient features according to statistics of different orders or learn to predict keypoints that are invariant to transformation. Nevertheless, the idea of incorporating shape reconstruction into 3D keypoint detection is under-explored. We argue that this is restricted by former problem formulations. To this end, a novel unsupervised paradigm named SNAKE is proposed, which is short for shape-aware neural 3D keypoint field. Similar to recent coordinate-based radiance or distance fields, our network takes 3D coordinates as inputs and predicts implicit shape indicators and keypoint saliency simultaneously, thus naturally entangling 3D keypoint detection and shape reconstruction. We achieve superior performance on various public benchmarks, including standalone object datasets ModelNet40, KeypointNet, SMPL meshes and scene-level datasets 3DMatch and Redwood. Intrinsic shape awareness brings several advantages as follows. (1) SNAKE generates 3D keypoints consistent with human semantic annotation, even without such supervision. (2) SNAKE outperforms counterparts in terms of repeatability, especially when the input point clouds are down-sampled. (3) The generated keypoints allow accurate geometric registration, notably in a zero-shot setting. Code is available at https://github.com/zhongcl-thu/SNAKE

What problem does this paper attempt to address?

### Problems Addressed by the Paper

The paper "SNAKE: Shape-aware Neural 3D Keypoint Field" addresses the detection of 3D keypoints from point cloud data. Specifically, the authors pose a dual question: can shape reconstruction improve 3D keypoint detection? Existing methods either seek salient features using statistics of different orders or learn to predict transformation-invariant keypoints. Incorporating shape reconstruction into 3D keypoint detection, however, remains under-explored, which the authors attribute to the limitations of earlier problem formulations.

To this end, the authors propose a new unsupervised paradigm, SNAKE (Shape-aware Neural 3D Keypoint Field). The network takes 3D coordinates as input and predicts implicit shape indicators and keypoint saliency simultaneously, thereby naturally coupling 3D keypoint detection and shape reconstruction. SNAKE achieves superior performance on multiple public benchmarks, including object-level datasets (ModelNet40, KeypointNet, SMPL meshes) and scene-level datasets (3DMatch and Redwood). Shape awareness brings the following advantages:

1. **Semantic Consistency**: The 3D keypoints generated by SNAKE are consistent with human semantic annotations, even without such supervision.
2. **Repeatability**: SNAKE outperforms other methods in terms of repeatability, especially when the input point cloud is downsampled.
3. **Geometric Registration**: The generated keypoints allow for accurate geometric registration, particularly in zero-shot settings.

### Main Contributions

1. **A New Network**: Based on implicit neural representation, the network jointly performs surface reconstruction and 3D keypoint detection. For training, several self-supervised loss functions are developed that leverage the interrelationship between the two decoders. For testing, a gradient-based optimization strategy is designed to maximize keypoint saliency.
2. **Extensive Quantitative and Qualitative Evaluation**: SNAKE is evaluated on object-level datasets (ModelNet40, KeypointNet, SMPL meshes) and scene-level datasets (3DMatch and Redwood), demonstrating state-of-the-art performance in semantic consistency, repeatability, and geometric registration.

### Related Work

1. **3D Keypoint Detectors**: 3D keypoint detection methods are mainly divided into handcrafted and learning-based methods. Traditional handcrafted methods are usually based on local geometric statistics but struggle to detect consistent keypoints under real-world disturbances. Modern learning-based methods rely on consistency under geometric transformations, but the generated keypoints lack semantic saliency. The recent UKPGAN method uses reconstruction to find semantically aware 3D keypoints, but it recovers explicit coordinates rather than implicit shape indicators.
2. **Implicit Neural Representation**: SNAKE leverages implicit neural representation to parameterize continuous 3D keypoint fields, inspired by research on neural radiance fields and neural distance fields. Unlike explicit 3D representations (such as point clouds, voxels, or meshes), implicit neural functions can decode shapes continuously and learn complex shape topologies.

### Method

SNAKE is a shape-aware implicit network for 3D keypoint detection. It conditions two implicit decoders (one for shape, one for keypoint saliency) on shared volumetric feature embeddings. To encourage repeatable, uniformly distributed, and sparse keypoints, several self-supervised loss functions are employed that combine the predicted surface occupancy and keypoint saliency. During inference, a gradient-based optimization method further refines query points with high saliency.

### Experiments

1. **Semantic Consistency**: Keypoint semantic consistency across different instances is evaluated on the KeypointNet and SMPL datasets. SNAKE achieves higher mIoU at most thresholds, and the generated keypoints are well aligned with human annotations.
2. **Repeatability**: Keypoint repeatability across point clouds captured from different viewpoints is evaluated on the ModelNet40 and Redwood datasets. SNAKE exhibits the highest repeatability in most cases, as the shape-aware strategy helps the model infer the underlying shape of objects and scenes, making keypoints robust to input variations.
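As a rough illustration of the inference step described under Method, the sketch below moves query points uphill on a saliency field by gradient ascent and keeps those that end up salient and inside the shape. It is a toy under stated assumptions, not the authors' implementation: `occupancy`, `saliency`, `CENTERS`, `SIGMA`, the step size, the thresholds, and the occupancy check itself are all hypothetical stand-ins, whereas SNAKE decodes both quantities from learned feature embeddings.

```python
import math

# Hypothetical saliency peaks and bandwidth for the toy field below.
CENTERS = [(1.0, 0.0, 0.0), (-1.0, 0.0, 0.0)]
SIGMA = 0.3


def occupancy(q):
    """Toy shape indicator (assumption): 1 inside a sphere of radius 1.2, else 0."""
    return 1.0 if math.dist(q, (0.0, 0.0, 0.0)) <= 1.2 else 0.0


def saliency(q):
    """Toy saliency field (assumption): a sum of Gaussian bumps at CENTERS."""
    return sum(math.exp(-math.dist(q, c) ** 2 / (2 * SIGMA ** 2)) for c in CENTERS)


def saliency_grad(q):
    """Analytic gradient of the toy saliency field with respect to q."""
    g = [0.0, 0.0, 0.0]
    for c in CENTERS:
        w = math.exp(-math.dist(q, c) ** 2 / (2 * SIGMA ** 2)) / SIGMA ** 2
        for i in range(3):
            g[i] += w * (c[i] - q[i])
    return tuple(g)


def refine(q, step=0.02, iters=200):
    """Gradient ascent on saliency: q <- q + step * grad(saliency)(q)."""
    for _ in range(iters):
        g = saliency_grad(q)
        q = tuple(qi + step * gi for qi, gi in zip(q, g))
    return q


def detect(queries, sal_thresh=0.7):
    """Refine every query, then keep points that are occupied and salient."""
    refined = [refine(q) for q in queries]
    return [q for q in refined if occupancy(q) > 0.5 and saliency(q) >= sal_thresh]


if __name__ == "__main__":
    queries = [(0.8, 0.1, 0.0), (-0.7, -0.2, 0.1), (0.0, 3.0, 0.0)]
    for kp in detect(queries):
        print(tuple(round(v, 3) for v in kp))
```

In this toy, the first two queries drift to the nearest saliency peak, while the third sits far from both peaks, receives an almost-zero gradient, and is discarded by the thresholds. A learned field would obtain the gradient by automatic differentiation rather than a closed form, and would typically also suppress near-duplicate detections.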

SNAKE: Shape-aware Neural 3D Keypoint Field

KeypointDETR: an End-to-End 3D Keypoint Detector

SASAN: Shape-Adaptive Set Abstraction Network for Point-Voxel 3D Object Detection

Unsupervised Learning of 3D Semantic Keypoints with Mutual Reconstruction

KP-RED: Exploiting Semantic Keypoints for Joint 3D Shape Retrieval and Deformation

Back to 3D: Few-Shot 3D Keypoint Detection with Back-Projected 2D Features

3D Keypoint Detection Based on Deep Neural Network with Sparse Autoencoder

Shape registration with learned deformations for 3D shape reconstruction from sparse and incomplete point clouds

KeypointNet: A Large-scale 3D Keypoint Dataset Aggregated from Numerous Human Annotations

SK-Net: Deep Learning on Point Cloud via End-to-end Discovery of Spatial Keypoints

Unsupervised distribution-aware keypoints generation from 3D point clouds

LAKe-Net: Topology-Aware Point Cloud Completion by Localizing Aligned Keypoints

KDA3D: Key-Point Densification and Multi-Attention Guidance for 3D Object Detection

Skeleton Merger: an Unsupervised Aligned Keypoint Detector

LifelongGlue: Keypoint Matching for 3D Reconstruction with Continual Neural Networks

Multi-Task Joint Learning of 3D Keypoint Saliency and Correspondence Estimation

AGO-Net: Association-Guided 3D Point Cloud Object Detection Network

SPBA-Net point cloud object detection with sparse attention and box aligning

SMOKE: Single-Stage Monocular 3D Object Detection via Keypoint Estimation

Learning Based 3D Keypoint Detection with Local and Global Attributes in Multi-Scale Space

Skeleton-Aware 3D Human Shape Reconstruction From Point Clouds