Spatial Transformer for 3D Point Clouds

Jiayun Wang, Rudrasis Chakraborty, Stella X. Yu

DOI: https://doi.org/10.1109/TPAMI.2021.3070341

2021-03-30

Abstract: Deep neural networks are widely used for understanding 3D point clouds. At each point convolution layer, features are computed from local neighborhoods of 3D points and combined for subsequent processing in order to extract semantic information. Existing methods adopt the same individual point neighborhoods throughout the network layers, defined by the same metric on the fixed input point coordinates. This common practice is easy to implement but not necessarily optimal. Ideally, local neighborhoods should be different at different layers, as more latent information is extracted at deeper layers. We propose a novel end-to-end approach to learn different non-rigid transformations of the input point cloud so that optimal local neighborhoods can be adopted at each layer. We propose both linear (affine) and non-linear (projective and deformable) spatial transformers for 3D point clouds. With spatial transformers on the ShapeNet part segmentation dataset, the network achieves higher accuracy for all categories, with 8% gain on earphones and rockets in particular. Our method also outperforms the state-of-the-art on other point cloud tasks such as classification, detection, and semantic segmentation. Visualizations show that spatial transformers can learn features more efficiently by dynamically altering local neighborhoods according to the geometry and semantics of 3D shapes in spite of their within-category variations. Our code is publicly available at https://github.com/samaonline/spatial-transformer-for-3d-point-clouds.

Computer Vision and Pattern Recognition

What problem does this paper attempt to address?

This paper addresses a limitation of existing 3D point cloud methods: they usually adopt a fixed definition of local neighborhoods, which may not be optimal. Most methods define local neighborhoods by Euclidean distance on the input 3D point coordinates. This is simple to implement, but it may not encode the semantic information of 3D shapes efficiently, and a fixed neighborhood may limit what the model can learn, because different layers capture information at different levels of abstraction. For example, objects have a natural hierarchical structure, so segmenting their parts is more efficient when different layers can resolve those parts at different spatial scales.

To address this, the paper proposes an end-to-end method that learns different non-rigid transformations of the input point cloud, so that each network layer can adopt its own optimal local neighborhoods. The authors propose both linear (affine) and non-linear (projective and deformable) spatial transformers for 3D point clouds. These transformers let the network learn point features that cover different spatial extents, which yields dynamic local neighborhoods at different network depths. According to the paper, this improves learning efficiency on objects with large spatial variations and outperforms existing methods on several 3D point cloud tasks, including classification, detection, and semantic segmentation.
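The core idea, that transforming the point coordinates changes which points count as neighbors, can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names `knn_indices` and `affine_transform` are hypothetical, the matrix `A` and offset `b` are fixed by hand here (in the paper they would be learned end-to-end, per layer), and a brute-force k-nearest-neighbor search stands in for whatever neighborhood search a real network uses.

```python
import numpy as np


def knn_indices(points, k):
    """Return, for each point, the indices of its k nearest neighbors (self excluded)."""
    # Pairwise squared Euclidean distances, shape (N, N).
    diff = points[:, None, :] - points[None, :, :]
    dist2 = np.sum(diff ** 2, axis=-1)
    # Exclude each point from its own neighborhood.
    np.fill_diagonal(dist2, np.inf)
    return np.argsort(dist2, axis=1)[:, :k]


def affine_transform(points, A, b):
    """Apply x -> A x + b to every row of an (N, 3) array of points."""
    return points @ A.T + b


rng = np.random.default_rng(0)
points = rng.normal(size=(64, 3))  # toy point cloud, N = 64

# Hand-picked affine parameters (assumption: a learned transformer would predict these).
A = np.diag([1.0, 1.0, 0.1])  # compress the z axis
b = np.zeros(3)

# Neighborhoods defined on the raw input coordinates.
nbrs_input = knn_indices(points, k=8)
# Neighborhoods defined on the transformed coordinates.
nbrs_transformed = knn_indices(affine_transform(points, A, b), k=8)

# Fraction of points whose neighbor set changed after the transformation.
changed = np.mean([set(a) != set(t) for a, t in zip(nbrs_input, nbrs_transformed)])
print(f"fraction of points with a different neighborhood: {changed:.2f}")
```

Because the neighbor search runs on the transformed coordinates while features can still be gathered from the original points, each layer can in principle see a different neighborhood of the same point cloud.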

Stratified Transformer for 3D Point Cloud Segmentation

Hierarchical Spatial Transformer Network

A Hierarchical Spatial Transformer for Massive Point Samples in Continuous Space

Dynamic clustering transformer network for point cloud segmentation

CloudAttention: Efficient Multi-Scale Attention Scheme For 3D Point Cloud Learning

Spatial deformable transformer for 3D point cloud registration

PointNAT: Large-Scale Point Cloud Semantic Segmentation via Neighbor Aggregation With Transformer

APPT: Asymmetric Parallel Point Transformer for 3D Point Cloud Understanding

GSTran: Joint Geometric and Semantic Coherence for Point Cloud Segmentation

Group-in-Group Relation-Based Transformer for 3D Point Cloud Learning

RS-TNet: point cloud transformer with relation-shape awareness for fine-grained 3D visual processing

OctFormer: Octree-based Transformers for 3D Point Clouds

Collect-and-Distribute Transformer for 3D Point Cloud Analysis

Local Transformer Network on 3D Point Cloud Semantic Segmentation

GPSFormer: A Global Perception and Local Structure Fitting-based Transformer for Point Cloud Understanding

D2T-Net: A dual-domain transformer network exploiting spatial and channel dimensions for semantic segmentation of urban mobile laser scanning point clouds

Region-Transformer: Self-Attention Region Based Class-Agnostic Point Cloud Segmentation

Gsformer: geometric-spatial transformer on point cloud completion

PointMTL: Multi-Transform Learning for Effective 3D Point Cloud Representations