Abstract: Given the prominence of 3-D sensors in recent years, 3-D point clouds merit further investigation for environment perception and scene understanding. Learning accurate local and global contexts in point clouds is pivotal for semantic segmentation, and neighbor aggregation (NA) and transformers have achieved notable success in local and global perception in point cloud analysis, respectively. Nevertheless, studying each independently is far from optimal for comprehensive feature learning. To address this, we take a step toward investigating and integrating the structures of NA and transformers. In this article, we introduce Point Neighbor Aggregation with Transformer (PointNAT), a conceptually straightforward and effective approach that aims to improve 3-D point cloud semantic segmentation. PointNAT consists of an NA block (NAB) for local perception, a point transformer block (PTB) for global modeling, and a hybrid block that connects NABs and PTBs. NABs learn complex local features at varying scales through an improved NA operation and a multihead mechanism. PTBs perform global attention efficiently using a small set of learnable key points. Hybrid blocks serve as high-and-low frequency signal hybridizers, merging the strengths of the two blocks by adaptively assigning hybrid weights to local and global contexts. We evaluate PointNAT against state-of-the-art networks on several benchmarks, including Stanford Large-Scale 3-D Indoor Spaces (S3DIS), Toronto3D, and SensatUrban. PointNAT achieves mean intersection over union (mIoU) scores of 77.8%, 84.7%, and 65.2% on these three datasets, respectively. Furthermore, it outperforms the baseline approach PointNeXt by 3.0%, 1.3%, and 4.2%, respectively, while using only 59.9% of the parameters and 15.2% of the floating-point operations (FLOPs). The results demonstrate PointNAT's ability to accurately segment large-scale 3-D point cloud scenes, emphasizing its potential to advance environment perception and scene understanding. Our code is available at https://github.com/zeng-ziyin/PointNAT.
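
For readers unfamiliar with the mIoU metric reported above, the following is a minimal sketch of how it is commonly computed from per-point labels. It is not taken from the PointNAT code base: the function name `mean_iou`, the toy labels, and the choice to skip classes absent from both prediction and ground truth are all assumptions made for illustration, and the official evaluation protocols of S3DIS, Toronto3D, and SensatUrban may differ in such details.

```python
def mean_iou(pred, target, num_classes):
    """Mean intersection over union across classes (illustrative sketch)."""
    ious = []
    for c in range(num_classes):
        pairs = list(zip(pred, target))
        tp = sum(1 for p, t in pairs if p == c and t == c)  # correctly labeled c
        fp = sum(1 for p, t in pairs if p == c and t != c)  # wrongly labeled c
        fn = sum(1 for p, t in pairs if p != c and t == c)  # missed points of c
        union = tp + fp + fn
        if union == 0:
            # Assumption: a class absent from both prediction and ground
            # truth is skipped rather than counted as IoU 0 or 1.
            continue
        ious.append(tp / union)
    return sum(ious) / len(ious) if ious else 0.0


# Hypothetical per-point labels for six points and three classes.
pred = [0, 0, 1, 1, 2, 2]
target = [0, 1, 1, 1, 2, 0]
print(mean_iou(pred, target, num_classes=3))
```

In this toy case the per-class IoUs are 1/3, 2/3, and 1/2, so the mean is 0.5; a reported score such as 77.8% corresponds to a mean of 0.778 on this scale.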

Fine-Tuning Point Cloud Transformers with Dynamic Aggregation

Aggregating Feature Point Cloud for Depth Completion

Dynamic Adapter Meets Prompt Tuning: Parameter-Efficient Transfer Learning for Point Cloud Analysis

Parameter Efficient Point Cloud Prompt Tuning for Unified Point Cloud Understanding

Parameter-Efficient Fine-Tuning in Spectral Domain for Point Cloud Learning

Point cloud upsampling via a coarse-to-fine network with transformer-encoder

Point-PEFT: Parameter-Efficient Fine-Tuning for 3D Pre-trained Models

Dynamic Local Feature Aggregation for Learning on Point Clouds

Dynamic clustering transformer network for point cloud segmentation

Adapt PointFormer: 3D Point Cloud Analysis via Adapting 2D Visual Transformers

Rendering-Oriented 3D Point Cloud Attribute Compression using Sparse Tensor-based Transformer

Full Point Encoding for Local Feature Aggregation in 3D Point Clouds

DCNet: Large-Scale Point Cloud Semantic Segmentation with Discriminative and Efficient Feature Aggregation

PointMT: Efficient Point Cloud Analysis with Hybrid MLP-Transformer Architecture

CloudAttention: Efficient Multi-Scale Attention Scheme For 3D Point Cloud Learning

AdaPoinTr: Diverse Point Cloud Completion With Adaptive Geometry-Aware Transformers

Deep Interactive Full Transformer Framework for Point Cloud Registration

PointNAT: Large-Scale Point Cloud Semantic Segmentation via Neighbor Aggregation With Transformer

CRA-PCN: Point Cloud Completion with Intra- and Inter-level Cross-Resolution Transformers

Full Transformer Framework for Robust Point Cloud Registration With Deep Information Interaction

Towards Compact 3D Representations via Point Feature Enhancement Masked Autoencoders