OctFormer: Octree-based Transformers for 3D Point Clouds

Peng-Shuai Wang

DOI: https://doi.org/10.1145/3592131

2023-05-08

Abstract: We propose octree-based transformers, named OctFormer, for 3D point cloud learning. OctFormer not only serves as a general and effective backbone for 3D point cloud segmentation and object detection but also has linear complexity and scales to large point clouds. The key challenge in applying transformers to point clouds is reducing the quadratic, and thus overwhelming, computational complexity of attention. To combat this issue, several works divide point clouds into non-overlapping windows and constrain attention to each local window. However, the number of points in each window varies greatly, impeding efficient execution on GPUs. Observing that attention is robust to the shapes of local windows, we propose a novel octree attention, which leverages sorted shuffled keys of octrees to partition point clouds into local windows containing a fixed number of points while permitting the shapes of windows to change freely. We also introduce dilated octree attention to expand the receptive field further. Our octree attention can be implemented in 10 lines of code with open-sourced libraries and runs 17 times faster than other point cloud attentions when the point number exceeds 200k. Built upon the octree attention, OctFormer can be easily scaled up and achieves state-of-the-art performance on a series of 3D segmentation and detection benchmarks, surpassing previous sparse-voxel-based CNNs and point cloud transformers in terms of both efficiency and effectiveness. Notably, on the challenging ScanNet200 dataset, OctFormer outperforms sparse-voxel-based CNNs by 7.3 in mIoU. Our code and trained models are available at https://wang-ps.github.io/octformer.

Computer Vision and Pattern Recognition, Graphics

What problem does this paper attempt to address?

The paper proposes OctFormer, a transformer backbone for 3D point cloud data. The problem it targets is that self-attention over a point cloud has quadratic computational complexity in the number of points, which makes it impractical for large scenes. Existing window-attention methods reduce this cost by restricting attention to non-overlapping local windows, but the number of points per window varies greatly, which prevents efficient execution on GPUs. OctFormer instead sorts the points by the shuffled keys of an octree and groups them into windows that each contain a fixed number of points, while letting the window shape vary freely. This keeps the complexity linear in the number of points, and according to the paper the attention can be implemented in 10 lines of code with open-source libraries and runs 17 times faster than other point cloud attentions once the point count exceeds 200k. The paper also introduces dilated octree attention to enlarge the receptive field. Experiments show that OctFormer achieves state-of-the-art results on a series of 3D segmentation and detection benchmarks, surpassing both sparse-voxel-based CNNs and earlier point cloud transformers; on ScanNet200 it outperforms sparse-voxel-based CNNs by 7.3 mIoU. OctFormer is therefore presented as a general backbone for point cloud learning that scales easily to large point clouds.
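The windowing idea summarized above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function names, the default parameters, and the use of plain bit-interleaved (z-order) codes as the "shuffled keys" are assumptions made for illustration, and the attention has no learned projections. The dilation step, which forms a window from every D-th point in sorted order, is likewise one plausible reading of "dilated octree attention" rather than a confirmed detail.

```python
import numpy as np

def shuffled_keys(points, depth=8):
    """Quantize points to a 2^depth grid and interleave the x, y, z bits (z-order key)."""
    lo = points.min(axis=0)
    extent = max(float((points.max(axis=0) - lo).max()), 1e-12)
    ijk = np.floor((points - lo) / extent * (2**depth - 1)).astype(np.int64)
    ijk = np.clip(ijk, 0, 2**depth - 1)
    keys = np.zeros(len(points), dtype=np.int64)
    for bit in range(depth):
        for axis in range(3):
            keys |= ((ijk[:, axis] >> bit) & 1) << (3 * bit + axis)
    return keys

def window_indices(num_points, K, D=1):
    """Positions (in sorted order) of each window's K members; -1 marks padding."""
    pad = (-num_points) % (K * D)
    idx = np.concatenate([np.arange(num_points), np.full(pad, -1)])
    # View each run of K*D points as (K, D) and swap the axes, so that a
    # window takes every D-th point of the run (D=1 gives consecutive points).
    return idx.reshape(-1, K, D).transpose(0, 2, 1).reshape(-1, K)

def octree_attention(points, feats, K=32, D=1, depth=8):
    """Self-attention restricted to fixed-size windows (Q = K = V = feats)."""
    feats = np.asarray(feats, dtype=np.float64)
    order = np.argsort(shuffled_keys(points, depth), kind="stable")
    win = window_indices(len(points), K, D)           # (W, K)
    valid = win >= 0
    x = feats[order][np.maximum(win, 0)]              # (W, K, C)
    scores = x @ x.transpose(0, 2, 1) / np.sqrt(x.shape[-1])
    scores = np.where(valid[:, None, :], scores, -1e9)  # hide padded keys
    scores = scores - scores.max(axis=-1, keepdims=True)
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    y = weights @ x                                   # (W, K, C)
    out = np.empty_like(feats)
    out[order[win[valid]]] = y[valid]                 # scatter back to input order
    return out
```

Every window has exactly K slots, so the attention runs as one batched matrix product of shape (W, K, K). With W = N / K windows the total cost is proportional to N * K, which is linear in N for a fixed K, instead of N^2 for global attention.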

OctFormer: Efficient Octree-Based Transformer for Point Cloud Compression with Local Enhancement

CloudAttention: Efficient Multi-Scale Attention Scheme For 3D Point Cloud Learning

3D Object Segmentation Using Cross-Window Point Transformer with Latent Semantic Boundary Guidance

Collect-and-Distribute Transformer for 3D Point Cloud Analysis

PVT-SSD: Single-Stage 3D Object Detector with Point-Voxel Transformer

PVT: Point-Voxel Transformer for Point Cloud Learning

Stratified Transformer for 3D Point Cloud Segmentation

ScatterFormer: Efficient Voxel Transformer with Scattered Linear Attention

Point Transformer V3: Simpler, Faster, Stronger

OcTr: Octree-based Transformer for 3D Object Detection

DSVT: Dynamic Sparse Voxel Transformer with Rotated Sets

PVTransformer: Point-to-Voxel Transformer for Scalable 3D Object Detection

ConDaFormer: Disassembled Transformer with Local Structure Enhancement for 3D Point Cloud Understanding

Adapt PointFormer: 3D Point Cloud Analysis via Adapting 2D Visual Transformers

Fast and Robust Point Cloud Registration with Tree-based Transformer

Spatial Transformer for 3D Point Clouds

VTPNet for 3D deep learning on point cloud