Abstract: Recently, state space models have exhibited strong global modeling capabilities and linear computational complexity in contrast to transformers. This research focuses on applying such an architecture to model point cloud data globally, more efficiently and effectively, with linear computational complexity. In particular, for the first time, we demonstrate that Mamba-based point cloud methods can outperform previous methods based on transformers or multi-layer perceptrons (MLPs). To enable Mamba to process 3-D point cloud data more effectively, we propose a novel Consistent Traverse Serialization method to convert point clouds into 1-D point sequences while ensuring that neighboring points in the sequence are also spatially adjacent. Consistent Traverse Serialization yields six variants by permuting the order of the *x*, *y*, and *z* coordinates, and the synergistic use of these variants aids Mamba in comprehensively observing point cloud data. Furthermore, to assist Mamba in handling point sequences with different orders more effectively, we introduce point prompts to inform Mamba of the sequence's arrangement rules. Finally, we propose positional encoding based on spatial coordinate mapping to inject positional information into point cloud sequences more effectively. Point Cloud Mamba (PCM) surpasses the state-of-the-art (SOTA) point-based method PointNeXt and achieves new SOTA performance on the ScanObjectNN, ModelNet40, ShapeNetPart, and S3DIS datasets. It is worth mentioning that when using a more powerful local feature extraction module, our PCM achieves 79.6 mIoU on S3DIS, significantly surpassing the previous SOTA models, DeLA and PTv3, by 5.5 mIoU and 4.9 mIoU, respectively.

What problem does this paper attempt to address?

The main problem this paper attempts to address is improving the global modeling of point cloud data while keeping computational complexity linear. Specifically, the paper introduces Point Cloud Mamba (PCM), an architecture based on the State Space Model (SSM), to handle point cloud data more efficiently and effectively.

### Main Problems

1. **Global Modeling Capability**: Existing point cloud processing methods have shortcomings in global modeling, especially on large-scale point cloud data. Traditional point cloud methods like PointNet and PointNet++ mainly rely on local perception, while Transformer-based methods have global perception but incur high computational complexity (O(N^2)).
2. **Computational Efficiency**: How to reduce computational complexity to linear (O(N)) while maintaining global modeling capability.

### Solutions

1. **Mamba Architecture**: The paper adopts the Mamba architecture, a State Space Model method capable of global modeling with linear complexity. Mamba has already proven effective in natural language processing tasks.
2. **Consistent Traverse Serialization (CTS)**: To convert 3D point cloud data into a 1D sequence, the paper proposes a new serialization method called Consistent Traverse Serialization (CTS). Through grid sampling and sorting, CTS ensures that adjacent points in the sequence are also adjacent in space. Permuting the order of the x, y, and z coordinates yields six variants, which are used together.
3. **Order Prompts**: To help the Mamba layers handle point sequences in different orders, the paper introduces order prompts. These prompts inform Mamba of the arrangement rule of the current point sequence.
4. **Positional Encoding**: To inject positional information, the paper proposes a positional encoding based on spatial coordinate mapping. According to the paper, this method suits sparse and irregular point cloud data better than RoPE and learnable positional encodings.

### Experimental Results

1. **3D Object Classification**: On the ScanObjectNN and ModelNet40 datasets, PCM outperformed existing point cloud methods, including PointNeXt and PTv3.
2. **Part Segmentation**: PCM also achieved new SOTA performance on the ShapeNetPart dataset.
3. **Semantic Segmentation**: On the S3DIS dataset, PCM-Tiny, when paired with a more powerful local feature extraction module, achieved 79.6 mIoU, surpassing the previous SOTA models DeLA and PTv3 by 5.5 mIoU and 4.9 mIoU, respectively.

### Contributions

1. **Mamba for Point Clouds**: The paper demonstrates, for the first time according to the authors, that a Mamba-based point cloud method can outperform methods based on transformers or MLPs, using a framework that combines local and global modeling.
2. **New Serialization and Encoding Techniques**: The paper proposes Consistent Traverse Serialization, order prompts, and a positional encoding based on spatial coordinate mapping, which improve Mamba's performance on point cloud data.
3. **Experimental Evidence**: Through experiments on multiple datasets, the paper demonstrates PCM's performance in 3D object classification, part segmentation, and semantic segmentation.

In summary, the paper addresses efficient global modeling of point cloud data by combining the Mamba architecture with these techniques, and it reports performance improvements on multiple benchmark datasets.
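
The serialization idea summarized above (grid sampling, then sorting so that sequence neighbors are spatial neighbors, in six axis orders) can be illustrated with a small sketch. This is not the authors' implementation: it assumes a boustrophedon ("snake") scan over voxel coordinates as one way to get the adjacency property, and the names `snake_keys`, `consistent_traverse_orders`, and `voxel_size` are hypothetical.

```python
from itertools import permutations

import numpy as np


def snake_keys(grid, order):
    """Sort key per voxel for a snake-like scan (an assumed traversal).

    grid:  (N, 3) array of non-negative integer voxel coordinates.
    order: tuple of axis indices, outermost axis first, e.g. (0, 1, 2).
    """
    a, b, c = (grid[:, ax] for ax in order)
    nb = b.max() + 1
    nc = c.max() + 1
    # Reverse the middle axis on every odd step of the outer axis.
    b_eff = np.where(a % 2 == 1, nb - 1 - b, b)
    # Rows are numbered in visiting order; reverse the inner axis on odd
    # rows so the end of one row touches the start of the next.
    row = a * nb + b_eff
    c_eff = np.where(row % 2 == 1, nc - 1 - c, c)
    return row * nc + c_eff


def consistent_traverse_orders(points, voxel_size):
    """Return {"xyz": indices, ...} for all six axis permutations."""
    # Grid sampling: quantize coordinates to integer voxel indices.
    shifted = points - points.min(axis=0)
    grid = np.floor(shifted / voxel_size).astype(np.int64)
    orders = {}
    for order in permutations(range(3)):
        name = "".join("xyz"[ax] for ax in order)
        # Stable sort keeps points that share a voxel next to each other.
        orders[name] = np.argsort(snake_keys(grid, order), kind="stable")
    return orders
```

Calling `consistent_traverse_orders(points, voxel_size)` on an `(N, 3)` float array returns six index arrays; `points[orders["xyz"]]` is one serialized sequence. On a fully occupied grid, consecutive voxels in each sequence differ by one step along a single axis. On real, sparse point clouds empty voxels introduce jumps, so adjacency holds only approximately in this sketch.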

Point Cloud Mamba: Point Cloud Learning via State Space Model

PointMamba: A Simple State Space Model for Point Cloud Analysis

Mamba3D: Enhancing Local Features for 3D Point Cloud Analysis via State Space Model

Point Mamba: A Novel Point Cloud Backbone Based on State Space Model with Octree-Based Ordering Strategy

Serialized Point Mamba: A Serialized Point Cloud Mamba Segmentation Model

Exploring contextual modeling with linear complexity for point cloud segmentation

PoinTramba: A Hybrid Transformer-Mamba Framework for Point Cloud Analysis

3DMambaComplete: Exploring Structured State Space Model for Point Cloud Completion

NIMBA: Towards Robust and Principled Processing of Point Clouds With SSMs

PointABM: Integrating Bidirectional State Space Model with Multi-Head Self-Attention for Point Cloud Analysis

LCM: Locally Constrained Compact Point Cloud Model for Masked Point Modeling

MAMBA4D: Efficient Long-Sequence Point Cloud Video Understanding with Disentangled Spatial-Temporal State Space Models

MBPU: A Plug-and-Play State Space Model for Point Cloud Upsamping with Fast Point Rendering

OccMamba: Semantic Occupancy Prediction with State Space Models

Rethinking Efficient and Effective Point-based Networks for Event Camera Classification and Regression: EventMamba

Voxel Mamba: Group-Free State Space Models for Point Cloud based 3D Object Detection

3DMambaIPF: A State Space Model for Iterative Point Cloud Filtering via Differentiable Rendering

PointMM: Point Cloud Semantic Segmentation CNN under Multi-Spatial Feature Encoding and Multi-Head Attention Pooling

AdaPoinTr: Diverse Point Cloud Completion With Adaptive Geometry-Aware Transformers

Multi Point-Voxel Convolution (MPVConv) for Deep Learning on Point Clouds

PointMTL: Multi-Transform Learning for Effective 3D Point Cloud Representations