Tao Zhang, Haobo Yuan, Lu Qi, Jiangning Zhang, Qianyu Zhou, Shunping Ji, Shuicheng Yan, Xiangtai Li
Abstract: Recently, state space models have exhibited strong global modeling capabilities and linear computational complexity in contrast to transformers. This research focuses on applying such architecture to more efficiently and effectively model point cloud data globally with linear computational complexity. In particular, for the first time, we demonstrate that Mamba-based point cloud methods can outperform previous methods based on transformer or multi-layer perceptrons (MLPs). To enable Mamba to process 3-D point cloud data more effectively, we propose a novel Consistent Traverse Serialization method to convert point clouds into 1-D point sequences while ensuring that neighboring points in the sequence are also spatially adjacent. Consistent Traverse Serialization yields six variants by permuting the order of *x*, *y*, and *z* coordinates, and the synergistic use of these variants aids Mamba in comprehensively observing point cloud data. Furthermore, to assist Mamba in handling point sequences with different orders more effectively, we introduce point prompts to inform Mamba of the sequence's arrangement rules. Finally, we propose positional encoding based on spatial coordinate mapping to inject positional information into point cloud sequences more effectively. Point Cloud Mamba surpasses the state-of-the-art (SOTA) point-based method PointNeXt and achieves new SOTA performance on the ScanObjectNN, ModelNet40, ShapeNetPart, and S3DIS datasets. It is worth mentioning that when using a more powerful local feature extraction module, our PCM achieves 79.6 mIoU on S3DIS, significantly surpassing the previous SOTA models, DeLA and PTv3, by 5.5 mIoU and 4.9 mIoU, respectively.
What problem does this paper attempt to address?
The main problem this paper attempts to address is improving the global modeling capability of point cloud data while maintaining linear computational complexity. Specifically, the paper introduces a new architecture based on the State Space Model (SSM) called Point Cloud Mamba (PCM) to handle point cloud data more efficiently and effectively.
### Main Problems
1. **Global Modeling Capability**: Existing point cloud processing methods have shortcomings in global modeling, especially on large-scale point cloud data. Traditional methods like PointNet and PointNet++ mainly rely on local perception, while Transformer-based methods offer global perception at a cost that grows quadratically with the number of points N, i.e. O(N^2).
2. **Computational Efficiency**: The goal is to reduce computational complexity to linear, O(N), while retaining global modeling capability.
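To make the O(N) versus O(N^2) contrast concrete, the sketch below runs a plain linear state-space recurrence over a sequence. It is only an illustration: the scalar parameters `a`, `b`, `c` and the helper names are assumptions made here, and real Mamba layers use input-dependent (selective) parameters and vector-valued states, which this sketch does not attempt to reproduce.

```python
def ssm_scan(xs, a=0.9, b=1.0, c=1.0):
    """Toy scalar state-space recurrence (hypothetical, fixed parameters).

    h_t = a * h_{t-1} + b * x_t
    y_t = c * h_t
    One pass over the sequence, so the work is O(N).
    """
    h = 0.0
    ys = []
    for x in xs:
        h = a * h + b * x  # the state carries a summary of all earlier inputs
        ys.append(c * h)
    return ys


def scan_steps(n):
    """Number of state updates the recurrence performs for n tokens."""
    return n


def attention_pairs(n):
    """Number of token pairs full self-attention scores for n tokens."""
    return n * n


if __name__ == "__main__":
    # An impulse at position 0 still influences later outputs through the state.
    print(ssm_scan([1.0, 0.0, 0.0], a=0.5))
    for n in (1_000, 10_000):
        print(n, scan_steps(n), attention_pairs(n))
```

Because every output is computed from the running state, each position can be influenced by all earlier positions even though the loop visits each token once. That is the property the summary refers to as global modeling with linear complexity.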
### Solutions
1. **Mamba Architecture**: The paper adopts the Mamba architecture, a method based on the State Space Model, capable of global modeling with linear complexity. The Mamba architecture has already proven its effectiveness in natural language processing tasks.
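2. **Consistent Traverse Serialization (CTS)**: To convert 3D point cloud data into a 1D sequence, the paper proposes a new serialization method called Consistent Traverse Serialization (CTS). Through grid sampling and sorting, CTS ensures that adjacent points in the sequence are also adjacent in space. Permuting the order of the x, y, and z coordinates yields six variants, which are used together so that Mamba observes the point cloud from several traversal directions.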
3. **Order Prompts**: To help the Mamba layer better handle point sequences in different orders, the paper introduces the Order Prompts mechanism (referred to as point prompts in the abstract). These prompts inform Mamba of the arrangement rules of the current point sequence, thereby enhancing its processing capability.
4. **Positional Encoding**: To inject positional information, the paper proposes a positional encoding method based on spatial coordinate mapping. This method is more suitable for handling sparse and irregular point cloud data compared to traditional RoPE and learnable positional encodings.
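The serialization idea in item 2 can be sketched as a snake-like (boustrophedon) traversal of grid cells. This is one plausible reading of "grid sampling and sorting", not the paper's implementation: the function names, the `grid_size` parameter, and the exact direction-flipping rule are assumptions made for illustration.

```python
from itertools import permutations


def snake_key(cell, order):
    """Sort key for a snake-like traversal (hypothetical helper).

    `order` lists axis indices from fastest-varying to slowest-varying,
    e.g. (0, 1, 2) means x changes fastest and z slowest.
    """
    fast, mid, slow = (cell[axis] for axis in order)
    # Reverse the middle axis on every other slab of the slowest axis.
    mid_key = mid if slow % 2 == 0 else -mid
    # Reverse the fastest axis on every other row, so each row starts
    # where the previous one ended.
    fast_key = fast if (mid + slow) % 2 == 0 else -fast
    return (slow, mid_key, fast_key)


def serialize(points, grid_size, order):
    """Return point indices sorted along the snake traversal."""
    cells = [tuple(int(c // grid_size) for c in p) for p in points]
    return sorted(range(len(points)), key=lambda i: snake_key(cells[i], order))


def all_orders():
    """The six axis permutations, matching the six variants in the abstract."""
    return list(permutations((0, 1, 2)))


if __name__ == "__main__":
    # A dense 3x3x3 grid of points, one point per cell.
    pts = [(float(x), float(y), float(z))
           for x in range(3) for y in range(3) for z in range(3)]
    for order in all_orders():
        seq = [pts[i] for i in serialize(pts, 1.0, order)]
        print(order, seq[:4])
```

On a fully occupied grid, consecutive cells in this traversal always differ by one step along a single axis. Real point clouds are sparse, so neighbors in the sequence are only approximately adjacent in space, and that gap is one motivation for using several traversal orders together.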
### Experimental Results
1. **3D Object Classification**: On the ScanObjectNN and ModelNet40 datasets, PCM surpasses the previous point-based SOTA method PointNeXt.
2. **Part Segmentation**: PCM also achieved excellent performance on the ShapeNetPart dataset.
3. **Semantic Segmentation**: On the S3DIS dataset, PCM equipped with a more powerful local feature extraction module achieves 79.6 mIoU, surpassing the previous SOTA models DeLA and PTv3 by 5.5 mIoU and 4.9 mIoU, respectively.
### Contributions
1. **Introduction of Mamba Architecture**: The paper shows, for the first time, that a Mamba-based point cloud method can outperform previous methods based on transformers or MLPs, using a framework that combines local and global modeling.
2. **Proposed New Techniques**: The paper proposes Consistent Traverse Serialization, Order Prompts, and a spatial coordinate mapping-based positional encoding method, improving Mamba's performance in handling point cloud data.
3. **Experimental Proof**: Through experiments on multiple datasets, the paper demonstrates PCM's superior performance in 3D object classification, part segmentation, and semantic segmentation tasks.
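The prompt and positional-encoding ideas named in the contributions above can be illustrated with a small sketch. Everything here is an assumption made for illustration: the summary does not say how prompts are attached or how coordinates are mapped, so the sketch simply prepends one vector per traversal order and adds a linear projection of the xyz coordinates to each point feature. Random weights stand in for learned parameters.

```python
import random

# Names of the six traversal orders (labels chosen here for illustration).
ORDERS = ["xyz", "xzy", "yxz", "yzx", "zxy", "zyx"]


def make_linear(in_dim, out_dim, rng):
    """Random weight matrix standing in for a learned projection."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(in_dim)]
            for _ in range(out_dim)]


def apply_linear(weight, vec):
    """Matrix-vector product: maps a coordinate triple to an embedding."""
    return [sum(w * v for w, v in zip(row, vec)) for row in weight]


def build_sequence(features, coords, order, pos_weight, prompts):
    """Add coordinate-based positions to features, then prepend a prompt.

    Assumption: the prompt for `order` is a single token placed at the
    start of the sequence so the model can tell which traversal it sees.
    """
    tokens = []
    for feat, xyz in zip(features, coords):
        pos = apply_linear(pos_weight, xyz)
        tokens.append([f + p for f, p in zip(feat, pos)])
    return [prompts[order]] + tokens


if __name__ == "__main__":
    rng = random.Random(0)
    dim = 4
    pos_weight = make_linear(3, dim, rng)
    prompts = {o: [rng.uniform(-0.1, 0.1) for _ in range(dim)] for o in ORDERS}
    feats = [[0.0] * dim, [1.0] * dim]
    coords = [(0.0, 0.0, 0.0), (1.0, 2.0, 3.0)]
    seq = build_sequence(feats, coords, "xyz", pos_weight, prompts)
    print(len(seq), seq[0])
```

Mapping raw coordinates directly, rather than indexing positions along the sequence, keeps the positional signal tied to where a point sits in space, which stays meaningful when the same points are traversed in different orders.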
In summary, this paper successfully addresses the efficient global modeling problem of point cloud data by introducing the Mamba architecture and a series of innovative techniques, achieving significant performance improvements on multiple benchmark datasets.