Pamba: Enhancing Global Interaction in Point Clouds via State Space Model

Zhuoyuan Li, Yubo Ai, Jiahao Lu, ChuXin Wang, Jiacheng Deng, Hanzhi Chang, Yanzhe Liang, Wenfei Yang, Shifeng Zhang, Tianzhu Zhang
2025-01-05
Abstract: Transformers have demonstrated impressive results for 3D point cloud semantic segmentation. However, the quadratic complexity of transformer makes computation costs high, limiting the number of points that can be processed simultaneously and impeding the modeling of long-range dependencies between objects in a single scene. Drawing inspiration from the great potential of recent state space models (SSM) for long sequence modeling, we introduce Mamba, an SSM-based architecture, to the point cloud domain and propose Pamba, a novel architecture with strong global modeling capability under linear complexity. Specifically, to make the disorderness of point clouds fit in with the causal nature of Mamba, we propose a multi-path serialization strategy applicable to point clouds. Besides, we propose the ConvMamba block to compensate for the shortcomings of Mamba in modeling local geometries and in unidirectional modeling. Pamba obtains state-of-the-art results on several 3D point cloud segmentation tasks, including ScanNet v2, ScanNet200, S3DIS and nuScenes, while its effectiveness is validated by extensive experiments.
Computer Vision and Pattern Recognition
What problem does this paper attempt to address?
The paper addresses two problems that existing Transformer-based methods for 3D point cloud semantic segmentation face on large-scale point clouds: high computational complexity and difficulty capturing long-range dependencies. Specifically:

1. **High computational cost**: Because the time complexity of the Transformer is quadratic ($O(n^2)$) in the number of points, it can only process a limited number of points at once, which restricts its ability to model long-range dependencies between different objects in a single scene.
2. **Difficulty in capturing long-range dependencies**: In large-scale point cloud data, long-range dependencies between different objects are crucial for accurate semantic segmentation, yet existing methods are insufficient in this regard.

To solve these problems, the authors introduce a new architecture, Pamba, which is built on the state space model (SSM) and adapted to the point cloud domain. Because the SSM has linear complexity, Pamba can capture long-range dependencies while processing large-scale point clouds. In addition, to handle the unordered nature and local geometric characteristics of point clouds, the authors propose a multi-path serialization strategy and the ConvMamba module, which strengthen global and local modeling respectively.

### Specific problems and solutions

- **Unordered point clouds**: Point cloud data is usually unordered, while Mamba is designed for causal sequences and is sensitive to input order. Pamba therefore uses a multi-path serialization strategy that rearranges the points so that spatially neighboring points stay close to each other in the sequence.
- **Insufficient local geometric modeling**: Mamba sacrifices the quality of local geometric features when it compresses all context into a single hidden state. Pamba compensates with the ConvMamba module, which combines Mamba with convolution operations so that both long-range dependencies and local geometric features are captured.
- **Unidirectional modeling**: Mamba models the sequence in one direction only, so a point can interact only with the points that precede it. Pamba adds a bidirectional Mamba mechanism so that each point can interact with points on both sides.

With these improvements, Pamba achieves state-of-the-art performance on multiple 3D point cloud segmentation tasks and demonstrates strong generalization ability and computational efficiency.

### Summary

The main contributions of Pamba are:

- Proposing a new framework, Pamba, for 3D point cloud semantic segmentation that captures long-range dependencies with linear complexity.
- Proposing a multi-path serialization strategy and the ConvMamba module to help Mamba better adapt to point cloud data.
- Validating the design choices through extensive experiments and achieving state-of-the-art results on several challenging point cloud segmentation tasks, including ScanNet v2, ScanNet200, S3DIS and nuScenes.
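
To make the serialization idea above concrete, here is a minimal sketch of turning an unordered point set into several ordered sequences. It is an illustration under stated assumptions, not the authors' implementation: the text above does not say which space-filling curves Pamba uses, so a Z-order (Morton) curve stands in, and the "multiple paths" are obtained simply by permuting the axis order. The names `morton_key`, `serialize`, `multi_path_orders` and the `grid_size` parameter are hypothetical.

```python
def spread_bits(v, bits=10):
    """Place bit i of v at position 3*i, leaving room for two other axes."""
    out = 0
    for i in range(bits):
        out |= ((v >> i) & 1) << (3 * i)
    return out


def morton_key(cell, order=(0, 1, 2), bits=10):
    """Interleave the bits of a 3D grid cell into one Z-order key.

    `order` picks which axis occupies the lowest interleaved bit, so different
    permutations trace different paths through the same grid.
    """
    a, b, c = (cell[i] for i in order)
    return (spread_bits(a, bits)
            | (spread_bits(b, bits) << 1)
            | (spread_bits(c, bits) << 2))


def serialize(points, grid_size=0.05, order=(0, 1, 2), bits=10):
    """Return point indices sorted along a Z-order curve (assumed curve type)."""
    mins = [min(p[d] for p in points) for d in range(3)]
    limit = (1 << bits) - 1  # clamp so cells fit in `bits` bits per axis
    cells = [
        tuple(min(int((p[d] - mins[d]) / grid_size), limit) for d in range(3))
        for p in points
    ]
    return sorted(range(len(points)),
                  key=lambda i: morton_key(cells[i], order, bits))


def multi_path_orders(points, grid_size=0.05):
    """Produce several serializations of the same point set (hypothetical paths)."""
    axis_orders = [(0, 1, 2), (1, 2, 0), (2, 0, 1)]
    return [serialize(points, grid_size, order) for order in axis_orders]


if __name__ == "__main__":
    pts = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (1.0, 1.0, 0.0)]
    for path in multi_path_orders(pts, grid_size=1.0):
        print(path)
```

Each returned list is a permutation of point indices. Feeding the same points to a sequence model in several such orders gives every point different sequence neighbors on each path, which is the intuition behind using more than one serialization.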
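
The unidirectional limitation and its bidirectional remedy described above can be seen in a toy linear recurrence. This is a deliberately simplified sketch: real Mamba uses input-dependent (selective) parameters and vector-valued states, whereas the code below uses fixed scalars `a` and `b`, and merging the two directions by summation is an assumption, since the text does not say how Pamba combines them. The function names are hypothetical.

```python
def scan(xs, a=0.5, b=1.0):
    """Causal recurrence h_t = a * h_{t-1} + b * x_t.

    One pass over the sequence: O(n) time with a single running state, in
    contrast to the O(n^2) pairwise interactions of self-attention.
    """
    h = 0.0
    out = []
    for x in xs:
        h = a * h + b * x
        out.append(h)
    return out


def bidirectional_scan(xs, a=0.5, b=1.0):
    """Run the recurrence forward and backward, then sum the two outputs.

    Summation is an assumed merge rule chosen for illustration only.
    """
    forward = scan(xs, a, b)
    backward = scan(xs[::-1], a, b)[::-1]
    return [f + g for f, g in zip(forward, backward)]


if __name__ == "__main__":
    signal = [0.0, 0.0, 1.0]  # the only non-zero input is at the last position
    print(scan(signal))                # first position sees nothing from later inputs
    print(bidirectional_scan(signal))  # first position now receives a contribution
```

In the causal scan the first output is unaffected by the later input, while in the bidirectional version every position receives information from both sides, at the cost of a second linear-time pass.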