Voxel Mamba: Group-Free State Space Models for Point Cloud based 3D Object Detection

Guowen Zhang, Lue Fan, Chenhang He, Zhen Lei, Zhaoxiang Zhang, Lei Zhang

2024-06-19

Abstract: Serialization-based methods, which serialize the 3D voxels and group them into multiple sequences before feeding them to Transformers, have demonstrated their effectiveness in 3D object detection. However, serializing 3D voxels into 1D sequences inevitably sacrifices voxel spatial proximity. This issue is hard to address by enlarging the group size in existing serialization-based methods, due to the quadratic complexity of Transformers with respect to feature size. Inspired by recent advances in state space models (SSMs), we present a Voxel SSM, termed Voxel Mamba, which employs a group-free strategy to serialize the whole space of voxels into a single sequence. The linear complexity of SSMs encourages our group-free design, alleviating the loss of spatial proximity of voxels. To further enhance spatial proximity, we propose a Dual-scale SSM Block to establish a hierarchical structure, enabling a larger receptive field in the 1D serialization curve as well as more complete local regions in 3D space. Moreover, we implicitly apply window partition under the group-free framework through positional encoding, which further enhances spatial proximity by encoding voxel positional information. Our experiments on the Waymo Open Dataset and the nuScenes dataset show that Voxel Mamba not only achieves higher accuracy than state-of-the-art methods, but also demonstrates significant advantages in computational efficiency.
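
The complexity argument in the abstract can be illustrated with a minimal Python sketch. This is not the paper's implementation: Mamba uses input-dependent (selective) parameters and a hardware-aware scan, whereas this toy runs a scalar linear recurrence with fixed, hypothetical coefficients `a`, `b`, and `c`. The names `ssm_scan` and `attention_pair_count` are illustrative assumptions. The point is only that the recurrence touches each element once, while full self-attention scores every pair of elements in a sequence.

```python
def ssm_scan(xs, a=0.9, b=1.0, c=1.0):
    """Run a scalar linear state space recurrence over a sequence.

    h_t = a * h_{t-1} + b * x_t
    y_t = c * h_t

    The coefficients a, b, c are fixed, hypothetical values chosen for
    illustration. One pass with constant work per element means the cost
    grows linearly with len(xs).
    """
    h = 0.0
    ys = []
    for x in xs:
        h = a * h + b * x  # update the hidden state with the new input
        ys.append(c * h)   # read out the output from the state
    return ys


def attention_pair_count(seq_len):
    """Number of query-key pairs that full self-attention scores."""
    return seq_len * seq_len


if __name__ == "__main__":
    print(ssm_scan([1.0, 0.0, 0.0], a=0.5))
    for n in (1_000, 10_000, 100_000):
        print(n, "recurrence steps vs", attention_pair_count(n), "attention pairs")
```

Growing the sequence tenfold multiplies the recurrence steps by ten and the attention pairs by a hundred, which is why a single long sequence is affordable for an SSM but not for grouped attention.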

Computer Vision and Pattern Recognition, Robotics

What problem does this paper attempt to address?

The paper addresses a limitation of point-cloud-based 3D object detection: existing methods inevitably sacrifice the spatial proximity of voxels when they serialize 3D voxels into 1D sequences. Specifically:

1. **Limitations of Existing Methods**:
   - Serialization methods (such as window partitioning, Z-order sorting, and Hilbert sorting) are effective but break the spatial proximity between voxels.
   - Because of the quadratic complexity of Transformers, enlarging the group size is not a practical fix; it mainly drives up the computational cost.
2. **Proposed New Method**:
   - The paper introduces Voxel Mamba, a method based on state space models (SSMs) that adopts a group-free strategy and serializes the entire voxel space into a single sequence.
   - Because SSMs have linear complexity, Voxel Mamba can process the full sequence efficiently and alleviates the loss of spatial proximity that grouping causes in earlier methods.
3. **Improvement Measures**:
   - The Dual-scale SSM Block (DSB) establishes a hierarchical structure that enlarges the effective receptive field along the 1D sequence and yields more complete local regions in 3D space.
   - Implicit Window Partition (IWP) enhances the spatial proximity of voxels through positional encoding, without explicit window partitioning.
4. **Experimental Results**:
   - Experiments on the Waymo Open Dataset and the nuScenes dataset show that Voxel Mamba outperforms existing methods in accuracy and also has significant advantages in computational efficiency.

In summary, the paper targets the loss of voxel spatial proximity in existing serialization-based 3D object detection methods and offers a more efficient and accurate alternative in Voxel Mamba.
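
The proximity loss caused by serialization can be seen in a small Python sketch. It uses a Z-order (Morton) curve, one of the space-filling orderings mentioned in the summary, and is an illustration under stated assumptions rather than the paper's code: the function names `morton3d` and `serialize` and the 10-bit coordinate width are hypothetical choices.

```python
def morton3d(x, y, z, bits=10):
    """Interleave the bits of non-negative integer voxel coordinates.

    Bit i of x goes to position 3*i, bit i of y to 3*i + 1, and bit i of z
    to 3*i + 2. The `bits` width is an assumed bound on the grid size.
    """
    code = 0
    for i in range(bits):
        code |= ((x >> i) & 1) << (3 * i)
        code |= ((y >> i) & 1) << (3 * i + 1)
        code |= ((z >> i) & 1) << (3 * i + 2)
    return code


def serialize(voxels, bits=10):
    """Sort voxel coordinates into one 1D sequence along the Z-order curve."""
    return sorted(voxels, key=lambda v: morton3d(v[0], v[1], v[2], bits=bits))


if __name__ == "__main__":
    # Neighbours along x that share an octant stay close in the sequence.
    print(morton3d(2, 0, 0), morton3d(3, 0, 0))
    # Neighbours that straddle an octant boundary end up far apart.
    print(morton3d(3, 0, 0), morton3d(4, 0, 0))
    print(serialize([(1, 1, 0), (0, 0, 0), (1, 0, 0), (0, 1, 0)]))
```

In a dense 8×8×8 grid the Morton code equals the position in the sorted sequence, so the voxels (3, 0, 0) and (4, 0, 0), which touch in 3D, receive codes 9 and 64 and sit 55 positions apart. Splitting the sequence into short groups would place them in different groups, which is the effect a single group-free sequence is meant to reduce.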

MT-SSD: Single-Stage 3D Object Detector Based on Magnification Transformation

PVT-SSD: Single-Stage 3D Object Detector with Point-Voxel Transformer

SSF: Sparse Point Cloud Object Detection Based on Self-Adaptive Voxel Encoding and Focal-Sparse Convolution

PointMamba: A Simple State Space Model for Point Cloud Analysis

MsSVT++: Mixed-scale Sparse Voxel Transformer with Center Voting for 3D Object Detection

Point Mamba: A Novel Point Cloud Backbone Based on State Space Model with Octree-Based Ordering Strategy

Point Cloud Mamba: Point Cloud Learning via State Space Model

Mamba3D: Enhancing Local Features for 3D Point Cloud Analysis via State Space Model

DV-Det: Efficient 3D Point Cloud Object Detection with Dynamic Voxelization

Serialized Point Mamba: A Serialized Point Cloud Mamba Segmentation Model

2DMamba: Efficient State Space Model for Image Representation with Applications on Giga-Pixel Whole Slide Image Classification

Multi-Source Features Fusion Single Stage 3D Object Detection with Transformer

NIMBA: Towards Robust and Principled Processing of Point Clouds With SSMs

Scalable Visual State Space Model with Fractal Scanning

Spatial-Mamba: Effective Visual State Space Models via Structure-Aware State Fusion

Voxel R-CNN: Towards High Performance Voxel-based 3D Object Detection

GroupMamba: Parameter-Efficient and Accurate Group Visual State Space Model

DS-Trans: A 3D Object Detection Method Based on a Deformable Spatiotemporal Transformer for Autonomous Vehicles

VoxelFormer: Bird's-Eye-View Feature Generation based on Dual-view Attention for Multi-view 3D Object Detection

VSSD: Vision Mamba with Non-Causal State Space Duality