Abstract: Voxel-based 3D object detection has been widely applied in robotics, virtual reality, and autonomous driving. However, inefficiency in voxelization and backbone-network computation, the two main components of voxel-based models, prevents efficient 3D object detection. First, due to the high sparsity and irregularity of point clouds, the voxelization process usually requires general-purpose platforms, such as CPUs, which results in low voxelization speed. Second, voxel-based models contain many transposed convolutional layers, and existing accelerators introduce considerable additional hardware to support both convolution and transposed convolution operations, which incurs significant hardware costs. In addition, transposed convolutions produce various patterns of sparse feature maps, and pruning, a representative model compression technique, produces sparse weight matrices. These two types of sparsity pose challenges in accelerating voxel-based models, including low activation-weight matching efficiency, low partial-sum accumulation efficiency, and workload imbalance. In this work, we propose DSAV, a 3D object detection accelerator that addresses these obstacles. Specifically, we first propose a hash-based voxelizer that achieves efficient voxelization by storing and indexing voxels hierarchically. Then, we co-design the transposed convolution acceleration method, the structured pruning method, and the accelerator architecture for voxel-based models. As a result, the accelerator can fully leverage the sparsity in both feature maps and weight matrices. Experimental results show that the proposed accelerator outperforms prior studies by 19× to 19.8× in voxelization speed and by 4.29× to 38.01× in backbone inference speed. Finally, the accelerator achieves 4.61× to 31.63× speedups over its counterparts in 3D object detection tasks.
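
To make the idea of hierarchical, hash-based voxel storage concrete, the following is a minimal software sketch, not the DSAV hardware design. It assumes a two-level scheme in which an outer hash map is keyed by a coarse block index and an inner hash map is keyed by the voxel offset inside that block. The function names (`voxelize`, `voxel_centroids`) and the parameters `voxel_size` and `block_size` are hypothetical and chosen only for illustration.

```python
from collections import defaultdict
from math import floor


def voxelize(points, voxel_size=0.5, block_size=4):
    """Group 3D points into voxels using a two-level hash map.

    Outer key: coarse block index. Inner key: voxel offset inside the block.
    This layout is an illustrative assumption, not the paper's design.
    """
    blocks = defaultdict(lambda: defaultdict(list))
    for x, y, z in points:
        # Integer voxel coordinates of the point.
        vx, vy, vz = (floor(c / voxel_size) for c in (x, y, z))
        # Split each coordinate into a block index and a local offset.
        block = (vx // block_size, vy // block_size, vz // block_size)
        local = (vx % block_size, vy % block_size, vz % block_size)
        blocks[block][local].append((x, y, z))
    return blocks


def voxel_centroids(blocks, block_size=4):
    """Return {global voxel index: centroid of the points in that voxel}."""
    centroids = {}
    for (bx, by, bz), voxels in blocks.items():
        for (lx, ly, lz), pts in voxels.items():
            # Recombine block index and local offset into a global index.
            key = (bx * block_size + lx,
                   by * block_size + ly,
                   bz * block_size + lz)
            n = len(pts)
            centroids[key] = tuple(sum(p[i] for p in pts) / n for i in range(3))
    return centroids


if __name__ == "__main__":
    cloud = [(0.1, 0.1, 0.1), (0.3, 0.2, 0.4), (2.6, 0.0, -0.2)]
    grid = voxelize(cloud)
    for index, centroid in sorted(voxel_centroids(grid).items()):
        print(index, centroid)
```

Only occupied blocks and voxels consume memory in this layout, which is the property that makes hashing attractive for sparse point clouds; a lookup first hashes the block index and then the local offset.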

fVDB: A Deep-Learning Framework for Sparse, Large-Scale, and High-Performance Spatial Intelligence

VDBblox: Accurate and Efficient Distance Fields for Path Planning and Mesh Reconstruction

Depth-Box VDB: Accelerate Sparse Volume Rendering with Depth Maps Through Voxel Database

A Framework for the Volumetric Integration of Depth Images

NeuralVDB: High-resolution Sparse Volume Representation using Hierarchical Neural Networks

PVT-SSD: Single-Stage 3D Object Detector with Point-Voxel Transformer

VPFusion: Towards Robust Vertical Representation Learning for 3D Object Detection

Chunkfusion: A Learning-Based RGB-D 3D Reconstruction Framework Via Chunk-Wise Integration

DSVT: Dynamic Sparse Voxel Transformer with Rotated Sets

VDBFusion: Flexible and Efficient TSDF Integration of Range Sensor Data

DSAV: A Deep Sparse Acceleration Framework for Voxel-Based 3D Object Detection

FG-Net: A Fast and Accurate Framework for Large-Scale LiDAR Point Cloud Understanding

VDB-GPDF: Online Gaussian Process Distance Field with VDB Structure

SDVRF: Sparse-to-Dense Voxel Region Fusion for Multi-modal 3D Object Detection

Hierarchical, Dense and Dynamic 3D Reconstruction Based on VDB Data Structure for Robotic Manipulation Tasks

DVFENet: Dual-branch voxel feature extraction network for 3D object detection

FSD V2: Improving Fully Sparse 3D Object Detection with Virtual Voxels

VoxelFSD: voxel-based fully sparse detector with sparse convolution for 3D object detection

Point-Voxel CNN for Efficient 3D Deep Learning