DSAV: A Deep Sparse Acceleration Framework for Voxel-Based 3D Object Detection
Haining Fang, Yujuan Tan, Ao Ren, Wei Zhuang, Yang Hua, Zhiyong Qin, Duo Liu
DOI: https://doi.org/10.1109/TCAD.2024.3437334
IF: 2.9
2024-01-01
IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems
Abstract: Voxel-based 3D object detection has been widely applied in robotics, virtual reality, and autonomous driving. However, inefficiency in voxelization and backbone-network computation, the two main components of voxel-based models, prevents efficient 3D object detection. First, due to the high sparsity and irregularity of point clouds, voxelization usually runs on general-purpose platforms such as CPUs, which results in low voxelization speed. Second, voxel-based models contain many transposed convolutional layers, and existing accelerators add considerable hardware to support both convolution and transposed convolution operations, which incurs significant hardware cost. In addition, transposed convolutions produce sparse feature maps with varied patterns, and pruning, a representative model compression technique, produces sparse weight matrices. These two types of sparsity pose challenges for accelerating voxel-based models, including low activation-weight matching efficiency, low partial-sum accumulation efficiency, and workload imbalance. In this work, we propose DSAV, a 3D object detection accelerator that addresses these obstacles. Specifically, we first propose a hash-based voxelizer that stores and indexes voxels hierarchically for efficient voxelization. Then, we co-design the transposed convolution acceleration method, the structured pruning method, and the accelerator architecture for voxel-based models. As a result, the accelerator can fully exploit the sparsity in both feature maps and weight matrices. Experimental results show that the proposed accelerator outperforms prior work by 19× to 19.8× in voxelization speed and by 4.29× to 38.01× in backbone inference speed. Overall, the accelerator achieves speedups of 4.61× to 31.63× over its counterparts in 3D object detection tasks.
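
To make the idea of hash-based voxelization concrete, the following is a minimal software sketch, not the DSAV hardware design. It assumes a flat hash map keyed by integer voxel coordinates, whereas the abstract describes a hierarchical storage and indexing scheme whose details are not given here. The function name `voxelize` and the parameters `voxel_size` and `max_points_per_voxel` are hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def voxelize(points, voxel_size, max_points_per_voxel=32):
    """Group 3D points into voxels with a hash map (illustrative assumption).

    Each point is mapped to the integer coordinates of the voxel that
    contains it, and those coordinates serve as the hash key. Only
    non-empty voxels are stored, which suits sparse point clouds.
    """
    voxels = defaultdict(list)
    for x, y, z in points:
        # Floor division gives the voxel index along each axis,
        # including for negative coordinates.
        key = (int(x // voxel_size), int(y // voxel_size), int(z // voxel_size))
        # Cap the number of points kept per voxel (a common convention
        # in voxel-based detectors; the cap value here is arbitrary).
        if len(voxels[key]) < max_points_per_voxel:
            voxels[key].append((x, y, z))
    return dict(voxels)


if __name__ == "__main__":
    cloud = [(0.1, 0.2, 0.3), (0.4, 0.4, 0.4), (1.2, 0.1, 0.0), (-0.1, 0.0, 0.0)]
    for key, pts in voxelize(cloud, voxel_size=0.5).items():
        print(key, len(pts))
```

Lookup and insertion are constant time on average, so the cost scales with the number of points rather than with the size of the full voxel grid, most of which is empty.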