Voxel-CIM: An Efficient Compute-in-Memory Accelerator for Voxel-based Point Cloud Neural Networks

Xipeng Lin, Shanshi Huang, Hongwu Jiang
2024-09-28
Abstract: 3D point cloud perception has become fundamental to a wide range of applications. In particular, with the rapid development of neural networks, voxel-based networks have attracted great attention due to their excellent performance. Various accelerator designs have been proposed to improve the hardware performance of voxel-based networks, especially to speed up the map search process. However, several challenges remain: (1) massive off-chip data access volume caused by map search operations, notably in high-resolution and densely distributed cases; (2) frequent data movement for data-intensive convolution operations; and (3) imbalanced workload caused by the irregular sparsity of point data. To address these challenges, we propose Voxel-CIM, an efficient Compute-in-Memory based accelerator for voxel-based neural network processing. To reduce off-chip memory access for map search, a depth-encoding-based output major search approach is introduced to maximize data reuse, achieving a stable $O(N)$-level data access volume in various situations. Voxel-CIM also employs the in-memory computing paradigm and designs innovative weight mapping strategies to efficiently process sparse 3D convolutions and 2D convolutions. Implemented in 22 nm technology and evaluated on representative benchmarks, Voxel-CIM achieves on average 4.5~7.0$\times$ higher energy efficiency (10.8 TOPS/W), a 2.4~5.4$\times$ speedup on the detection task, and a 1.2~8.1$\times$ speedup on the segmentation task compared to state-of-the-art point cloud accelerators and powerful GPUs.
Hardware Architecture
What problem does this paper attempt to address?
This paper aims to improve the performance of voxel-based point cloud neural networks on hardware accelerators. It targets three major challenges:

1. **Large off-chip data access caused by map search**: Before sparse convolution can run, input-output mapping tables (IN-OUT maps) must be constructed. Building them usually requires a large amount of off-chip data access, especially in high-resolution and densely distributed scenarios.
2. **Frequent data movement**: In a traditional von Neumann architecture, the "memory wall" means that the large volume of data moved between compute units and memory limits the processing speed of data-intensive convolutions.
3. **Unbalanced workload**: Point cloud data is irregularly sparse and unevenly distributed, so each weight corresponds to a different number of input-output pairs. Central weights usually carry a higher workload and edge weights a lower one, which leaves compute resources underutilized.

To address these challenges, the paper proposes Voxel-CIM, an efficient accelerator based on Compute-in-Memory (CIM). Its main contributions are:

- **Reducing off-chip data access**: A new search scheme, Depth-encoding-based Output Major Search (DOMS), maximizes data reuse and achieves a stable $O(N)$-level off-chip memory access volume.
- **CIM processing units and weight mapping strategy**: The design supports efficient sparse 3D convolution (Spconv3D) and 2D convolution (Conv2D), and a Weight Workload Balanced (W2B) method addresses the workload mismatch across weights.
- **Performance evaluation**: On detection and segmentation benchmarks, Voxel-CIM achieves on average 4.5~7.0 times higher energy efficiency (10.8 TOPS/W) than state-of-the-art point cloud accelerators and powerful GPUs, with speedups of 2.4~5.4 times on detection tasks and 1.2~8.1 times on segmentation tasks.

In summary, the paper combines a new map search method with a CIM architecture to address the key bottlenecks in hardware acceleration of voxel-based point cloud neural networks, improving both speed and energy efficiency.
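To make the IN-OUT maps and the per-weight workload imbalance described above more concrete, here is a minimal Python sketch. It assumes a submanifold sparse 3D convolution (outputs exist only at occupied input voxels) and uses a plain hash-table map search; it is not the paper's DOMS scheme or W2B method. The function names `build_in_out_maps` and `workload_per_weight` are hypothetical and chosen only for illustration.

```python
from itertools import product


def build_in_out_maps(voxels, kernel_size=3):
    """Build one IN-OUT map per kernel offset (i.e. per weight).

    voxels: list of unique (x, y, z) integer coordinates of occupied voxels.
    Returns {offset: [(in_idx, out_idx), ...]}.
    Assumption: submanifold convolution, so the output sites equal the input sites.
    """
    # Hash table from coordinate to row index; this lookup is the "map search".
    index_of = {coord: i for i, coord in enumerate(voxels)}
    r = kernel_size // 2
    maps = {}
    for offset in product(range(-r, r + 1), repeat=3):
        pairs = []
        for out_idx, (x, y, z) in enumerate(voxels):
            neighbor = (x + offset[0], y + offset[1], z + offset[2])
            in_idx = index_of.get(neighbor)
            if in_idx is not None:  # only occupied neighbors contribute
                pairs.append((in_idx, out_idx))
        maps[offset] = pairs
    return maps


def workload_per_weight(maps):
    """Number of input-output pairs (multiply-accumulate jobs) per weight."""
    return {offset: len(pairs) for offset, pairs in maps.items()}


# Tiny example: three clustered voxels and one isolated voxel.
example_voxels = [(0, 0, 0), (1, 0, 0), (0, 1, 0), (5, 5, 5)]
example_load = workload_per_weight(build_in_out_maps(example_voxels))
print("center weight:", example_load[(0, 0, 0)])
print("edge weight (1, 0, 0):", example_load[(1, 0, 0)])
print("corner weight (1, 1, 1):", example_load[(1, 1, 1)])
```

In this setting the center offset pairs every voxel with itself, so its workload always equals the number of voxels, while the other offsets only find pairs where occupied neighbors exist. That gap is the kind of imbalance a weight-to-compute-unit mapping has to absorb.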