ConvPIM: Evaluating Digital Processing-in-Memory through Convolutional Neural Network Acceleration

Orian Leitersdorf, Ronny Ronen, Shahar Kvatinsky
DOI: https://doi.org/10.48550/arXiv.2305.04122
2023-05-07
Abstract: Processing-in-memory (PIM) architectures are emerging to reduce data movement in data-intensive applications. These architectures seek to exploit the same physical devices for both information storage and logic, thereby dwarfing the required data transfer and utilizing the full internal memory bandwidth. Whereas analog PIM utilizes the inherent connectivity of crossbar arrays for approximate matrix-vector multiplication in the analog domain, digital PIM architectures enable bitwise logic operations with massive parallelism across columns of data within memory arrays. Several recent works have extended the computational capabilities of digital PIM architectures towards the full-precision (single-precision floating-point) acceleration of convolutional neural networks (CNNs); yet, they lack a comprehensive comparison to GPUs. In this paper, we examine the potential of digital PIM for CNN acceleration through an updated quantitative comparison with GPUs, supplemented with an analysis of the overall limitations of digital PIM. We begin by investigating the different PIM architectures from a theoretical perspective to understand the underlying performance limitations and improvements compared to state-of-the-art hardware. We then uncover the tradeoffs between the different strategies through a series of benchmarks ranging from memory-bound vectored arithmetic to CNN acceleration. We conclude with insights into the general performance of digital PIM architectures for different data-intensive applications.
Hardware Architecture
What problem does this paper attempt to address?
This paper asks how digital processing-in-memory (digital PIM) architectures perform, and where they fall short, relative to state-of-the-art hardware such as GPUs when accelerating convolutional neural networks (CNNs). Specifically, the paper aims to:

1. **Evaluate the potential of digital PIM in CNN acceleration**: Through an updated quantitative comparison, measure how digital PIM performs on CNN acceleration relative to GPUs.
2. **Analyze the limitations of digital PIM**: Examine, from both theoretical and experimental perspectives, the performance limitations and improvements of different PIM architectures, in particular their advantages and disadvantages compared with existing hardware.
3. **Provide a comprehensive performance evaluation**: Through a series of benchmarks, evaluate digital PIM on workloads ranging from basic vector arithmetic to inference and training of large-scale CNN models.

### Main research contents

- **Theoretical analysis**: Study the performance limitations and improvements of different PIM architectures in theory and compare them with existing hardware such as GPUs.
- **Experimental verification**: Measure the actual performance of digital PIM through a series of benchmarks: memory-bound vector arithmetic, matrix multiplication, 2D convolution, and complete CNN inference and training.
- **Performance metrics**: Develop several performance metrics to better understand how the digital PIM architecture behaves across different data-intensive applications.

### Key findings

- **High computational complexity**: Floating-point operations have a high computational complexity on digital PIM, which limits the performance improvement in some tasks.
- **High data reuse rate**: CNNs reuse data heavily, so GPUs perform well on these tasks and the advantage of digital PIM is not obvious.
- **Memory wall bottleneck**: In tasks with a low data reuse rate, digital PIM can significantly reduce memory access latency; in tasks with a high data reuse rate, this advantage is weakened.

### Conclusion

Although digital PIM performs well in some specific tasks, for full-precision CNN acceleration the digital PIM architecture under the current parameters still cannot surpass GPU performance. Future research can focus on applications with low computational complexity or a low data reuse rate, where the advantages of digital PIM can be fully utilized.
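
The role of data reuse in these findings can be made concrete with a rough arithmetic-intensity estimate (floating-point operations per byte moved between memory and the compute units). The sketch below is an idealized model and is not taken from the paper: it assumes 4-byte single-precision values and that each operand crosses the memory interface exactly once, and the function names and layer sizes are hypothetical, chosen only for illustration.

```python
# Idealized arithmetic intensity (FLOPs per byte of memory traffic).
# Assumptions (not from the paper): 4-byte floats, perfect on-chip caching,
# so every input and output element is transferred exactly once.

BYTES_PER_ELEM = 4  # single-precision floating point


def vector_add_intensity(n: int) -> float:
    """Element-wise addition of two length-n vectors."""
    flops = n                              # one addition per element
    traffic = 3 * n * BYTES_PER_ELEM       # read two inputs, write one output
    return flops / traffic


def matmul_intensity(n: int) -> float:
    """Multiplication of two n x n matrices."""
    flops = 2 * n ** 3                     # n^3 multiply-add pairs
    traffic = 3 * n * n * BYTES_PER_ELEM   # two inputs and one output matrix
    return flops / traffic


def conv2d_intensity(h: int, w: int, k: int, c_in: int, c_out: int) -> float:
    """2D convolution with a k x k kernel, stride 1, no padding."""
    out_h, out_w = h - k + 1, w - k + 1
    flops = 2 * out_h * out_w * k * k * c_in * c_out
    elems = h * w * c_in + k * k * c_in * c_out + out_h * out_w * c_out
    return flops / (elems * BYTES_PER_ELEM)


if __name__ == "__main__":
    print("vector add :", vector_add_intensity(1_000_000))
    print("matmul     :", matmul_intensity(1024))
    print("conv2d     :", conv2d_intensity(56, 56, 3, 64, 64))
```

Under these assumptions, vector addition performs a fixed 1/12 FLOP per byte regardless of size, while matrix multiplication and convolution perform far more work per byte because each loaded value is reused many times. This is consistent with the summary above: workloads with low reuse are limited by memory bandwidth, where avoiding data movement helps most, whereas high-reuse workloads such as CNNs let a GPU amortize its memory traffic.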