pPIM: A Programmable Processor-in-Memory Architecture With Precision-Scaling for Deep Learning

Purab Ranjan Sutradhar, Mark Connolly, Sathwika Bavikadi, Sai Manoj Pudukotai Dinakarrao, Mark A. Indovina, Amlan Ganguly
DOI: https://doi.org/10.1109/lca.2020.3011643
IF: 2.3
2020-07-01
IEEE Computer Architecture Letters
Abstract: Memory access latencies and low data transfer bandwidth limit the processing speed of many data-intensive applications, such as Convolutional Neural Networks (CNNs), in conventional Von Neumann architectures. Processing in Memory (PIM) is envisioned as a potential hardware solution for such applications, as the data access bottlenecks can be avoided by performing computations within the memory die. However, PIM realizations with complex logic-based processing units within the memory present complicated fabrication challenges. In this letter, we propose to leverage the existing memory infrastructure to implement a programmable PIM (pPIM), a novel Look-Up-Table (LUT)-based PIM where all the processing units are implemented solely with LUTs, as opposed to prior LUT-based PIM implementations that combine LUTs with logic circuitry for computations. This enables pPIM to perform ultra-low-power and low-latency operations with minimal fabrication complications. Moreover, the completely LUT-based design offers simple 'memory write'-based programmability in pPIM. Enabling precision scaling further improves the performance and the power consumption for CNN applications. The programmability feature potentially makes online training implementations easier. Our preliminary simulations demonstrate that the proposed pPIM can achieve 2000x, 657.5x, and 1.46x improvements in inference throughput per unit power consumption compared to a state-of-the-art conventional processor architecture, Graphics Processing Units (GPUs), and a prior hybrid LUT-logic-based PIM, respectively. Furthermore, precision scaling improves the energy efficiency of pPIM by approximately 1.35x over its full-precision operation.
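The idea of computing purely with look-up tables, reprogramming them by memory writes, and trading precision for fewer operations can be illustrated with a small software sketch. The abstract does not give operand widths or the decomposition scheme, so everything here is an assumption made for illustration: the 4-bit LUT operands, the function names (`build_lut`, `lut_op`, `mul8_full`, `mul8_scaled`), and the use of ordinary Python shifts and additions to combine partial products (in hardware these would presumably also be LUT operations).

```python
# Illustrative sketch only. The 4-bit operand width, the function names, and
# the nibble decomposition are assumptions, not details from the abstract.

def build_lut(op, bits=4):
    """'Program' a LUT by writing op(a, b) for every pair of bits-wide operands."""
    size = 1 << bits
    return [op(a, b) for a in range(size) for b in range(size)]

def lut_op(lut, a, b, bits=4):
    """Evaluate the programmed function with a single table read.

    The two operands are concatenated to form the table address.
    """
    return lut[(a << bits) | b]

def mul8_full(mul_lut, x, y):
    """Full-precision 8-bit x 8-bit product from four 4-bit LUT reads."""
    xh, xl = x >> 4, x & 0xF
    yh, yl = y >> 4, y & 0xF
    high = lut_op(mul_lut, xh, yh)
    cross = lut_op(mul_lut, xh, yl) + lut_op(mul_lut, xl, yh)
    low = lut_op(mul_lut, xl, yl)
    return (high << 8) + (cross << 4) + low

def mul8_scaled(mul_lut, x, y):
    """Precision-scaled product: keep only the high nibbles (one LUT read)."""
    return lut_op(mul_lut, x >> 4, y >> 4) << 8

# The same table-read mechanism serves a different function after a rewrite.
mul_lut = build_lut(lambda a, b: a * b)
and_lut = build_lut(lambda a, b: a & b)
```

In this sketch the scaled variant performs one table read instead of four, at the cost of discarding the low-order contribution of each operand; it is meant only to convey why reduced precision can lower the operation count, not to reproduce the paper's scheme or its reported gains.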
computer science, hardware & architecture