14.2 A Compute SRAM with Bit-Serial Integer/Floating-Point Operations for Programmable In-Memory Vector Acceleration
Jingcheng Wang, Xiaowei Wang, Charles Eckert, Arun Subramaniyan, Reetuparna Das, David Blaauw, Dennis Sylvester
DOI: https://doi.org/10.1109/isscc.2019.8662419
2019-01-01
Abstract: Data movement and memory bandwidth are dominant factors in the energy and performance of both general-purpose CPUs and GPUs. This has led to extensive research focused on in-memory computing, which moves computation to where the data is located. With this approach, computation is often performed on the memory bit-lines in the analog domain using current summing [1]–[3], which requires expensive analog-to-digital and digital-to-analog conversions at the array boundary. In addition, such analog computation is very sensitive to PVT variations, limiting precision. More recently, full-rail (digital) binary in-memory computing was proposed to avoid this conversion overhead and improve robustness [4], [5]. However, both prior in-memory approaches suffer from the same major limitations: they accelerate only one type of algorithm and are inherently restricted to a very specific application domain due to their limited and fixed bit-width precision and non-programmable architecture. Software algorithms, on the other hand, continue to evolve rapidly, especially in novel application domains such as neural networks, vision, and graph processing, making rigid accelerators of limited use. Furthermore, most available SRAM in today’s chips is located in the caches of CPUs or GPUs. These large CPU and GPU SRAM stores present an opportunity for extensive in-memory computing and have, to date, remained largely untapped.