A CMOS-integrated compute-in-memory macro based on resistive random-access memory for AI edge devices

Cheng-Xin Xue, Yen-Cheng Chiu, Ta-Wei Liu, Tsung-Yuan Huang, Je-Syu Liu, Ting-Wei Chang, Hui-Yao Kao, Jing-Hong Wang, Shih-Ying Wei, Chun-Ying Lee, Sheng-Po Huang, Je-Min Hung, Shih-Hsih Teng, Wei-Chen Wei, Yi-Ren Chen, Tzu-Hsiang Hsu, Yen-Kai Chen, Yun-Chen Lo, Tai-Hsing Wen, Chung-Chuan Lo, Ren-Shuo Liu, Chih-Cheng Hsieh, Kea-Tiong Tang, Mon-Shu Ho, Chin-Yi Su, Chung-Cheng Chou, Yu-Der Chih, Meng-Fan Chang
DOI: https://doi.org/10.1038/s41928-020-00505-5
IF: 33.255
2020-12-14
Nature Electronics
Abstract: Commercial complementary metal–oxide–semiconductor and resistive random-access memory technologies can be used to create multibit compute-in-memory circuits capable of fast and energy-efficient inference for use in small artificial intelligence edge devices.
engineering, electrical & electronic
What problem does this paper attempt to address?
The paper addresses the problem of achieving efficient, low-power artificial intelligence (AI) computation on edge devices. Specifically, it tackles a series of challenges that non-volatile compute-in-memory (nvCIM) architectures face when performing dot-product operations:

1. **Precision of input-weight-output configuration**: Existing nvCIM architectures offer insufficient precision when handling multi-bit inputs, weights, and outputs, which limits the complexity and inference accuracy of the neural networks they can run.
2. **Performance bottleneck**: Data transfer in traditional von Neumann architectures leads to high latency and high energy consumption, forming the so-called "memory wall" bottleneck.
3. **Parallel input and cell area limitations**: Large-scale parallel input and high-precision weight storage require more cell area, increasing design complexity and energy consumption.
4. **Signal margin degradation**: Current leakage in high resistance state (HRS) cells reduces the signal margin, which affects computational accuracy.
5. **Delay and energy consumption of multi-bit analog readout operations**: High-precision analog-to-digital conversion incurs longer delays and higher energy consumption.

To overcome these challenges, the paper proposes a 2 Mb resistive random-access memory (ReRAM) nvCIM macro that is fully integrated with complementary metal-oxide-semiconductor (CMOS) technology. It achieves higher input-output parallelism, reduced cell array area, improved precision, and reduced computational delay and energy consumption through the following techniques:

- **Bit-line input-output multi-bit computation (BLIOMC) scheme**: Using a single word line and input-aware multi-bit bit-line clamping (IA-MBC) reduces the dynamic range of the bit-line current, shortens input delay, and increases the number of parallel inputs.
- **Staggered binary complement weight mapping and biasing (S2CWMB) scheme**: Reduces area overhead and current consumption.
- **In-situ high resistance state current cancellation (HRS-C) scheme**: Improves signal margin and reduces energy consumption.
- **High resistance state first quantization (HRS-FQ) process**: Balances energy consumption and inference accuracy.
- **Dual-bit small offset current mode sense amplifier (DbSO-CSA)**: Shortens the delay and reduces the energy consumption of multi-bit readout operations.
- **Global replica local mixed reference current generation (GRLM-RCG) scheme**: Reduces the energy consumption of reference current generation.

With these techniques, the proposed nvCIM macro achieves delays of 9.2 to 18.3 nanoseconds and energy efficiencies of 146.21 to 36.61 tera-operations per second per watt for binary and multi-bit input-weight-output configurations, respectively.
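The link between HRS leakage and signal margin (challenge 4 and the HRS-C bullet above) can be illustrated with a small numerical sketch. This is a toy model of a binary dot product computed as a summed bit-line current, not the paper's circuit: the read voltage, the LRS and HRS resistances, and the helper names `bitline_current`, `cancel_hrs_current`, and `worst_case_margin` are all assumptions chosen for illustration, and the cancellation step is idealized.

```python
# Toy model of a binary dot product computed as a bit-line current.
# All electrical values are assumed for illustration; none are from the paper.
V_READ = 0.2      # volts across an activated cell (assumed)
R_LRS = 10e3      # low-resistance state in ohms (assumed), stores weight 1
R_HRS = 200e3     # high-resistance state in ohms (assumed), stores weight 0

I_LRS = V_READ / R_LRS   # current contributed by an activated weight-1 cell
I_HRS = V_READ / R_HRS   # leakage of an activated weight-0 cell (ideally zero)


def bitline_current(inputs, weights):
    """Sum the currents of all activated cells sharing one bit line."""
    total = 0.0
    for x, w in zip(inputs, weights):
        if x:  # a cell conducts only when its input is active
            total += I_LRS if w else I_HRS
    return total


def cancel_hrs_current(current, inputs, weights):
    """Idealized cancellation: subtract the leakage of activated HRS cells."""
    n_hrs = sum(1 for x, w in zip(inputs, weights) if x and not w)
    return current - n_hrs * I_HRS


def worst_case_margin(n_inputs):
    """Smallest current gap between adjacent dot-product values k and k+1.

    Value k reads highest when every remaining input is active on an HRS
    cell; value k+1 reads lowest when no HRS cell is activated at all.
    """
    margins = []
    for k in range(n_inputs):
        max_current_k = k * I_LRS + (n_inputs - k) * I_HRS
        min_current_k_plus_1 = (k + 1) * I_LRS
        margins.append(min_current_k_plus_1 - max_current_k)
    return min(margins)


if __name__ == "__main__":
    for n in (4, 16, 32):
        print(n, "parallel inputs, worst-case margin (A):", worst_case_margin(n))
```

Under these assumed values the LRS-to-HRS current ratio is 20, so the worst-case margin shrinks as more inputs are activated in parallel and turns negative once the accumulated leakage exceeds one LRS current. With the leakage removed, adjacent output levels in this model are again separated by a full LRS current, which is the effect the cancellation bullet describes.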