CHIMERA: A 0.92-TOPS, 2.2-TOPS/W Edge AI Accelerator With 2-MByte On-Chip Foundry Resistive RAM for Efficient Training and Inference

Kartik Prabhu, Albert Gural, Zainab F. Khan, Robert M. Radway, Massimo Giordano, Kalhan Koul, Rohan Doshi, John W. Kustin, Timothy Liu, Gregorio B. Lopes, Victor Turbiner, Win-San Khwa, Yu-Der Chih, Meng-Fan Chang, Guenole Lallement, Boris Murmann, Subhasish Mitra, Priyanka Raina
DOI: https://doi.org/10.1109/jssc.2022.3140753
2022-04-01
Abstract: Implementing edge artificial intelligence (AI) inference and training is challenging with current memory technologies. As deep neural networks (DNNs) grow in size, this problem is only getting worse. This article presents CHIMERA, the first non-volatile DNN chip for both edge AI training and inference using foundry on-chip resistive RAM (RRAM) macros and no off-chip memory, fabricated in 40-nm CMOS. CHIMERA's DNN accelerator is specifically optimized for RRAM and achieves 0.92-TOPS peak performance and 2.2-TOPS/W energy efficiency. We scale inference up to 6× larger DNNs by connecting six CHIMERAs in an illusion system with just 4% overhead in measured execution time and 5% in energy, enabled by communication-sparse DNN mappings that exploit RRAM non-volatility through quick chip wake-up and shutdown (<33 µs). Our incremental edge AI training algorithm, called low-rank training, overcomes RRAM write energy, speed, and endurance challenges and achieves the same accuracy as traditional algorithms with up to 283× fewer RRAM weight update steps and 340× better energy-delay product. Combined with ENDUrance REsiliency using random Remapping (ENDURER), a hardware module that provides resilience to write endurance failures, we enable ten years of 20-samples/min incremental edge AI training.
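The headline figures can be turned into a few derived quantities. The sketch below computes the power implied by 0.92 TOPS at 2.2 TOPS/W, the energy per operation, and the number of training samples in ten years at 20 samples/min. It assumes that the peak throughput and the efficiency hold at the same operating point (the abstract does not say so) and uses 365-day years. The results are back-of-the-envelope estimates, not measured values.

```python
# Back-of-the-envelope quantities derived from the abstract's headline numbers.
# Assumption: peak throughput and efficiency apply at the same operating point.
PEAK_TOPS = 0.92       # peak throughput, tera-operations per second
TOPS_PER_WATT = 2.2    # energy efficiency, tera-operations per joule

# Power needed to sustain peak throughput at the quoted efficiency.
implied_power_w = PEAK_TOPS / TOPS_PER_WATT

# 1 TOPS/W = 1e12 op/J = 1 op/pJ, so energy per op (pJ) is the reciprocal.
energy_per_op_pj = 1.0 / TOPS_PER_WATT

# Incremental-training workload over the claimed lifetime (365-day years).
SAMPLES_PER_MIN = 20
YEARS = 10
total_samples = SAMPLES_PER_MIN * 60 * 24 * 365 * YEARS

print(f"implied power ~ {implied_power_w:.3f} W")
print(f"energy per op ~ {energy_per_op_pj:.3f} pJ")
print(f"training samples over {YEARS} years = {total_samples:,}")
```

Under these assumptions, peak operation corresponds to roughly 0.42 W and about 0.45 pJ per operation. The ten-year training target amounts to about 105 million samples, which is why reducing the number of RRAM weight-update steps matters for endurance.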
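Random remapping spreads wear and failed cells across different logical weights over time. The sketch below is a hypothetical illustration of that general idea, not the ENDURER hardware design. It assumes a simple scheme in which the physical address equals the logical address plus a random offset, modulo the memory size, and the offset is redrawn after a fixed number of writes. The class name `RandomOffsetRemapper` and the `remap_interval` policy are invented for illustration.

```python
import random


class RandomOffsetRemapper:
    """Hypothetical logical-to-physical remapper (illustrative, not ENDURER).

    physical = (logical + offset) % size, with the offset redrawn every
    `remap_interval` writes so that a stuck physical cell affects a
    different logical weight after each remap.
    """

    def __init__(self, size: int, remap_interval: int, seed: int = 0):
        self.size = size
        self.remap_interval = remap_interval  # assumed remap policy
        self.rng = random.Random(seed)
        self.offset = self.rng.randrange(size)
        self.writes_since_remap = 0

    def physical(self, logical: int) -> int:
        # Adding a constant modulo `size` is a bijection on addresses.
        return (logical + self.offset) % self.size

    def logical_at(self, physical: int) -> int:
        # Inverse mapping: which logical weight currently lives here.
        return (physical - self.offset) % self.size

    def record_write(self) -> None:
        self.writes_since_remap += 1
        if self.writes_since_remap >= self.remap_interval:
            # A real system would also migrate stored weights here.
            self.offset = self.rng.randrange(self.size)
            self.writes_since_remap = 0


# Usage: track which logical weights a hypothetical stuck cell corrupts.
remapper = RandomOffsetRemapper(size=1024, remap_interval=100, seed=42)
failed_cell = 17  # hypothetical failed physical address
victims = set()
for _ in range(10):
    victims.add(remapper.logical_at(failed_cell))
    for _ in range(remapper.remap_interval):
        remapper.record_write()
print(f"distinct logical weights hit by the failed cell: {len(victims)}")
```

Because each remap changes the offset, a single failed cell is shared among many logical weights instead of permanently corrupting one. That is the intuition behind remapping-based endurance resilience, though the actual ENDURER mechanism may differ in its details.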