A 510-nW Wake-Up Keyword-Spotting Chip Using Serial-FFT-Based MFCC and Binarized Depthwise Separable CNN in 28-nm CMOS
Weiwei Shan, Minhao Yang, Tao Wang, Yicheng Lu, Hao Cai, Lixuan Zhu, Jiaming Xu, Chengjun Wu, Longxing Shi, Jun Yang
DOI: https://doi.org/10.1109/jssc.2020.3029097
2021-01-01
Abstract: We propose a sub-$\mu\text{W}$ always-ON keyword spotting ($\mu$KWS) chip for audio wake-up systems. It is mainly composed of a neural network (NN) and a feature extraction (FE) circuit. To significantly reduce the memory footprint and computational load, four techniques are used to achieve ultra-low-power consumption: 1) a serial-FFT-based Mel-frequency cepstrum coefficient (MFCC) circuit is designed for FE, instead of the common parallel FFT; 2) a small-sized binarized depthwise separable convolutional NN (DSCNN) is designed as the classifier; 3) a framewise incremental computation technique is devised, in contrast to conventional whole-word processing; and 4) the reduced computation allows a low system clock frequency, which enables near-threshold voltage operation, and low-leakage memory blocks are designed to minimize the leakage power. Implemented in 28-nm CMOS technology, this $\mu$KWS consumes $0.51~\mu\text{W}$ at a 40-kHz frequency and a 0.41-V supply, with an area of 0.23 mm$^2$. Using the Google speech command data set, 97.3% accuracy is reached for a one-word KWS task and 94.6% for a two-word task.
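As a rough illustration of what the reported operating point implies, the sketch below derives two figures that the abstract does not state directly: the average energy per clock cycle and the power density. Only the power, clock frequency, and area come from the abstract; the function names are hypothetical, and the calculation assumes the 0.51 µW figure is the average total power at the 40-kHz clock.

```python
# Figures reported in the abstract.
POWER_W = 0.51e-6   # total power: 0.51 uW
CLOCK_HZ = 40e3     # system clock: 40 kHz
AREA_MM2 = 0.23     # chip area: 0.23 mm^2


def energy_per_cycle_joules(power_w: float, clock_hz: float) -> float:
    """Average energy spent per clock cycle (hypothetical helper).

    Assumes power_w is the average total power at clock_hz.
    """
    return power_w / clock_hz


def power_density_uw_per_mm2(power_w: float, area_mm2: float) -> float:
    """Average power per unit area, in uW/mm^2 (hypothetical helper)."""
    return power_w * 1e6 / area_mm2


if __name__ == "__main__":
    energy = energy_per_cycle_joules(POWER_W, CLOCK_HZ)
    density = power_density_uw_per_mm2(POWER_W, AREA_MM2)
    print(f"energy per cycle: {energy * 1e12:.2f} pJ")
    print(f"power density:    {density:.2f} uW/mm^2")
```

Under these assumptions the chip spends about 12.75 pJ per clock cycle and dissipates roughly 2.2 µW/mm$^2$. These are derived values, not results reported by the authors.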
engineering, electrical & electronic