A 0.61-μW Fully Integrated Keyword-Spotting ASIC with Real-Point Serial FFT-Based MFCC and Temporal Depthwise Separable CNN

Cai Li, Haochang Zhi, Kaiyue Yang, Junyi Qian, Zhihao Yan, Lixuan Zhu, Chao Chen, Xi Wang, Weiwei Shan
DOI: https://doi.org/10.1109/jssc.2023.3339528
IF: 5.4
2024-01-01
IEEE Journal of Solid-State Circuits
Abstract: A fully integrated near-microphone keyword spotting (KWS) chip is proposed to directly interact with a passive microphone and achieve submicrowatt power for Internet of Things (IoT) devices. First, an on-chip analog frontend (AFE) is designed to avoid the inclusion of power-intensive off-chip active microphones. Second, a real-point serial fast Fourier transform (FFT)-based Mel-frequency cepstral coefficient (MFCC) feature extractor, cooperating with a genetic algorithm (GA)-optimized bit-width quantization, is specifically customized to reduce the MFCC power by 67.4%. Finally, a binarized temporal depthwise separable CNN (TDSCN) is proposed, featuring hardware optimization through a parallel adder tree (PAT)-based PE with near-memory computing. This results in a 78.9% reduction in computation compared with traditional depthwise separable convolutional neural networks (CNNs). Fabricated in a 28-nm CMOS process, the proposed KWS chip consumes the lowest power of 0.61 $\mu$W at 0.36-V neural network (NN), 0.9-V AFE, and 8-kHz frequency, while keeping 95.8% accuracy for two-KWS on the Google speech command dataset (GSCD).
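The abstract does not spell out the FFT architecture, so the following is only a minimal sketch of the general property that real-input FFT designs typically exploit: for a real signal of length N, the spectrum is conjugate-symmetric, so bins 0..N/2 can be obtained from a single N/2-point complex FFT. The function names `fft` and `real_fft` are hypothetical, the code assumes N is a power of two, and it is not a model of the chip's serial datapath or its quantization.

```python
import cmath

def fft(a):
    """Recursive radix-2 complex FFT; len(a) must be a power of two."""
    n = len(a)
    if n == 1:
        return list(a)
    even = fft(a[0::2])
    odd = fft(a[1::2])
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out

def real_fft(x):
    """Bins 0..N/2 of the DFT of a real sequence x, using one N/2-point FFT."""
    n = len(x)
    half = n // 2
    # Pack even samples into the real part and odd samples into the imaginary part.
    z = [complex(x[2 * i], x[2 * i + 1]) for i in range(half)]
    Z = fft(z)
    X = []
    for k in range(half + 1):
        zk = Z[k % half]
        zc = Z[(half - k) % half].conjugate()
        xe = (zk + zc) / 2       # DFT of the even-indexed samples
        xo = (zk - zc) / 2j      # DFT of the odd-indexed samples
        X.append(xe + cmath.exp(-2j * cmath.pi * k / n) * xo)
    return X
```

Calling `real_fft(frame)` on an N-sample real frame returns N/2 + 1 complex bins; the remaining bins are the complex conjugates of these and need not be computed, which is where the savings over a general complex FFT come from.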