AAD-KWS: a Sub-$\mu\mathrm{W}$ Keyword Spotting Chip with a Zero-Cost, Acoustic Activity Detector from a 170nW MFCC Feature Extractor in 28nm CMOS

Lixuan Zhu, Weiwei Shan, Jiaming Xu, Yicheng Lu
DOI: https://doi.org/10.1109/essderc53440.2021.9631816
2021-01-01
Abstract: As widely used speech-triggered interfaces, deep-learning-based keyword spotting (KWS) chips require both ultra-low power and high detection accuracy. We propose an always-on keyword spotting chip with an acoustic activity detector (AAD) to meet both requirements. Because it is derived from the feature extractor, the AAD has zero overhead and a zero miss rate. It clock-gates the neural network and the post-processing unit to achieve ultra-low power in silent scenarios. We also propose a tunable detection window that fits keywords of different lengths to improve accuracy. In addition, a non-overlapping-frame Mel-frequency cepstral coefficient (MFCC) extractor is used in the KWS system to reduce memory footprint and processing cycles. Implemented in 28nm CMOS technology, the chip consumes only $0.36\mu\mathrm{W}$ for AAD in quiet scenarios and $0.8\mu\mathrm{W}$ for KWS, operating at a 0.4V supply voltage with 8kHz for the MFCC and 200kHz for the other parts. The MFCC circuit alone consumes only 170nW. The accuracy reaches 97.8% for two keywords in the Google Speech Commands dataset (GSCD).
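To make the gating and framing ideas in the abstract concrete, the following is a minimal Python sketch, not the chip's implementation. It makes several assumptions that do not come from the paper: a 16 kHz sample rate (the rate of GSCD clips), a hypothetical 512-sample frame, and a simple frame-energy threshold standing in for the AAD, whereas the paper derives its AAD from the MFCC feature extractor. The function names (`frame_count`, `frame_energies`, `gate_signal`, `average_power_uw`) are illustrative. The power model is a plain duty-cycle average that reads the two reported figures as quiet-state and active-state power, which is also an assumption.

```python
import math


def frame_count(num_samples, frame_len, hop):
    """Number of full frames for a given frame length and hop size."""
    if num_samples < frame_len:
        return 0
    return 1 + (num_samples - frame_len) // hop


def frame_energies(samples, frame_len):
    """Mean energy of each non-overlapping frame (hop == frame_len)."""
    n = frame_count(len(samples), frame_len, frame_len)
    return [
        sum(x * x for x in samples[i * frame_len:(i + 1) * frame_len]) / frame_len
        for i in range(n)
    ]


def gate_signal(energies, threshold):
    """True for frames in which the downstream network would be clocked.

    Assumption: an energy threshold stands in for the paper's
    MFCC-derived acoustic activity detector.
    """
    return [e > threshold for e in energies]


def average_power_uw(active_fraction, p_quiet_uw=0.36, p_active_uw=0.8):
    """Duty-cycle average of the two power figures quoted in the abstract.

    Assumption: 0.36 uW applies while gated (quiet) and 0.8 uW while
    the keyword spotter is running.
    """
    if not 0.0 <= active_fraction <= 1.0:
        raise ValueError("active_fraction must be within [0, 1]")
    return (1.0 - active_fraction) * p_quiet_uw + active_fraction * p_active_uw


if __name__ == "__main__":
    SAMPLE_RATE = 16000  # assumed; matches GSCD clips
    FRAME_LEN = 512      # hypothetical frame length (32 ms at 16 kHz)

    # Frames per one-second clip: non-overlapping versus 50% overlap.
    print(frame_count(SAMPLE_RATE, FRAME_LEN, FRAME_LEN))
    print(frame_count(SAMPLE_RATE, FRAME_LEN, FRAME_LEN // 2))

    # Silence, a short 440 Hz tone, then silence again.
    tone = [0.5 * math.sin(2 * math.pi * 440 * n / SAMPLE_RATE) for n in range(1024)]
    signal = [0.0] * 1024 + tone + [0.0] * 1024
    gates = gate_signal(frame_energies(signal, FRAME_LEN), threshold=0.01)
    print(gates)

    # Average power if speech is present in the measured fraction of frames.
    print(average_power_uw(sum(gates) / len(gates)))
```

Under these assumed parameters, dropping the 50% overlap roughly halves the number of frames to buffer and process per clip, which illustrates why non-overlapping frames reduce memory footprint and processing cycles. The gate list shows the network being enabled only for frames that contain sound.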