AAD-KWS: A Sub-μW Keyword Spotting Chip with an Acoustic Activity Detector Embedded in MFCC and a Tunable Detection Window in 28-nm CMOS

Weiwei Shan, Junyi Qian, Lixuan Zhu, Jun Yang, Cheng Huang, Hao Cai
DOI: https://doi.org/10.1109/jssc.2022.3197838
IF: 5.4
2023-01-01
IEEE Journal of Solid-State Circuits
Abstract: As a widely used speech-triggered interface, deep-learning-based keyword spotting (KWS) chips require both ultra-low power and high detection accuracy. We propose a sub-microwatt KWS chip with an acoustic activity detector (AAD) that meets both requirements using the following techniques: first, an optimized feature extractor circuit using nonoverlapping-framed serial Mel frequency cepstral coefficient (MFCC) to save half of the computations and data storage; second, a zero-cost AAD that uses the MFCC's 1st-order output to clock-gate the neural network (NN) and postprocessing (PP) unit, with a 0 miss rate; third, a tunable detection window that adapts to different keyword lengths for better accuracy; and finally, a true form computation method to decrease data transitions and an optimized PP. Implemented in a 28-nm CMOS process, this AAD-KWS chip has a 0.4-V supply, an 8-kHz frequency for MFCC, and a 200-kHz frequency for other parts. It consumes 0.36 μW in quiet scenarios when AAD is enabled and 0.8 μW in normal scenarios, where the MFCC circuit consumes only 170 nW. Its accuracy reaches 97.8% for two keywords in the Google Speech Command Dataset (GSCD).
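The arithmetic behind two of the abstract's claims can be illustrated with a minimal sketch. It counts frames for nonoverlapping framing against a 50%-overlapped baseline and derives the quiet-mode power reduction from the quoted figures. The sample rate, frame length, and the choice of a 50% overlap as the baseline are assumptions made here for illustration and are not taken from the paper; `frame_count` is a hypothetical helper.

```python
def frame_count(num_samples: int, frame_len: int, hop: int) -> int:
    """Number of complete frames of frame_len samples taken every hop samples."""
    if num_samples < frame_len:
        return 0
    return 1 + (num_samples - frame_len) // hop


# Assumed values for illustration only (not stated in the abstract).
SAMPLE_RATE_HZ = 16_000        # assumed audio sample rate
FRAME_LEN = 512                # assumed 32 ms frame at 16 kHz
NUM_SAMPLES = SAMPLE_RATE_HZ   # one second of audio

# Assumed baseline: frames overlap by 50% (hop is half a frame).
overlapped = frame_count(NUM_SAMPLES, FRAME_LEN, FRAME_LEN // 2)
# Nonoverlapping framing: hop equals the frame length.
nonoverlapped = frame_count(NUM_SAMPLES, FRAME_LEN, FRAME_LEN)

# Power figures quoted in the abstract, in microwatts.
P_NORMAL_UW = 0.8
P_QUIET_UW = 0.36
P_MFCC_UW = 0.17

quiet_saving = 1 - P_QUIET_UW / P_NORMAL_UW   # fraction saved in quiet scenarios
mfcc_share = P_MFCC_UW / P_NORMAL_UW          # MFCC share of normal-mode power

print(f"frames per second: {overlapped} overlapped, {nonoverlapped} nonoverlapping")
print(f"quiet-mode power reduction: {quiet_saving:.0%}")
print(f"MFCC share of normal-mode power: {mfcc_share:.1%}")
```

Under these assumed parameters, one second of audio yields 61 overlapped frames versus 31 nonoverlapping frames, which is consistent with the roughly halved computation and storage, and the quoted power figures correspond to a 55% reduction in quiet scenarios.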