Abstract: The keyword spotting (KWS) system is one of the most important interfaces between humans and machines, since it is usually the entry point to automatic speech recognition and natural language processing. However, for KWS hardware, it remains difficult to make a single chip both low-power and high-performing across multiple scenarios, such as meeting rooms, various kinds of traffic, or parks, because different scenarios span a wide range of signal-to-noise ratios (SNRs). This calls for a design that balances KWS system accuracy against hardware cost under various noise types and levels. To address this tradeoff, this paper proposes a complete KWS processor, comprising a Mel-Frequency Cepstrum Coefficients (MFCC) feature extractor and a quantized Convolutional Neural Network (QCNN) accelerator, for wide-SNR-range, low-power KWS. Firstly, an approach to quantizing CNNs into QCNNs with high accuracy is proposed, taking the hardware-software tradeoff into account. To balance KWS system accuracy against hardware cost, a 4-bit/8-bit dual-working-mode strategy is proposed to keep hardware cost low and accuracy high under different scenarios. Specifically, the CNNs and QCNNs are trained, tuned and validated on a dataset of 10 keywords chosen from the Google Command Speech Dataset (GCSD). Secondly, a serial-FFT-based MFCC extractor is implemented with low power and a small footprint. Finally, with a novel hybrid reuse strategy for input data and network weights, a reconfigurable QCNN accelerator based on approximate computing is designed. Implemented and verified in TSMC 22nm ULL technology with an area of 1.42mm², the QCNN accelerator achieves 5.26μW/9.08μW power consumption in 4-bit/8-bit working mode with accuracy of 88% and 93%, respectively, which is superior to state-of-the-art processors.
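To make the 4-bit/8-bit tradeoff mentioned in the abstract concrete, the following is a minimal sketch of generic symmetric uniform quantization in plain Python. It is not the quantization approach proposed in the paper, which the abstract does not detail; the function names (`quantize`, `dequantize`, `mean_abs_error`) and the synthetic Gaussian weights are assumptions made purely for illustration. It only shows why a narrower bit width tends to cost reconstruction precision while needing fewer bits per weight.

```python
import random


def quantize(weights, bits):
    """Symmetric uniform quantization (generic, hypothetical helper).

    Maps floats to signed integer codes in [-qmax, qmax], where
    qmax = 2**(bits - 1) - 1 (7 for 4-bit, 127 for 8-bit).
    """
    qmax = 2 ** (bits - 1) - 1
    max_abs = max(abs(w) for w in weights)
    # Fall back to a unit scale when all weights are zero.
    scale = max_abs / qmax if max_abs > 0 else 1.0
    codes = [max(-qmax, min(qmax, round(w / scale))) for w in weights]
    return codes, scale


def dequantize(codes, scale):
    """Map integer codes back to approximate float values."""
    return [c * scale for c in codes]


def mean_abs_error(weights, bits):
    """Average absolute reconstruction error at a given bit width."""
    codes, scale = quantize(weights, bits)
    restored = dequantize(codes, scale)
    return sum(abs(w - r) for w, r in zip(weights, restored)) / len(weights)


# Synthetic weights standing in for one CNN layer (assumption, not paper data).
random.seed(0)
weights = [random.gauss(0.0, 0.1) for _ in range(1000)]

err4 = mean_abs_error(weights, 4)
err8 = mean_abs_error(weights, 8)
print(f"4-bit mean abs error: {err4:.6f}")
print(f"8-bit mean abs error: {err8:.6f}")
```

In this sketch the 8-bit mode reconstructs the weights more closely than the 4-bit mode, which mirrors, in a very simplified way, the accuracy-versus-cost balance that a dual-working-mode design has to strike.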

A 608nW Near-Microphone Keyword-Spotting Chip Using Real-Point Serial FFT-Based MFCC and Temporal Depthwise Separable CNN in 28nm CMOS

A 0.61-μW Fully Integrated Keyword-Spotting ASIC with Real-Point Serial FFT-Based MFCC and Temporal Depthwise Separable CNN

A 510nW 0.41V Low-Memory Low-Computation Keyword-Spotting Chip Using Serial FFT-Based MFCC and Binarized Depthwise Separable Convolutional Neural Network in 28nm CMOS

AAD-KWS: A Sub-μW Keyword Spotting Chip with a Zero-Cost, Acoustic Activity Detector from a 170nW MFCC Feature Extractor in 28nm CMOS

A 510-nW Wake-Up Keyword-Spotting Chip Using Serial-FFT-Based MFCC and Binarized Depthwise Separable CNN in 28-nm CMOS

A 2.81μW, Energy Efficient MFCC Feature Extractor for Keyword-Spotting in 65nm CMOS

AAD-KWS: A Sub-μW Keyword Spotting Chip with an Acoustic Activity Detector Embedded in MFCC and a Tunable Detection Window in 28-nm CMOS

A 110nW Always-on Keyword Spotting Chip Using Spiking CNN in 40nm CMOS

QCNN Inspired Reconfigurable Keyword Spotting Processor with Hybrid Data-Weight Reuse Methods

A Background-Noise and Process-Variation-Tolerant 109nW Acoustic Feature Extractor Based on Spike-Domain Divisive-Energy Normalization for an Always-On Keyword Spotting Device

A 22nm, 10.8μW/15.1μW Dual Computing Modes High Power-Performance-Area Efficiency Domained Background Noise Aware Keyword-Spotting Processor

NS-FDN: Near-Sensor Processing Architecture of Feature-Configurable Distributed Network for Beyond-Real-Time Always-on Keyword Spotting

9.1 μW keyword spotting processor based on optimized MFCC and small‐footprint TENet in 28‐nm CMOS

An Ultra-Low Power Always-On Keyword Spotting Accelerator Using Quantized Convolutional Neural Network and Voltage-Domain Analog Switching Network-Based Approximate Computing

A 11.6μW Computing-on-Memory-Boundary Keyword Spotting Processor with Joint MFCC-CNN Ternary Quantization

An Ultra-low Power Keyword-Spotting Accelerator Using Circuit-Architecture-System Co-design and Self-adaptive Approximate Computing Based BWN

A Low-Power Keyword Spotting System with High-Order Passive Switched-Capacitor Bandpass Filters for Analog-MFCC Feature Extraction

Sub-mW Keyword Spotting on an MCU: Analog Binary Feature Extraction and Binary Neural Networks

A Low-power and High-accuracy Accelerator with Voice Classification for Keyword Spotting

EERA-KWS: A 163 TOPS/W Always-on Keyword Spotting Accelerator in 28nm CMOS Using Binary Weight Network and Precision Self-Adaptive Approximate Computing