Abstract:In mobile and edge devices, always-on keyword spotting (KWS) is an essential function to detect wake-up words. Recent works achieved extremely low power dissipation down to $\sim500$ nW [1]. However, most of them adopt noise-dependent training, i.e. training for a specific signal-to-noise ratio (SNR) and noise type [1], and therefore their accuracies degrade for different SNR levels and noise types that are not targeted in the training (Fig. 9.9.1, top left). To improve robustness, so-called noise-independent training can be considered, which is to use the training data that includes all the possible SNR levels and noise types [2]. But, this approach is challenging for an ultra-low-power device since it demands a large neural network to learn all the possible features. A neural network of a fixed size has its own memory capacity limit and reaches a plateau in accuracy if it has to learn more than its limit (Fig. 9.9.1, top right). On the other hand, it is known that biological acoustic systems employ a simpler process, called divisive energy normalization (DN), to maintain accuracy even in varying noise conditions [3]. In this work, therefore, by adopting such a DN, we prototype a normalized acoustic feature extractor chip (NAFE) in 65nm. The NAFE can take an acoustic signal from a microphone and produce spike-rate coded features. We pair NAFE with a spiking neural network (SNN) classifier chip [4], creating the end-to-end KWS system. The proposed system achieves 89-to-94% accuracy across -5 to 20dB SNRs and four different noise types on HeySnips [5], while the baseline without DN achieves a much lower accuracy of 71-87%. NAFE consumes up to 109nW and the KWS system 570nW.

A Depthwise Separable Convolution Neural Network for Small-footprint Keyword Spotting Using Approximate MAC Unit and Streaming Convolution Reuse

Depthwise Separable Convolutional ResNet with Squeeze-and-Excitation Blocks for Small-footprint Keyword Spotting

A 510-nW Wake-Up Keyword-Spotting Chip Using Serial-FFT-Based MFCC and Binarized Depthwise Separable CNN in 28-nm CMOS

Convolutional Recurrent Neural Networks for Small-Footprint Keyword Spotting

Global-Local Convolution with Spiking Neural Networks for Energy-efficient Keyword Spotting

Energy-Friendly Keyword Spotting System Using Add-Based Convolution.

Small-footprint Keyword Spotting with Graph Convolutional Network

NS-KWS: joint optimization of near-sensor processing architecture and low-precision GRU for always-on keyword spotting

Hello Edge: Keyword Spotting on Microcontrollers

Frequency & Channel Attention Network for Small Footprint Noisy Spoken Keyword Spotting

Low-complex and Highly-performed Binary Residual Neural Network for Small-footprint Keyword Spotting

9.1 μW keyword spotting processor based on optimized MFCC and small‐footprint TENet in 28‐nm CMOS

NS-FDN: Near-Sensor Processing Architecture of Feature-Configurable Distributed Network for Beyond-Real-Time Always-on Keyword Spotting

A Study of Designing Compact Audio-Visual Wake Word Spotting System Based on Iterative Fine-Tuning in Neural Network Pruning

A Background-Noise and Process-Variation-Tolerant 109nW Acoustic Feature Extractor Based on Spike-Domain Divisive-Energy Normalization for an Always-On Keyword Spotting Device

Broadcasted Residual Learning for Efficient Keyword Spotting

DeltaKWS: A 65nm 36nJ/Decision Bio-inspired Temporal-Sparsity-Aware Digital Keyword Spotting IC with 0.6V Near-Threshold SRAM

TDT-KWS: Fast And Accurate Keyword Spotting Using Token-and-duration Transducer

An Energy-Efficient Binarized Neural Network Using Analog-Intensive Feature Extraction for Keyword and Speaker Verification Wakeup.

A Time Delay Neural Network with Shared Weight Self-Attention for Small-Footprint Keyword Spotting

Encoder-Decoder Neural Architecture Optimization for Keyword Spotting