Abstract: Always-on keyword spotting (KWS), which detects wake-up words, has become an indispensable module in voice interaction systems. However, ultra-low-power embedded devices impose strict requirements on the energy consumption, latency, and recognition accuracy of KWS. In this work, we propose a near-sensor processing architecture of feature-configurable distributed network (NS-FDN) for always-on KWS applications. The proposed distributed network adapts to the flexible keyword demands of real-world scenarios by splitting the conventional single network into distributed sub-networks. We design a channel-independent training framework to improve the recognition accuracy of the distributed networks. NS-FDN evaluates the speech features and reduces their redundancy, and it can also configure the speech features to further reduce computational complexity and improve processing speed. For deeper optimization, we implement a prototype chip in a 65 nm process with a near-sensor mixed-signal processing architecture that avoids an energy-consuming analog-to-digital converter. By jointly improving the system, algorithm, and hardware designs of KWS, our co-optimized architecture eliminates the long-standing energy consumption bottleneck of conventional KWS systems and achieves state-of-the-art system performance. The experimental results show that NS-FDN achieves 31.6% energy consumption savings, 1.6 times memory savings, a 57 times speedup, and 3.4% higher recognition accuracy compared with the state of the art.

More is Less: Domain-Specific Speech Recognition Microprocessor Using One-Dimensional Convolutional Recurrent Neural Network

An Ultra-Low Power Binarized Convolutional Neural Network-Based Speech Recognition Processor with On-Chip Self-Learning

A fine-grained mixed precision DNN accelerator using a two-stage big-little core RISC-V MCU

A 141 μW, 2.46 pJ/Neuron Binarized Convolutional Neural Network Based Self-Learning Speech Recognition Processor in 28nm CMOS

A 5.1pJ/Neuron 127.3us/Inference RNN-based Speech Recognition Processor using 16 Computing-in-Memory SRAM Macros in 65nm CMOS

A 11.6μW Computing-on-Memory-Boundary Keyword Spotting Processor with Joint MFCC-CNN Ternary Quantization

A Fully Integrated 1.7mW Attention-Based Automatic Speech Recognition Processor

A 3.89-GOPS/mW Scalable Recurrent Neural Network Processor with Improved Efficiency on Memory and Computation

Low-power Neuromorphic Speech Recognition Engine with Coarse-Grain Sparsity

9.1 μW keyword spotting processor based on optimized MFCC and small‐footprint TENet in 28‐nm CMOS

Efficient Binary Weight Convolutional Network Accelerator for Speech Recognition

Hello Edge: Keyword Spotting on Microcontrollers

Chipmunk: A Systolically Scalable 0.9 mm², 3.08 Gop/s/mW @ 1.2 mW Accelerator for Near-Sensor Recurrent Neural Network Inference

DeltaKWS: A 65nm 36nJ/Decision Bio-inspired Temporal-Sparsity-Aware Digital Keyword Spotting IC with 0.6V Near-Threshold SRAM

NS-FDN: Near-Sensor Processing Architecture of Feature-Configurable Distributed Network for Beyond-Real-Time Always-on Keyword Spotting

A High Accuracy Multiple-Command Speech Recognition ASIC Based on Configurable One-Dimension Convolutional Neural Network

A 68 mW 2.2 Tops/W low bit-width and multiplierless DCNN object detection processor for visually impaired people

NS-KWS: joint optimization of near-sensor processing architecture and low-precision GRU for always-on keyword spotting

A 510-nW Wake-Up Keyword-Spotting Chip Using Serial-FFT-Based MFCC and Binarized Depthwise Separable CNN in 28-nm CMOS

A 110nW Always-on Keyword Spotting Chip Using Spiking CNN in 40nm CMOS

Residual Spiking Neural Network on a Programmable Neuromorphic Hardware for Speech Keyword Spotting