Abstract: Keyword spotting (KWS) is beneficial for voice-based user interactions with low-power devices at the edge. These devices are usually always-on, so processing at the edge saves bandwidth and protects privacy. Such devices, for example Cortex-M based microcontrollers, typically have limited memory, computational performance, power, and cost budgets. The challenge is to meet the high-computation and low-latency requirements of deep learning on these devices. This paper first presents our small-footprint KWS system running on an STM32F7 microcontroller with a Cortex-M7 core at 216 MHz and 512 KB of static RAM. Our selected convolutional neural network (CNN) architecture uses a reduced number of operations for KWS to meet the constraints of edge devices. Our baseline system generates a classification result every 37 ms, including real-time audio feature extraction. This paper further evaluates the actual performance of different pruning and quantization methods on the microcontroller, including different granularities of sparsity, skipping zero weights, weight-prioritized loop order, and SIMD instructions. The results show that accelerating unstructured pruned models on microcontrollers poses considerable challenges, and that structured pruning is more microcontroller-friendly than unstructured pruning. The results also verify the performance improvement obtained from quantization and SIMD instructions.
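
To make the contrast between sparsity granularities concrete, the following is a minimal sketch in C of an 8-bit quantized dot product computed three ways: densely, with per-weight zero skipping (unstructured sparsity), and with per-block skipping (structured sparsity). It is an illustration under stated assumptions, not the implementation evaluated in the paper: the function names, the `block_nonzero` flag array, and the block layout are all hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Dense baseline: int8 weights and inputs, 32-bit accumulator. */
int32_t dot_dense_q7(const int8_t *w, const int8_t *x, size_t n)
{
    int32_t acc = 0;
    for (size_t i = 0; i < n; i++) {
        acc += (int32_t)w[i] * (int32_t)x[i];
    }
    return acc;
}

/* Unstructured sparsity: test every weight and skip the
 * multiply-accumulate when it is zero. The per-element branch
 * is itself a cost on a small in-order core. */
int32_t dot_skip_zero_q7(const int8_t *w, const int8_t *x, size_t n)
{
    int32_t acc = 0;
    for (size_t i = 0; i < n; i++) {
        if (w[i] != 0) {
            acc += (int32_t)w[i] * (int32_t)x[i];
        }
    }
    return acc;
}

/* Structured sparsity (hypothetical layout): weights are pruned in
 * blocks of block_len elements, and block_nonzero[b] is 0 when the
 * whole block b is zero. One test skips an entire block, and the
 * inner loop stays branch-free. */
int32_t dot_block_sparse_q7(const int8_t *w, const int8_t *x,
                            const uint8_t *block_nonzero,
                            size_t n_blocks, size_t block_len)
{
    int32_t acc = 0;
    for (size_t b = 0; b < n_blocks; b++) {
        if (!block_nonzero[b]) {
            continue; /* whole block pruned: no loads, no MACs */
        }
        const int8_t *wb = w + b * block_len;
        const int8_t *xb = x + b * block_len;
        for (size_t i = 0; i < block_len; i++) {
            acc += (int32_t)wb[i] * (int32_t)xb[i];
        }
    }
    return acc;
}
```

All three functions return the same value whenever the block flags agree with the weights. The sketch only shows where the branches fall in each variant; it makes no claim about measured speed, which depends on the target core and compiler.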

Model Shrinking for Embedded Keyword Spotting

Model compression applied to small-footprint keyword spotting

An Efficient Temporal Model for Small-Footprint Keyword Spotting

Sparse Binarization for Fast Keyword Spotting

Weight-importance sparse training in keyword spotting

Predicting detection filters for small footprint open-vocabulary keyword spotting

Focal Loss And Double-Edge-Triggered Detector For Robust Small-Footprint Keyword Spotting

Sub-mW Keyword Spotting on an MCU: Analog Binary Feature Extraction and Binary Neural Networks

Keyword Spotting System and Evaluation of Pruning and Quantization Methods on Low-power Edge Microcontrollers

Hello Edge: Keyword Spotting on Microcontrollers

An Empirical Study of Cross-Lingual Transfer Learning Techniques for Small-Footprint Keyword Spotting

Keyword-specific normalization based keyword spotting for spontaneous speech

A Depthwise Separable Convolution Neural Network for Small-footprint Keyword Spotting Using Approximate MAC Unit and Streaming Convolution Reuse

A Novel Coding Scheme for Keyword Spotting

Keyword-Specific Acoustic Model Pruning for Open-Vocabulary Keyword Spotting

Low-Power Audio Keyword Spotting using Tsetlin Machines

Small-Footprint Open-Vocabulary Keyword Spotting with Quantized LSTM Networks

An Ultra-low Power Keyword-Spotting Accelerator Using Circuit-Architecture-System Co-design and Self-adaptive Approximate Computing Based BWN

Multi-Task Learning and Weighted Cross-Entropy for DNN-Based Keyword Spotting

Compact Feedforward Sequential Memory Networks For Small-Footprint Keyword Spotting

Keyword Spotting Based on Restricting Model and Acoustic Confidence Measure