Abstract: Speech enhancement is critical for improving speech intelligibility and quality in audio devices. In recent years, deep learning-based methods have significantly improved speech enhancement performance, but they often carry a computational cost that is prohibitive for many edge devices, such as headsets and hearing aids. This work proposes Spiking-FullSubNet, an ultra-low-power speech enhancement system based on a brain-inspired spiking neural network (SNN). Spiking-FullSubNet fuses full-band and sub-band modeling to capture both global and local spectral information. To make the computationally expensive sub-band modeling more efficient, we introduce a frequency partitioning method inspired by the sensitivity profile of the human peripheral auditory system. Furthermore, we introduce a novel spiking neuron model that dynamically controls how input information is integrated and forgotten, enhancing the multi-scale temporal processing capability of SNNs, which is critical for speech denoising. Experiments on the recent Intel Neuromorphic Deep Noise Suppression (N-DNS) Challenge dataset show that Spiking-FullSubNet surpasses state-of-the-art methods by large margins on both speech quality and energy efficiency metrics. Notably, our system won the Algorithmic Track of the Intel N-DNS Challenge, opening up opportunities for ultra-low-power speech enhancement at the edge. Our source code and model checkpoints are publicly available at https://github.com/haoxiangsnr/spiking-fullsubnet.
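To make the idea of a neuron that controls its own integration and forgetting more concrete, here is a minimal sketch of a gated leaky integrate-and-fire neuron. It is not the neuron model proposed in the paper: the function name `gated_lif`, the sigmoid gates, the scalar weights, and the hard reset are all assumptions chosen for illustration only.

```python
import math


def sigmoid(x):
    """Logistic function, used here to squash gate values into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def gated_lif(inputs, w_forget=1.0, w_input=1.0,
              b_forget=0.0, b_input=0.0, threshold=1.0):
    """Simulate one hypothetical gated LIF neuron over an input sequence.

    Assumption: both gates depend only on the current input. A real model
    could also condition them on the membrane potential or on past spikes.
    """
    v = 0.0          # membrane potential
    spikes = []
    for x in inputs:
        forget = sigmoid(w_forget * x + b_forget)  # share of old potential kept
        accept = sigmoid(w_input * x + b_input)    # share of input integrated
        v = forget * v + accept * x
        if v >= threshold:
            spikes.append(1)
            v = 0.0  # hard reset after a spike (illustrative choice)
        else:
            spikes.append(0)
    return spikes


if __name__ == "__main__":
    # A silent input never drives the potential above threshold.
    print(gated_lif([0.0] * 5))
    # A single strong input crosses the threshold immediately.
    print(gated_lif([5.0]))
```

Because the decay factor changes with the input rather than being a fixed constant, the same neuron can hold information over longer or shorter time spans, which is the intuition behind multi-scale temporal processing.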

Towards Ultra-Low-Power Neuromorphic Speech Enhancement with Spiking-FullSubNet
