Abstract: In real-time applications, the aim of speech enhancement (SE) is to achieve high enhancement performance while ensuring computational efficiency and near-instant outputs. Many deep neural models have achieved strong performance in terms of speech quality and intelligibility. However, formulating efficient and compact deep neural models for real-time processing on resource-limited devices remains a challenge. This study presents a compact neural model for speech enhancement, designed in the complex frequency domain and optimized for resource-limited devices. The proposed model combines convolutional encoder–decoder and recurrent architectures to learn complex mappings from noisy speech, enabling low-latency causal processing for real-time speech enhancement. Recurrent architectures such as Long Short-Term Memory (LSTM), Gated Recurrent Unit (GRU), and Simple Recurrent Unit (SRU) are incorporated as bottlenecks to capture temporal dependencies and improve SE performance. By representing the speech in the complex frequency domain, the proposed model processes both magnitude and phase information. Further, this study extends the proposed models with attention-gate-based skip connections, enabling the models to focus on relevant information and dynamically weigh the important features. The results show that the proposed models outperform recent benchmark models in speech quality and intelligibility while imposing a lower computational load. This study uses the WSJ0 database, in which clean sentences are mixed with different background noises to create noisy mixtures. On the WSJ0 database, STOI and PESQ are improved by 21.1% and 1.25 (41.5%), respectively, whereas on the VoiceBank+DEMAND database, STOI and PESQ are improved by 4.1% and 1.24 (38.6%), respectively. The extended models show further improvement in STOI and PESQ in both seen and unseen noisy conditions.
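The point about processing both magnitude and phase can be illustrated with a small, self-contained sketch. It is not the proposed model: it uses a naive DFT on a single synthetic frame and two oracle targets, and every name in it (`dft`, `idft`, `compare_targets`, the frame length, the noise level) is an assumption chosen for illustration. It compares a magnitude-only target, which keeps the noisy phase, with a complex target, which carries the real and imaginary parts of the clean spectrum.

```python
import cmath
import math
import random


def dft(frame):
    """Naive discrete Fourier transform of one real-valued frame."""
    n = len(frame)
    return [
        sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def idft(spectrum):
    """Naive inverse DFT; returns the real part of each time sample."""
    n = len(spectrum)
    return [
        (sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n)
             for k in range(n)) / n).real
        for t in range(n)
    ]


def rms_error(a, b):
    """Root-mean-square difference between two equal-length sequences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)) / len(a))


def compare_targets(n=16, seed=0):
    """Return (magnitude-only error, complex-target error) for one toy frame."""
    rng = random.Random(seed)
    # Hypothetical "clean speech": a single sinusoid at frequency bin 2.
    clean = [math.sin(2 * math.pi * 2 * t / n) for t in range(n)]
    # Hypothetical noisy mixture: clean frame plus uniform noise.
    noisy = [c + rng.uniform(-0.5, 0.5) for c in clean]

    clean_spec, noisy_spec = dft(clean), dft(noisy)

    # Magnitude-only oracle: clean magnitude combined with the noisy phase.
    mag_only = [
        cmath.rect(abs(s), cmath.phase(y))
        for s, y in zip(clean_spec, noisy_spec)
    ]
    # Complex oracle: real and imaginary parts of the clean spectrum.
    complex_est = [complex(s.real, s.imag) for s in clean_spec]

    return (rms_error(idft(mag_only), clean),
            rms_error(idft(complex_est), clean))


err_mag, err_complex = compare_targets()
print(f"magnitude-only RMS error: {err_mag:.6f}")
print(f"complex-target RMS error: {err_complex:.2e}")
```

Even with a perfect magnitude estimate, reusing the noisy phase leaves a residual reconstruction error, while the complex target recovers the clean frame up to floating-point precision. In a learned system the targets would be predicted by a network rather than taken from the clean signal, so the actual gap depends on how well each target can be estimated.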

Compact Deep Neural Networks for Real-Time Speech Enhancement on Resource-Limited Devices