Abstract: In real-time applications, speech enhancement (SE) must deliver strong performance while remaining computationally efficient and producing output with minimal delay. Many deep neural models achieve high speech quality and intelligibility. However, designing efficient, compact deep neural models for real-time processing on resource-limited devices remains a challenge. This study presents a compact neural model for speech enhancement that operates in the complex frequency domain and is optimized for resource-limited devices. The proposed model combines a convolutional encoder–decoder with recurrent architectures to learn complex mappings from noisy speech, enabling low-latency causal processing. Recurrent architectures, namely Long Short-Term Memory (LSTM), Gated Recurrent Unit (GRU), and Simple Recurrent Unit (SRU), are used as bottlenecks to capture temporal dependencies and improve SE performance. By representing speech in the complex frequency domain, the proposed model processes both magnitude and phase information. The study further extends the proposed models with attention-gate-based skip connections, which let the models focus on relevant information and dynamically weight the important features. Experiments use the WSJ0 database, where clean WSJ0 sentences are mixed with different background noises to create noisy mixtures, and the VoiceBank+DEMAND database. The results show that the proposed models outperform recent benchmark models in speech quality and intelligibility while requiring a lower computational load. STOI and PESQ improve by 21.1% and 1.25 (41.5%) on WSJ0, and by 4.1% and 1.24 (38.6%) on VoiceBank+DEMAND. The extended models further improve STOI and PESQ in both seen and unseen noise conditions.
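To make two ideas from the abstract concrete, the NumPy sketch below shows (a) how a complex spectrum can be fed to a network as stacked real and imaginary channels, so that magnitude and phase are both available, and (b) one common form of attention gate on a skip connection. This is an illustration under stated assumptions, not the authors' implementation: the frame length, hop size, the additive gate formulation, and all names (`stft_frames`, `to_real_imag`, `attention_gate`, `w_skip`, `w_gate`, `psi`) are hypothetical choices made here, and the weights are random rather than learned.

```python
import numpy as np


def stft_frames(signal, frame_len=320, hop=160):
    """Frame the signal, apply a Hann window, and take a real FFT.

    Returns a complex array of shape (frames, bins). Frame length and hop
    are assumed values (20 ms / 10 ms at 16 kHz), not taken from the paper.
    """
    n_frames = 1 + (len(signal) - frame_len) // hop
    window = np.hanning(frame_len)
    frames = np.stack(
        [signal[i * hop : i * hop + frame_len] * window for i in range(n_frames)]
    )
    return np.fft.rfft(frames, axis=-1)


def to_real_imag(spec):
    """Stack real and imaginary parts as two channels: (2, frames, bins)."""
    return np.stack([spec.real, spec.imag], axis=0)


def attention_gate(skip, gating, w_skip, w_gate, psi):
    """Additive attention gate (an assumed formulation).

    skip, gating: feature maps of shape (C, T, F)
    w_skip, w_gate: 1x1 projections of shape (C_int, C)
    psi: projection to a single attention map, shape (C_int,)
    Returns the gated skip features and the attention map in (0, 1).
    """
    q = np.einsum("ic,ctf->itf", w_skip, skip) + np.einsum(
        "ic,ctf->itf", w_gate, gating
    )
    q = np.maximum(q, 0.0)  # ReLU
    alpha = 1.0 / (1.0 + np.exp(-np.einsum("i,itf->tf", psi, q)))  # sigmoid
    return skip * alpha[None, :, :], alpha


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    noisy = rng.standard_normal(1600)          # stand-in for a noisy waveform
    feats = to_real_imag(stft_frames(noisy))   # two-channel network input
    decoder_feats = rng.standard_normal(feats.shape)  # stand-in decoder signal
    w_s, w_g = rng.standard_normal((4, 2)), rng.standard_normal((4, 2))
    gated, alpha = attention_gate(feats, decoder_feats, w_s, w_g,
                                  rng.standard_normal(4))
    print(feats.shape, gated.shape, alpha.shape)
```

Because the real and imaginary channels together determine the complex value, magnitude and phase can be recovered from the two-channel representation with `np.hypot` and `np.arctan2`. The gate only rescales the skip features by a factor between 0 and 1, so it can suppress but never amplify what the encoder passes to the decoder.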

SECP: A Speech Enhancement-Based Curation Pipeline For Scalable Acquisition Of Clean Speech

Scalable Speech Enhancement with Dynamic Channel Pruning

An approach for speech enhancement with dysarthric speech recognition using optimization based machine learning frameworks

Densely Connected Multi-Stage Model with Channel Wise Subband Feature for Real-Time Speech Enhancement.

Restorative Speech Enhancement: A Progressive Approach Using SE and Codec Modules

Speech enhancement based on estimating expected values of speech cepstra

URGENT Challenge: Universality, Robustness, and Generalizability For Speech Enhancement

Personalized Speech Enhancement Without a Separate Speaker Embedding Model

ESPnet-SE++: Speech Enhancement for Robust Speech Recognition, Translation, and Understanding

TEA-PSE: Tencent-Ethereal-Audio-Lab Personalized Speech Enhancement System for ICASSP 2022 DNS Challenge

Beyond Performance Plateaus: A Comprehensive Study on Scalability in Speech Enhancement

PERSONALIZED SPEECH ENHANCEMENT: NEW MODELS AND COMPREHENSIVE EVALUATION

A Multiobjective Learning and Ensembling Approach to High-Performance Speech Enhancement with Compact Neural Network Architectures

An Investigation of Incorporating Mamba for Speech Enhancement

LPCSE: Neural Speech Enhancement through Linear Predictive Coding

Espnet-se: end-to-end speech enhancement and separation toolkit designed for asr integration

Scalable Data Annotation Pipeline for High-Quality Large Speech Datasets Development

Multi-Stage Progressive Speech Enhancement Network

Compact Deep Neural Networks for Real-Time Speech Enhancement on Resource-Limited Devices

A Speech Enhancement Algorithm Using Computational Auditory Scene Analysis with Spectral Subtraction

Speech Enhancement Based On Analysis Synthesis Framework With Improved Pitch Estimation And Spectral Envelope Enhancement