Abstract: In real-time applications, the aim of speech enhancement (SE) is to achieve the best possible performance while ensuring computational efficiency and near-instant outputs. Many deep neural models achieve strong speech quality and intelligibility. However, designing efficient and compact deep neural models for real-time processing on resource-limited devices remains a challenge. This study presents a compact neural model for speech enhancement that operates in the complex frequency domain and is optimized for resource-limited devices. The proposed model combines a convolutional encoder–decoder with recurrent architectures to learn complex mappings from noisy speech, enabling low-latency causal processing for real-time speech enhancement. Recurrent architectures such as Long Short-Term Memory (LSTM), Gated Recurrent Unit (GRU), and Simple Recurrent Unit (SRU) are incorporated as bottlenecks to capture temporal dependencies and improve SE performance. By representing speech in the complex frequency domain, the proposed model processes both magnitude and phase information. This study further extends the proposed models with attention-gate-based skip connections, enabling the models to focus on relevant information and dynamically weigh important features. Clean sentences from the WSJ0 database are mixed with different background noises to create noisy mixtures, and the models are also evaluated on the VoiceBank+DEMAND database. The results show that the proposed models outperform recent benchmark models, obtaining better speech quality and intelligibility at a lower computational load. On the WSJ0 database, STOI and PESQ are improved by 21.1% and 1.25 (41.5%), respectively, whereas on the VoiceBank+DEMAND database, STOI and PESQ are improved by 4.1% and 1.24 (38.6%), respectively. The extended models show further improvements in STOI and PESQ in both seen and unseen noisy conditions.
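As a sketch of how a recurrent bottleneck can keep processing causal, the NumPy code below implements a simplified SRU layer in which the forget and reset gates depend only on the current frame. The class name `SimplifiedSRU`, the feature size of 64, the flattening of encoder features into one vector per frame, and the random weights are hypothetical; the abstract does not specify layer sizes, and an LSTM or GRU layer could occupy the same position.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


class SimplifiedSRU:
    """Single-layer SRU sketch; hidden size equals input size so the
    highway term (1 - r_t) * x_t can reuse the input directly."""

    def __init__(self, dim, seed=0):
        rng = np.random.default_rng(seed)
        bound = 1.0 / np.sqrt(dim)
        # One fused projection producing the candidate, forget gate and reset gate.
        self.W = rng.uniform(-bound, bound, size=(dim, 3 * dim))
        self.b_f = np.zeros(dim)
        self.b_r = np.zeros(dim)
        self.dim = dim

    def __call__(self, x):
        # x: (frames, dim). The projection has no time dependency, so it is
        # computed for all frames at once; only the elementwise recurrence is sequential.
        d = self.dim
        proj = x @ self.W
        x_tilde = proj[:, :d]
        f = sigmoid(proj[:, d:2 * d] + self.b_f)
        r = sigmoid(proj[:, 2 * d:] + self.b_r)
        c = np.zeros(d)
        h = np.empty_like(x)
        for t in range(x.shape[0]):  # left to right only, so frame t never sees frames > t
            c = f[t] * c + (1.0 - f[t]) * x_tilde[t]
            h[t] = r[t] * np.tanh(c) + (1.0 - r[t]) * x[t]
        return h


# Hypothetical bottleneck input: 100 frames of flattened encoder features (64 per frame).
feats = np.random.default_rng(1).standard_normal((100, 64))
sru = SimplifiedSRU(dim=64)
out = sru(feats)

# Causality check: changing frames 50 and later leaves frames 0..49 untouched.
perturbed = feats.copy()
perturbed[50:] += 10.0
out_perturbed = sru(perturbed)
print(np.allclose(out[:50], out_perturbed[:50]))  # True
```

Because the matrix product is shared across all frames and only an elementwise update runs sequentially, this kind of recurrence is cheap per frame, which is one reason SRU-style bottlenecks are attractive for low-latency, resource-limited settings.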
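To illustrate why working in the complex frequency domain lets a model correct phase as well as magnitude, the sketch below compares an ideal complex ratio mask with an ideal magnitude-only mask on a synthetic signal. Masking is only one common way to realise a complex mapping; the abstract does not say whether the model predicts a mask or the clean spectrum directly. The STFT settings (512-point Hann window, hop of 128), the function names, and the white-noise test signals are assumptions.

```python
import numpy as np


def stft(x, n_fft=512, hop=128):
    # Hann-windowed frames -> one-sided spectrum of shape (frames, n_fft // 2 + 1)
    window = np.hanning(n_fft)
    n_frames = 1 + (len(x) - n_fft) // hop
    frames = np.stack([x[t * hop:t * hop + n_fft] * window for t in range(n_frames)])
    return np.fft.rfft(frames, axis=-1)


def ideal_complex_mask(clean_spec, noisy_spec, eps=1e-12):
    # Complex target M = S / Y, written as S * conj(Y) / |Y|^2 to avoid complex division
    return clean_spec * np.conj(noisy_spec) / (np.abs(noisy_spec) ** 2 + eps)


def ideal_magnitude_mask(clean_spec, noisy_spec, eps=1e-12):
    # Real target |S| / |Y|: rescales each bin but keeps the noisy phase
    return np.abs(clean_spec) / (np.abs(noisy_spec) + eps)


rng = np.random.default_rng(0)
clean = rng.standard_normal(16000)
noisy = clean + 0.5 * rng.standard_normal(16000)
S, Y = stft(clean), stft(noisy)

S_from_complex = Y * ideal_complex_mask(S, Y)
S_from_magnitude = Y * ideal_magnitude_mask(S, Y)

# Mean absolute phase error (radians) between each estimate and the clean spectrum
phase_err_complex = np.mean(np.abs(np.angle(S_from_complex * np.conj(S))))
phase_err_magnitude = np.mean(np.abs(np.angle(S_from_magnitude * np.conj(S))))
print(phase_err_complex, phase_err_magnitude)  # near 0 vs. clearly above 0
```

The magnitude-only mask restores each bin's magnitude but inherits the noisy phase, whereas the complex mask can rotate each bin back to the clean phase; this is the practical benefit of processing both magnitude and phase.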
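An attention-gate-based skip connection can be sketched as an additive gate that uses decoder features to compute a weight in (0, 1) for every time-frequency position of the encoder features before they are passed to the decoder. The formulation below follows the additive attention gate popularised by Attention U-Net and is an assumption about the gate used here; the function name `attention_gate`, the channel counts, and the random weights are hypothetical.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def attention_gate(skip, gating, W_x, W_g, psi):
    """Additive attention gate applied to an encoder skip connection.

    skip:   encoder features, shape (C_x, T, F)
    gating: decoder features at the same T-F resolution, shape (C_g, T, F)
    W_x (C_int, C_x), W_g (C_int, C_g), psi (C_int,): weights of 1x1 convolutions
    """
    # A 1x1 convolution is a channel-mixing matrix product at every (t, f) position.
    theta_x = np.einsum("ic,ctf->itf", W_x, skip)
    phi_g = np.einsum("ic,ctf->itf", W_g, gating)
    hidden = np.maximum(theta_x + phi_g, 0.0)               # ReLU
    alpha = sigmoid(np.einsum("i,itf->tf", psi, hidden))    # one weight in (0, 1) per T-F bin
    return skip * alpha[np.newaxis], alpha                  # same weight for all channels


rng = np.random.default_rng(0)
skip = rng.standard_normal((16, 50, 64))     # 16 channels, 50 frames, 64 frequency bins
gating = rng.standard_normal((32, 50, 64))   # decoder features, assumed already at this resolution
W_x = 0.1 * rng.standard_normal((8, 16))
W_g = 0.1 * rng.standard_normal((8, 32))
psi = rng.standard_normal(8)

gated, alpha = attention_gate(skip, gating, W_x, W_g, psi)
print(gated.shape, alpha.min() > 0, alpha.max() < 1)
```

Since `alpha` is shared across channels, the gate behaves like a soft time-frequency mask on the skip features; in a trained network `W_x`, `W_g`, and `psi` would be learned jointly with the rest of the model.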
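Creating noisy mixtures from clean WSJ0 sentences typically involves scaling each noise recording to a chosen signal-to-noise ratio (SNR) before adding it to the speech. The abstract does not list the SNR levels or noise types, so the SNR values below, the helper name `mix_at_snr`, and the synthetic stand-in signals are illustrative assumptions.

```python
import numpy as np


def mix_at_snr(clean, noise, snr_db):
    """Return (noisy, scaled_noise) where the clean-to-noise power ratio is snr_db."""
    # Loop the noise if it is shorter than the utterance, then trim to the same length.
    if len(noise) < len(clean):
        noise = np.tile(noise, int(np.ceil(len(clean) / len(noise))))
    noise = noise[:len(clean)]
    p_clean = np.mean(clean ** 2)
    p_noise = np.mean(noise ** 2)
    # Choose the gain so that p_clean / (gain**2 * p_noise) == 10**(snr_db / 10).
    gain = np.sqrt(p_clean / (p_noise * 10 ** (snr_db / 10)))
    scaled_noise = gain * noise
    return clean + scaled_noise, scaled_noise


rng = np.random.default_rng(0)
clean = rng.standard_normal(16000)   # stand-in for a 1 s sentence at 16 kHz
noise = rng.standard_normal(4000)    # stand-in for a shorter noise recording

for snr_db in (-5, 0, 5):            # illustrative SNR levels
    noisy, scaled_noise = mix_at_snr(clean, noise, snr_db)
    measured = 10 * np.log10(np.mean(clean ** 2) / np.mean(scaled_noise ** 2))
    print(snr_db, round(measured, 6))
```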

EffCRN: An Efficient Convolutional Recurrent Network for High-Performance Speech Enhancement

DCCRN: Deep Complex Convolution Recurrent Network for Phase-Aware Speech Enhancement

Study on convolutional recurrent neural networks for speech enhancement in fiber-optic microphones

FRCRN: Boosting Feature Representation Using Frequency Recurrence for Monaural Speech Enhancement

Efficient Gated Convolutional Recurrent Neural Networks for Real-Time Speech Enhancement

Single Channel Speech Enhancement Using Temporal Convolutional Recurrent Neural Networks

DPCRN: Dual-Path Convolution Recurrent Network for Single Channel Speech Enhancement

GTCRN: A Speech Enhancement Model Requiring Ultralow Computational Resources

TFCN: Temporal-Frequential Convolutional Network for Single-Channel Speech Enhancement

EfficientASR: Speech Recognition Network Compression via Attention Redundancy and Chunk-Level FFN Optimization

MFT-CRN: Multi-scale Fourier Transform for Monaural Speech Enhancement

An Efficient Speech Separation Network Based on Recurrent Fusion Dilated Convolution and Channel Attention

Multi-Loss Convolutional Network with Time-Frequency Attention for Speech Enhancement

Adaptive Very Deep Convolutional Residual Network for Noise Robust Speech Recognition

Deep Residual-Dense Lattice Network for Speech Enhancement

Compact Deep Neural Networks for Real-Time Speech Enhancement on Resource-Limited Devices

FLGCNN: A Novel Fully Convolutional Neural Network for End-to-end Monaural Speech Enhancement with Utterance-Based Objective Functions

Speech Enhancement Algorithm Based on Microphone Array and Lightweight CRN for Hearing Aid

A Real-Time Speech Enhancement Algorithm Based on Convolutional Recurrent Network and Wiener Filter

Single-Channel Speech Enhancement Algorithm Based on ME-MGCRN in Low Signal-to-Noise Scenario