Abstract: Recent advances in speech enhancement have seen the emergence of generator-based methods. However, several of these approaches struggle with input variations: they either excel at low signal-to-noise ratios (SNRs) by relying on intricate representations of noisy and clean speech, or perform well only at higher SNRs. In this work, we investigate speech enhancement using a Dilated Attention Fast Generative Adversarial Network (DAF-GAN). The proposed DAF-GAN framework achieves stable performance across different SNR conditions by efficiently processing long signals. The DAF-GAN features a dilated discriminator that operates on patches. The generator incorporates multi-decoding and attention gates applied through skip connections, integrated within the Fast-U-Net model to optimize processing speed. An ideal ratio mask was used in the test phase to further refine the enhanced signal by emphasizing target speech while suppressing residual noise and artifacts. DAF-GAN performance was assessed using objective metrics such as PESQ on several noisy speech databases. The results show that the DAF-GAN performed modestly in comparison with state-of-the-art models. For example, on the VoiceBank-DEMAND dataset the DAF-GAN obtained a PESQ score of 2.50.
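
To make the ideal ratio mask mentioned in the abstract concrete, the sketch below uses the common textbook definition, IRM = (|S|^2 / (|S|^2 + |N|^2))^beta with beta = 0.5, applied to magnitude spectrograms. This is an illustrative assumption rather than the paper's implementation: the function name `ideal_ratio_mask`, the toy arrays, and the additive magnitude mixture are all hypothetical.

```python
import numpy as np

def ideal_ratio_mask(speech_mag, noise_mag, beta=0.5, eps=1e-12):
    """Textbook IRM per time-frequency bin (assumed form, not from the paper)."""
    speech_pow = np.square(speech_mag)
    noise_pow = np.square(noise_mag)
    # eps avoids division by zero in bins where both powers are zero
    return (speech_pow / (speech_pow + noise_pow + eps)) ** beta

# Toy magnitude spectrograms (frequency bins x frames), hypothetical values
speech = np.array([[1.0, 0.0], [3.0, 2.0]])
noise = np.array([[0.0, 1.0], [4.0, 2.0]])

mask = ideal_ratio_mask(speech, noise)

# Crude additive magnitude mixture, used only for illustration
noisy_mag = speech + noise
enhanced_mag = mask * noisy_mag
```

The mask stays close to 1 where speech dominates a bin and close to 0 where noise dominates, which is how it emphasizes target speech while attenuating residual noise.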

Mixed T-domain and TF-domain Magnitude and Phase representations for GAN-based speech enhancement

TDCGAN: Temporal Dilated Convolutional Generative Adversarial Network for End-to-end Speech Enhancement

DCCRGAN: Deep Complex Convolution Recurrent Generator Adversarial Network for Speech Enhancement

CMGAN: Conformer-Based Metric-GAN for Monaural Speech Enhancement

Time-domain Speech Enhancement with Generative Adversarial Learning

A Speech Enhancement Method Based on Dual-Path Phase-Aware GAN Networks

CMGAN: Conformer-based Metric GAN for Speech Enhancement

mdctGAN: Taming transformer-based GAN for speech super-resolution with Modified DCT spectra

PAGAN: A Phase-Adapted Generative Adversarial Networks for Speech Enhancement

CGA-MGAN: Metric GAN Based on Convolution-Augmented Gated Attention for Speech Enhancement

A Joint Framework of Denoising Autoencoder and Generative Vocoder for Monaural Speech Enhancement

Double Adversarial Network Based Monaural Speech Enhancement for Robust Speech Recognition

Low-latency Speech Enhancement via Speech Token Generation

Joint Magnitude Estimation and Phase Recovery Using Cycle-In-Cycle GAN for Non-Parallel Speech Enhancement

Enhancing Unsupervised Speech Recognition with Diffusion GANs

Channel-Aware Domain-Adaptive Generative Adversarial Network for Robust Speech Recognition

Exploring Speech Enhancement with Generative Adversarial Networks for Robust Speech Recognition

Convolutional Recurrent MetricGAN with Spectral Dimension Compression for Full-Band Speech Enhancement

Unrestricted Global Phase Bias-Aware Single-channel Speech Enhancement with Conformer-based Metric GAN

A Multi-Scale Time-Frequency Spectrogram Discriminator for GAN-based Non-Autoregressive TTS

Performance analysis of a dilated attention fast GAN for speech enhancement