Abstract: Generative Adversarial Networks (GANs) have been used in speech enhancement because of their strong potential for reducing the noise mixed into speech signals. Most existing GAN-based speech enhancement approaches either operate in the time domain or exploit the magnitude spectra in the time-frequency domain, but do not optimize the phase directly. In this paper, we propose a GAN architecture for speech enhancement based on gated linear units (GLUs) and Dual-Path Transformers (DPTs), which handles the amplitude and phase information simultaneously in the time-frequency domain. The generator of the proposed GAN follows an autoencoder structure and is fed the real and imaginary parts of the time-frequency frames. The encoder of the generator is built from multiple cascaded convolutional GLUs (ConvGLUs), while the decoder consists of two groups of cascaded deconvolutional GLUs (DeconvGLUs), one for the real part of the spectrogram and the other for the imaginary part. GLUs are adopted because they can mitigate the vanishing-gradient problem in deep architectures by providing a linear path for the gradients while retaining non-linear capabilities. To capture the long-range dependencies in speech, we place DPTs, which contain multi-head attention modules and Bi-directional Gated Recurrent Units (BiGRUs), between the encoder and the decoder of the generator. Moreover, the DPT structure is also combined with multiple one-dimensional convolutional layers in the discriminator of the GAN. Such a design not only improves the speech enhancement performance of the GAN by focusing on multiple features of speech, but also reduces the number of model parameters. Experimental results suggest that the proposed GAN architecture outperforms the existing benchmark GANs in terms of both objective speech intelligibility and quality, with lower computational complexity.
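The gating idea behind the GLUs mentioned in the abstract can be illustrated with a minimal sketch. It is not the paper's ConvGLU: it assumes dense weight matrices instead of convolutions, and the names `glu`, `w_lin`, `w_gate`, the layer sizes, and the random input are all hypothetical choices made only for illustration. The output is a linear projection multiplied element-wise by a sigmoid gate, so where the gate is close to 1 the signal (and its gradient) passes through an essentially linear path.

```python
import numpy as np

def sigmoid(z):
    """Logistic function used as the gate non-linearity."""
    return 1.0 / (1.0 + np.exp(-z))

def glu(x, w_lin, b_lin, w_gate, b_gate):
    """Gated linear unit: (x W + b) * sigmoid(x V + c).

    Dense simplification (assumption); the paper uses convolutional GLUs.
    """
    linear = x @ w_lin + b_lin            # linear path
    gate = sigmoid(x @ w_gate + b_gate)   # values in (0, 1)
    return linear * gate

rng = np.random.default_rng(0)
# Hypothetical input: 4 time-frequency bins, each described by [real, imag].
x = rng.standard_normal((4, 2))
w_lin = rng.standard_normal((2, 3))
w_gate = rng.standard_normal((2, 3))
b_lin = np.zeros(3)
b_gate = np.zeros(3)

y = glu(x, w_lin, b_lin, w_gate, b_gate)
print(y.shape)
```

Feeding the real and imaginary parts as separate input features mirrors the abstract's description of a generator that works on both parts of the spectrogram, although the actual layer shapes in the paper are not given here.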

Improved Wasserstein Conditional Generative Adversarial Network Speech Enhancement

Speech Enhancement Based on a New Architecture of Wasserstein Generative Adversarial Networks

A Loss with Mixed Penalty for Speech Enhancement Generative Adversarial Network

Low-latency Speech Enhancement via Speech Token Generation

A Conditional Generative Model for Speech Enhancement

Speech Enhancement via Generative Adversarial LSTM Networks

Improved Relativistic Cycle-Consistent GAN with Dilated Residual Network and Multi-Attention for Speech Enhancement

GSC Based Speech Enhancement with Generative Adversarial Network

SASEGAN-TCN: Speech enhancement algorithm based on self-attention generative adversarial network and temporal convolutional network

SDGAN: Improve Speech Enhancement Quality by Information Filter

Improvement of Packet Loss Concealment for EVS Codec Based on Deep Learning

TDCGAN: Temporal Dilated Convolutional Generative Adversarial Network for End-to-end Speech Enhancement

Data Augmentation using Conditional Generative Adversarial Networks for Robust Speech Recognition

GANs for Children: A Generative Data Augmentation Strategy for Children Speech Recognition

Speech Enhancement Generative Adversarial Network Architecture with Gated Linear Units and Dual-Path Transformers

Unifying Robustness and Fidelity: A Comprehensive Study of Pretrained Generative Methods for Speech Enhancement in Adverse Conditions

Restoring Lost Speech Components with Generative Adversarial Networks for Speech Communications in Adverse Conditions

Conditional Generative Adversarial Networks for Speech Enhancement and Noise-Robust Speaker Verification

SEGAN: Speech Enhancement Generative Adversarial Network

Frame-level speech enhancement based on Wasserstein GAN