SEGAN: Speech Enhancement Generative Adversarial Network

Santiago Pascual, A. Bonafonte, J. Serrà

DOI: https://doi.org/10.21437/Interspeech.2017-1428

2017-03-28

Abstract: Current speech enhancement techniques operate on the spectral domain and/or exploit some higher-level feature. The majority of them tackle a limited number of noise conditions and rely on first-order statistics. To circumvent these issues, deep networks are being increasingly used, thanks to their ability to learn complex functions from large example sets. In this work, we propose the use of generative adversarial networks for speech enhancement. In contrast to current techniques, we operate at the waveform level, training the model end-to-end, and incorporate 28 speakers and 40 different noise conditions into the same model, such that model parameters are shared across them. We evaluate the proposed model using an independent, unseen test set with two speakers and 20 alternative noise conditions. The enhanced samples confirm the viability of the proposed model, and both objective and subjective evaluations confirm the effectiveness of it. With that, we open the exploration of generative architectures for speech enhancement, which may progressively incorporate further speech-centric design choices to improve their performance.

Mathematics, Computer Science

What problem does this paper attempt to address?

The paper addresses speech enhancement: improving the clarity and quality of speech by removing background noise. The authors propose SEGAN, a method that applies Generative Adversarial Networks (GANs) to this task. Unlike traditional methods that process the spectral domain or rely on higher-level features, SEGAN operates directly on the waveform and is trained end-to-end. A single model is trained on 28 speakers and 40 noise conditions, so its parameters are shared across speakers and noise types, and it is evaluated on an unseen test set with two speakers and 20 other noise conditions. Experimental results show that SEGAN compares favorably with a Wiener filtering baseline on objective metrics and is preferred in subjective listening tests, which suggests it is a viable alternative to existing techniques.
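The summary above does not spell out the training objective. As a rough illustration of what adversarial training on raw waveforms can look like, the sketch below computes a least-squares GAN loss with an added L1 reconstruction term over waveform chunks. The function names, the least-squares form, and the `l1_weight` value are assumptions of this sketch, not details taken from the text above, and the discriminator scores are passed in as plain arrays rather than produced by a real network.

```python
import numpy as np


def discriminator_loss(d_clean, d_enhanced):
    """Least-squares discriminator loss (assumed form for this sketch).

    d_clean:    discriminator scores for (clean, noisy) waveform pairs
    d_enhanced: discriminator scores for (enhanced, noisy) waveform pairs
    Clean pairs are pushed towards 1, enhanced pairs towards 0.
    """
    d_clean = np.asarray(d_clean, dtype=float)
    d_enhanced = np.asarray(d_enhanced, dtype=float)
    return 0.5 * np.mean((d_clean - 1.0) ** 2) + 0.5 * np.mean(d_enhanced ** 2)


def generator_loss(d_enhanced, enhanced, clean, l1_weight=100.0):
    """Least-squares adversarial term plus a weighted L1 waveform distance.

    l1_weight is a hypothetical hyperparameter chosen for illustration.
    """
    d_enhanced = np.asarray(d_enhanced, dtype=float)
    enhanced = np.asarray(enhanced, dtype=float)
    clean = np.asarray(clean, dtype=float)
    adversarial = 0.5 * np.mean((d_enhanced - 1.0) ** 2)
    l1_distance = np.mean(np.abs(enhanced - clean))
    return adversarial + l1_weight * l1_distance


# Toy usage: one short waveform chunk, no real networks involved.
rng = np.random.default_rng(0)
clean_chunk = np.sin(np.linspace(0.0, 2.0 * np.pi, 64))
noisy_chunk = clean_chunk + 0.1 * rng.standard_normal(64)

# A placeholder "generator" that returns its input unchanged.
enhanced_chunk = noisy_chunk

g_loss = generator_loss(d_enhanced=[0.3], enhanced=enhanced_chunk, clean=clean_chunk)
d_loss = discriminator_loss(d_clean=[0.9], d_enhanced=[0.3])
```

In a real system the scores would come from a trained discriminator and the enhanced chunk from a trained generator; the point here is only that both losses are computed on waveform samples rather than on spectral features.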

Towards Generalized Speech Enhancement with Generative Adversarial Networks

SEFGAN: Harvesting the Power of Normalizing Flows and GANs for Efficient High-Quality Speech Enhancement

VSEGAN: Visual Speech Enhancement Generative Adversarial Network

iSEGAN: Improved Speech Enhancement Generative Adversarial Networks

Conditional Generative Adversarial Networks for Speech Enhancement and Noise-Robust Speaker Verification

Self-Attention Generative Adversarial Network for Speech Enhancement

SE-MelGAN -- Speaker Agnostic Rapid Speech Enhancement

Noise Prior Knowledge Learning for Speech Enhancement Via Gated Convolutional Generative Adversarial Network

Exploring Speech Enhancement with Generative Adversarial Networks for Robust Speech Recognition

AeGAN: Time-Frequency Speech Denoising via Generative Adversarial Networks

Time-domain Speech Enhancement with Generative Adversarial Learning

Double Adversarial Network Based Monaural Speech Enhancement for Robust Speech Recognition

PAGAN: A Phase-Adapted Generative Adversarial Networks for Speech Enhancement

Robust Real-time Audio-Visual Speech Enhancement based on DNN and GAN

High Fidelity Speech Synthesis with Adversarial Networks

Multi-Metric Optimization using Generative Adversarial Networks for Near-End Speech Intelligibility Enhancement

CMGAN: Conformer-Based Metric-GAN for Monaural Speech Enhancement

DCCRGAN: Deep Complex Convolution Recurrent Generator Adversarial Network for Speech Enhancement

Statistical Parametric Speech Synthesis Incorporating Generative Adversarial Networks

Targeted Speech Adversarial Example Generation With Generative Adversarial Network