Abstract: Recently, deep neural networks (DNNs) have become the mainstream strategy for speech enhancement because they achieve higher speech quality and intelligibility than traditional methods. However, these DNN-based methods typically need a large parallel corpus of clean speech and noise, from which noisy training data are synthesized, in order to improve the generalization of the network. As a result, many noisy speech signals collected in real environments cannot be used to train the DNN, because the corresponding clean speech and noise are not available. Additionally, noise varies with time and scenario, so parallel speech and noise cannot always be obtained: the space of noise is effectively unbounded, while the available speech data are limited. Training the network with non-parallel speech and noise data is therefore essential for its generalization. To address this problem, we propose a novel parallel-data-free speech enhancement method that combines a cycle-consistent generative adversarial network (CycleGAN) with multi-objective learning, so that the method also exploits the benefits of multi-objective learning. In the training stage, we use two different encoders to encode the features of clean speech and noisy speech, respectively. Then, two forward generators are used to predict the ideal time-frequency (T-F) mask and the log-power spectrum (LPS) of clean speech. Two inverse generators are applied to map the magnitude spectrum (MS) and LPS of noisy speech, respectively. In addition, four discriminators are used to distinguish real speech features from generated features. The two encoders, four generators and four discriminators are trained simultaneously using adversarial, identity-mapping, latent similarity and cycle-consistent losses. In the test stage, we directly use the forward generators and encoders to obtain the enhanced speech.
The experimental results indicate that the proposed approach achieves better speech enhancement performance than the reference methods. Moreover, the proposed method also improves speech quality and intelligibility when the networks are trained on parallel data.
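
As a rough illustration of two of the loss terms named in the abstract, the following minimal sketch computes an LPS feature and the cycle-consistent and identity-mapping losses in their standard CycleGAN form. It is not the authors' implementation: the function names, the use of an L1 distance, the absence of loss weights, and the toy "generators" (a constant shift in the log domain) are all assumptions made for illustration, and the encoders, discriminators, adversarial loss and latent similarity loss are omitted.

```python
import numpy as np


def log_power_spectrum(frame, eps=1e-10):
    """LPS of one time-domain frame: log of the squared magnitude spectrum."""
    spectrum = np.fft.rfft(frame)
    return np.log(np.abs(spectrum) ** 2 + eps)


def l1(a, b):
    """Mean absolute error (assumed distance; the abstract does not specify one)."""
    return float(np.mean(np.abs(a - b)))


def cycle_consistency_loss(x_noisy, y_clean, forward, inverse):
    """noisy -> clean -> noisy and clean -> noisy -> clean should reconstruct the input."""
    return (l1(inverse(forward(x_noisy)), x_noisy)
            + l1(forward(inverse(y_clean)), y_clean))


def identity_mapping_loss(x_noisy, y_clean, forward, inverse):
    """A generator fed a sample already in its target domain should leave it unchanged."""
    return l1(forward(y_clean), y_clean) + l1(inverse(x_noisy), x_noisy)


# Toy stand-ins for the forward (noisy -> clean) and inverse (clean -> noisy)
# generators. Real generators would be neural networks; these are hypothetical.
def toy_forward(features):
    return features - 1.0


def toy_inverse(features):
    return features + 1.0


rng = np.random.default_rng(0)
# Non-parallel data: the clean and noisy frames are unrelated signals.
clean_lps = log_power_spectrum(rng.standard_normal(512))
noisy_lps = log_power_spectrum(rng.standard_normal(512) * 2.0)

cycle = cycle_consistency_loss(noisy_lps, clean_lps, toy_forward, toy_inverse)
identity = identity_mapping_loss(noisy_lps, clean_lps, toy_forward, toy_inverse)
print(f"cycle loss: {cycle:.6f}, identity loss: {identity:.6f}")
```

Because the two toy generators are exact inverses of each other, the cycle loss is essentially zero, while the identity loss is not, since each generator still shifts inputs that are already in its target domain. Neither loss needs paired clean and noisy frames, which is what makes training without parallel data possible.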

A Weekly Supervised Speech Enhancement Strategy Using Cycle-GAN

A Parallel-Data-Free Speech Enhancement Method Using Multi-Objective Learning Cycle-Consistent Generative Adversarial Network

CycleGAN-based Speech Enhancement for the Unpaired Training Data

Joint Magnitude Estimation and Phase Recovery Using Cycle-In-Cycle GAN for Non-Parallel Speech Enhancement

Speech Enhancement Based on Cyclegan with Noise-informed Training

A Two-stage Complex Network Using Cycle-consistent Generative Adversarial Networks for Speech Enhancement

Speech Enhancement Via Generative Adversarial Lstm Networks

CycleGAN-based Non-parallel Speech Enhancement with an Adaptive Attention-in-attention Mechanism

DCCRGAN: Deep Complex Convolution Recurrent Generator Adversarial Network for Speech Enhancement

Multi-scale Generative Adversarial Networks for Speech Enhancement

Feature-Matching Speech Denoising GANs via Progressive Training

Conditional Generative Adversarial Networks for Speech Enhancement and Noise-Robust Speaker Verification

Improved Relativistic Cycle-Consistent GAN with Dilated Residual Network and Multi-Attention for Speech Enhancement

GSC Based Speech Enhancement with Generative Adversarial Network

Study of GANs for Noisy Speech Simulation from Clean Speech

SEFGAN: Harvesting the Power of Normalizing Flows and GANs for Efficient High-Quality Speech Enhancement

A Speech Enhancement Method Based on Dual-Path Phase-Aware GAN Networks

Speech Intelligibility Enhancement By Non-Parallel Speech Style Conversion Using CWT and iMetricGAN Based CycleGAN

Speech Enhancement Generative Adversarial Network Architecture with Gated Linear Units and Dual-Path Transformers

VSEGAN: Visual Speech Enhancement Generative Adversarial Network

Low-latency Speech Enhancement via Speech Token Generation