Abstract: Recently, deep neural networks (DNNs) have become the mainstream strategy for speech enhancement because they achieve higher speech quality and intelligibility than traditional methods. However, DNN-based methods typically need a large parallel corpus of clean speech and noise from which noisy training data are synthesized, so that the network generalizes well. This means that many noisy speech signals collected in real environments cannot be used to train the DNN, because the corresponding clean speech and noise are unavailable. Moreover, noise varies with time and scenario, so parallel speech and noise cannot be obtained in general: the space of noise is effectively unlimited, while the available speech data are limited. Training with non-parallel speech and noise data is therefore essential for the generalization of the network. To address this problem, we propose a novel parallel-data-free speech enhancement method that combines a cycle-consistent generative adversarial network (CycleGAN) with multi-objective learning, and thereby retains the benefits of multi-objective learning. In the training stage, two different encoders encode the features of clean speech and noisy speech, respectively. Two forward generators then predict the ideal time-frequency (T-F) mask and the log-power spectrum (LPS) of clean speech, and two inverse generators are used to map the magnitude spectrum (MS) and LPS of noisy speech, respectively. In addition, four discriminators distinguish real speech features from generated features. The two encoders, four generators and four discriminators are trained simultaneously using adversarial, identity-mapping, latent similarity and cycle-consistency losses. In the test stage, only the encoders and forward generators are used to obtain the enhanced speech.
The experimental results indicate that the proposed approach achieves better speech enhancement performance than the reference methods. Moreover, the proposed method also improves speech quality and intelligibility when the networks are trained on parallel data.
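
To make the four loss terms named in the abstract more concrete, the following is a minimal sketch of how they could be combined for one direction of the cycle (noisy to clean and back). It is not the authors' implementation. Everything in it is an assumption made for illustration: the fixed random linear maps standing in for the networks, the toy dimensions, the L1 form of the cycle-consistency, identity-mapping and latent similarity terms, the least-squares form of the adversarial term, the clean-to-noisy direction of the inverse generator, and the weights in `LAMBDAS`.

```python
import numpy as np

rng = np.random.default_rng(0)

def l1(a, b):
    """Mean absolute error between two feature arrays."""
    return float(np.mean(np.abs(a - b)))

def make_linear(in_dim, out_dim):
    """Hypothetical stand-in for a network: a fixed random linear map."""
    w = rng.standard_normal((in_dim, out_dim)) * 0.1
    return lambda x: x @ w

FEAT, LATENT = 8, 4  # toy feature and latent sizes (assumptions)

enc_noisy = make_linear(FEAT, LATENT)   # encoder for noisy-speech features
enc_clean = make_linear(FEAT, LATENT)   # encoder for clean-speech features
gen_fwd = make_linear(LATENT, FEAT)     # forward generator (noisy -> clean)
gen_inv = make_linear(LATENT, FEAT)     # inverse generator (clean -> noisy, assumed)
disc_clean = make_linear(FEAT, 1)       # discriminator on clean-speech features

# Hypothetical loss weights; the abstract does not give any values.
LAMBDAS = {"adv": 1.0, "cyc": 10.0, "idt": 5.0, "lat": 1.0}

def generator_losses(noisy, clean):
    """Loss terms for the noisy -> clean -> noisy half of the cycle."""
    fake_clean = gen_fwd(enc_noisy(noisy))
    cycled_noisy = gen_inv(enc_clean(fake_clean))
    return {
        # Adversarial (least-squares form assumed): push D(fake) toward 1.
        "adv": float(np.mean((disc_clean(fake_clean) - 1.0) ** 2)),
        # Cycle consistency: the round trip should reproduce the input.
        "cyc": l1(cycled_noisy, noisy),
        # Identity mapping: clean input should pass through unchanged.
        "idt": l1(gen_fwd(enc_noisy(clean)), clean),
        # Latent similarity (assumed form): both encodings should agree.
        "lat": l1(enc_noisy(noisy), enc_clean(fake_clean)),
    }

def total_loss(losses, weights=LAMBDAS):
    """Weighted sum of the individual loss terms."""
    return sum(weights[name] * value for name, value in losses.items())

# Non-parallel batches: the noisy and clean features are drawn independently.
noisy_batch = rng.standard_normal((16, FEAT))
clean_batch = rng.standard_normal((16, FEAT))
losses = generator_losses(noisy_batch, clean_batch)
print(losses, total_loss(losses))
```

Note that `noisy_batch` and `clean_batch` never need to correspond to the same utterance, which is the point of the parallel-data-free setting. A full training loop would add the mirrored clean to noisy to clean direction, the second feature stream, and the discriminator updates.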

A Parallel-Data-Free Speech Enhancement Method Using Multi-Objective Learning Cycle-Consistent Generative Adversarial Network

A Weakly Supervised Speech Enhancement Strategy Using Cycle-GAN

CycleGAN-based Speech Enhancement for the Unpaired Training Data

Joint Magnitude Estimation and Phase Recovery Using Cycle-In-Cycle GAN for Non-Parallel Speech Enhancement

CycleGAN-based Non-parallel Speech Enhancement with an Adaptive Attention-in-attention Mechanism

A Two-stage Complex Network Using Cycle-consistent Generative Adversarial Networks for Speech Enhancement

Improved Relativistic Cycle-Consistent GAN with Dilated Residual Network and Multi-Attention for Speech Enhancement

Speech Enhancement Via Generative Adversarial LSTM Networks

Multi-scale Generative Adversarial Networks for Speech Enhancement

Parallel Gated Neural Network With Attention Mechanism For Speech Enhancement

Speech Enhancement Based on CycleGAN with Noise-informed Training

CycleGAN with Dual Adversarial Loss for Bone-Conducted Speech Enhancement

Multi-target Voice Conversion Without Parallel Data by Adversarially Learning Disentangled Audio Representations

Speech Intelligibility Enhancement By Non-Parallel Speech Style Conversion Using CWT and iMetricGAN Based CycleGAN

Multi-Stage Progressive Speech Enhancement Network

Multi-Metric Optimization using Generative Adversarial Networks for Near-End Speech Intelligibility Enhancement

Low-latency Speech Enhancement via Speech Token Generation

CycleGAN-VC-GP: Improved CycleGAN-based Non-parallel Voice Conversion

Joint Ideal Ratio Mask and Generative Adversarial Networks for Monaural Speech Enhancement

Speech Enhancement Generative Adversarial Network Architecture with Gated Linear Units and Dual-Path Transformers

A Novel Training Strategy Using Dynamic Data Generation for Deep Neural Network Based Speech Enhancement