Abstract: In many acoustic conditions, a single-channel recorded speech signal may be severely degraded by reverberation and noise, reducing speech quality and intelligibility. This paper proposes a novel two-stage model for single-channel speech dereverberation and denoising that decomposes the room impulse response (RIR) into the convolution of two shorter parts. Like previous methods, the proposed two-stage model uses the non-negative approximation of the convolutive transfer function (N-CTF) to simultaneously estimate the magnitude spectrograms of the speech and the RIR. In the first stage, the model parameters are updated iteratively to estimate a less reverberant speech signal and one short RIR; in the second stage, the clean speech signal and the other short RIR are estimated by further iterative updates. Denoising steps are included in both stages so that noise is removed more thoroughly. A straightforward method based on this scheme is built to enhance speech from the noisy reverberant signal, and two fusion methods inspired by ensemble learning are then proposed for speech enhancement. By decomposing the long RIR into two shorter ones, the proposed methods enhance speech more effectively and require less computation time. Additionally, the optimal estimator is derived based on temporal stacking to exploit the temporal dynamics of speech. Experiments on two simulated RIRs and one real RIR compare the proposed methods with a state-of-the-art method; the results show that the proposed methods achieve better or comparable performance on most measures, with the exception of phone error rate.
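The decomposition idea can be illustrated with a minimal sketch, assuming the usual N-CTF forward model in which each frequency band of the reverberant magnitude spectrogram is a non-negative convolution, along time frames, of the speech spectrogram with a per-band filter. The function names `nctf_forward` and `combine_filters`, the array sizes, and the random data are all hypothetical and chosen only for illustration. The sketch covers the forward model only, not the paper's iterative estimation, denoising, or fusion steps.

```python
import numpy as np

def nctf_forward(S, H):
    """Non-negative CTF forward model: Y[k, n] = sum_p H[k, p] * S[k, n - p].

    S: (K, N) non-negative speech magnitude spectrogram.
    H: (K, P) non-negative per-band filter (RIR magnitude in the STFT domain).
    Returns a (K, N) array (output truncated to the first N frames).
    """
    K, N = S.shape
    Y = np.zeros((K, N))
    for p in range(H.shape[1]):
        # Delay the spectrogram by p frames and weight it by the p-th filter tap.
        Y[:, p:] += H[:, p:p + 1] * S[:, :N - p]
    return Y

def combine_filters(H1, H2):
    """Per-band convolution of two short filters into one longer filter."""
    return np.stack([np.convolve(h1, h2) for h1, h2 in zip(H1, H2)])

rng = np.random.default_rng(0)
K, N = 4, 50             # toy sizes (assumed): frequency bands, time frames
S = rng.random((K, N))   # stand-in for a clean-speech magnitude spectrogram
H1 = rng.random((K, 6))  # first short filter (assumed length 6)
H2 = rng.random((K, 8))  # second short filter (assumed length 8)

# Two short stages applied in sequence ...
Y_two_stage = nctf_forward(nctf_forward(S, H1), H2)
# ... match one stage with the longer combined filter (length 6 + 8 - 1 = 13).
H_long = combine_filters(H1, H2)
Y_one_stage = nctf_forward(S, H_long)
```

Under this model, two short filters of lengths 6 and 8 stand in for one filter of length 13, which hints at why estimating two short RIRs can be cheaper than estimating one long RIR.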
