Abstract: This paper addresses the problems of blind multichannel identification and equalization for joint speech dereverberation and noise reduction. The time-domain cross-relation method is hardly applicable to blind room impulse response identification because of the near-common zeros of the long impulse responses. We extend the cross-relation method to the short-time Fourier transform (STFT) domain, in which the time-domain impulse response is approximately represented by the convolutive transfer function (CTF) with far fewer coefficients. For the oversampled STFT, the CTFs suffer from common zeros caused by the nonflat frequency response of the STFT window. To overcome this, we propose to identify the CTFs using an STFT framework with oversampled signals and critically sampled CTFs, which is a good tradeoff between the frequency aliasing of the signals and the common-zeros problem of the CTFs. The identified complex-valued CTFs are not accurate enough for multichannel equalization because of the frequency aliasing of the CTFs. Hence, we use only the CTF magnitudes, which leads to a nonnegative multichannel equalization method based on a nonnegative convolution model between the STFT magnitude of the source signal and the CTF magnitude. Compared with the complex-valued convolution model, this nonnegative convolution model is shown to be more robust against CTF perturbations. To recover the STFT magnitude of the source signal and to reduce the additive noise, the $\ell_2$-norm fitting error between the STFT magnitude of the microphone signals and the nonnegative convolution is constrained to be less than a noise-power-related tolerance. Meanwhile, the $\ell_1$-norm of the STFT magnitude of the source signal is minimized to impose sparsity.
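The nonnegative equalization step described above can be illustrated with a minimal sketch for a single frequency bin. It rests on several assumptions that are not taken from the paper: the CTF magnitudes are treated as already known, the $\ell_2$-constrained problem is swapped for a penalized (Lagrangian) form with a hand-picked weight `lam`, and a plain projected proximal-gradient iteration is used as the solver. The names `conv_matrix` and `nonneg_l1_deconv` are hypothetical.

```python
import numpy as np

def conv_matrix(h, n_frames):
    """Toeplitz matrix H such that H @ s equals np.convolve(h, s)."""
    L = len(h)
    H = np.zeros((n_frames + L - 1, n_frames))
    for p in range(n_frames):
        H[p:p + L, p] = h
    return H

def nonneg_l1_deconv(x_list, h_list, lam=1e-3, n_iter=300):
    """Estimate a nonnegative, sparse source magnitude sequence s.

    x_list: per-microphone STFT magnitude sequences (one frequency bin).
    h_list: per-microphone CTF magnitude sequences (assumed known here).
    Solves  min_s 0.5 * ||H s - x||_2^2 + lam * ||s||_1  subject to s >= 0,
    a penalized stand-in (assumption) for the constrained form in the abstract.
    """
    n_frames = len(x_list[0]) - len(h_list[0]) + 1
    # Stack all channels so that one source sequence explains every microphone.
    H = np.vstack([conv_matrix(h, n_frames) for h in h_list])
    x = np.concatenate(x_list)
    # Step size 1 / (largest singular value)^2 keeps the iteration monotone.
    step = 1.0 / np.linalg.norm(H, 2) ** 2
    s = np.zeros(n_frames)
    for _ in range(n_iter):
        grad = H.T @ (H @ s - x)
        # Gradient step, l1 shrinkage, and projection onto s >= 0 in one line.
        s = np.maximum(s - step * (grad + lam), 0.0)
    return s

# Toy usage with synthetic data (all values are illustrative).
rng = np.random.default_rng(0)
s_true = np.zeros(30)
s_true[[3, 11, 20]] = [1.0, 0.5, 2.0]
h_list = [np.abs(rng.standard_normal(4)) + 0.1 for _ in range(2)]
x_list = [np.convolve(h, s_true) for h in h_list]
s_hat = nonneg_l1_deconv(x_list, h_list)
```

In this penalized form, a larger `lam` plays roughly the role of a larger noise tolerance: it trades fitting accuracy for a sparser estimate.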

Expectation-maximisation for Speech Source Separation Using Convolutive Transfer Function

Multichannel blind speech source separation with a disjoint constraint source model

Blind MultiChannel Identification and Equalization for Dereverberation and Noise Reduction based on Convolutive Transfer Function

Multichannel Identification and Nonnegative Equalization for Dereverberation and Noise Reduction Based on Convolutive Transfer Function

Adaptive Beamforming Based on Interference-Plus-Noise Covariance Matrix Reconstruction for Speech Separation

Deconvolution-based Acoustic Source Localization and Separation Algorithms

Quasi-Blind Source Separation Algorithm for Convolutive Mixture of Speech

Acoustic Source Localization and Deconvolution-Based Separation

An online blind source separation for convolutive acoustic signals in frequency-domain

On End-to-end Multi-channel Time Domain Speech Separation in Reverberant Environments

ConSep: a Noise- and Reverberation-Robust Speech Separation Framework by Magnitude Conditioning

Experiments on Blind Speech Separations

A Multichannel Learning-Based Approach for Sound Source Separation in Reverberant Environments

Time-Domain Mapping with Convolution Networks for End-to-End Monaural Speech Separation

Listen and Look: Audio–Visual Matching Assisted Speech Source Separation

End-to-end Networks for Supervised Single-channel Speech Separation

Multichannel Variational Autoencoder-Based Speech Separation in Designated Speaker Order

Separate And Diffuse: Using a Pretrained Diffusion Model for Improving Source Separation

Convolutional Maxout Neural Networks for Speech Separation

Multi-Microphone Speaker Separation by Spatial Regions

Reverberant Source Separation Using NTF With Delayed Subsources and Spatial Priors