Abstract: Speech signals recorded by distant microphones are often contaminated by room reverberation and by the signals of interfering speakers. This article addresses joint source separation and dereverberation using multichannel nonnegative tensor factorization (NTF), in which late reverberant components are modeled using so-called delayed subsources. The article formulates two distinct models of the time-frequency spectrum of the multichannel microphone mixture. In these models, reverberation is modeled either independently for each source, using delayed source variances, or jointly, using delayed microphone signals. In addition, it defines computationally efficient variants of these two methods with a simplified spatial model, in which the spatial properties of the late reverberant components are estimated jointly for all delays. For each of the four resulting algorithms, the article first formulates a maximum a posteriori (MAP) estimator based on the NTF model with a localization prior over the mixing matrix. This prior is suited to estimating the early reverberation signals, primarily the direct path, in a reverberant environment. It then derives the update equations of the four corresponding expectation-maximization (EM) algorithms. Experimental evaluations on real and simulated data, covering determined, over-determined, and under-determined scenarios, show that the proposed methods outperform similar state-of-the-art approaches in terms of standard source separation and dereverberation metrics.
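
The per-source "delayed source variances" idea described above can be illustrated with a toy, single-frequency-bin sketch. It rests on simplifying assumptions that are not taken from the article. The NMF-style source variance, the frame delays, the identity and scaled-identity spatial covariances, and all function names (`nmf_variance`, `mixture_covariance`, `mat_add_scaled`) are hypothetical. The sketch covers only a forward model of the mixture covariance. It does not cover the delayed-microphone-signal variant, the localization prior, or the MAP/EM estimation.

```python
"""Toy sketch: mixture covariance with delayed subsources (one frequency bin).

Illustrative assumptions, NOT the article's exact model:
- source variances follow an NMF-style model v_j[t] = sum_k w_j[k] * h_j[k][t];
- late reverberation of source j is a sum of delayed copies v_j[t - d],
  each weighted by its own spatial covariance matrix (one per delay).
"""
from typing import List

Matrix = List[List[complex]]


def mat_zeros(m: int) -> Matrix:
    """Return an m x m matrix of complex zeros."""
    return [[0j] * m for _ in range(m)]


def mat_add_scaled(acc: Matrix, a: Matrix, s: float) -> None:
    """In-place update acc += s * a."""
    for r in range(len(acc)):
        for c in range(len(acc[r])):
            acc[r][c] += s * a[r][c]


def nmf_variance(w: List[float], h: List[List[float]]) -> List[float]:
    """v[t] = sum_k w[k] * h[k][t] for one source in one frequency bin."""
    n_frames = len(h[0])
    return [sum(w[k] * h[k][t] for k in range(len(w))) for t in range(n_frames)]


def mixture_covariance(
    v: List[List[float]],        # v[j][t]: variance of source j at frame t
    r_early: List[Matrix],       # r_early[j]: early (direct-path) spatial covariance
    r_late: List[List[Matrix]],  # r_late[j][i]: late spatial covariance for delays[i]
    delays: List[int],           # hypothetical delays, in frames
    t: int,
) -> Matrix:
    """R[t] = sum_j v_j[t] R_j^early + sum_j sum_i v_j[t - d_i] R_{j,i}^late."""
    r = mat_zeros(len(r_early[0]))
    for j, vj in enumerate(v):
        mat_add_scaled(r, r_early[j], vj[t])
        for i, d in enumerate(delays):
            if t - d >= 0:  # a delayed subsource exists only after d frames
                mat_add_scaled(r, r_late[j][i], vj[t - d])
    return r


# Toy example: one source, two microphones, hypothetical delays of 2 and 3 frames.
delays = [2, 3]
v = [nmf_variance([1.0, 0.5], [[1.0, 0.0, 0.0, 0.0], [0.0, 2.0, 0.0, 0.0]])]
identity = [[1.0, 0.0], [0.0, 1.0]]
late = [[0.5, 0.0], [0.0, 0.5]]
r_early = [identity]
# Reusing one matrix for every delay is a rough analogue (our assumption) of the
# simplified variant in which late spatial properties are shared across delays.
r_late = [[late, late]]
r_t2 = mixture_covariance(v, r_early, r_late, delays, t=2)  # only the d=2 copy
r_t3 = mixture_covariance(v, r_early, r_late, delays, t=3)  # d=2 and d=3 copies
```

Each delayed copy of a source variance contributes only after its delay has elapsed. As a result, the covariance at frame 2 includes only the 2-frame copy, while the covariance at frame 3 includes both copies.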

Neural Network Alternatives to Convolutive Audio Models for Source Separation

A Neural Network Alternative to Non-Negative Audio Models

End-to-end Non-Negative Autoencoders for Sound Source Separation

End-to-end Networks for Supervised Single-channel Speech Separation

Neural Fast Full-Rank Spatial Covariance Analysis for Blind Source Separation

A variance modeling framework based on variational autoencoders for speech enhancement

Deep NMF for speech separation

Multichannel blind speech source separation with a disjoint constraint source model

Depthwise Separable Convolutions Versus Recurrent Neural Networks for Monaural Singing Voice Separation

Neural Network Approaches to Nonlinear Blind Source Separation

Integration of variational autoencoder and spatial clustering for adaptive multi-channel neural speech separation

On Time Domain Conformer Models for Monaural Speech Separation in Noisy Reverberant Acoustic Environments

Speech Separation Using an Asynchronous Fully Recurrent Convolutional Neural Network

Deformable Temporal Convolutional Networks for Monaural Noisy Reverberant Speech Separation

Neural-Based Separating Method for Nonlinear Mixtures

Reverberant Source Separation Using NTF With Delayed Subsources and Spatial Priors

Sound Source Separation Using Latent Variational Block-Wise Disentanglement

U-NET: A Supervised Approach for Monaural Source Separation

Multi-Resolution Fully Convolutional Neural Networks for Monaural Audio Source Separation

Determined Multichannel Blind Source Separation with Clustered Source Model

Expectation-maximisation for Speech Source Separation Using Convolutive Transfer Function