Abstract: Existing speech source separation approaches overwhelmingly rely on acoustic pressure information acquired with a microphone array. Little attention has been devoted to the use of B-format microphones, which capture both acoustic pressure and pressure gradient and therefore allow direction of arrival (DOA) cues to be estimated from the received signal. In this paper, such DOA cues, together with the frequency bin-wise mixing vector (MV) cues, are used to evaluate the contribution of a specific source at each time-frequency (T-F) point of the mixtures in order to separate the source from the mixture. A source separation algorithm is developed in which the DOA cues are modelled with a von Mises mixture model and the MV cues with a complex Gaussian mixture model, and the model parameters are estimated via an expectation-maximization (EM) algorithm. A T-F mask is then derived from the model parameters for recovering the sources. Moreover, we further improve the separation performance by keeping only the reliable DOA estimates at the T-F units, selected by thresholding. The performance of the proposed method is evaluated in both simulated room environments and a real reverberant studio in terms of signal-to-distortion ratio (SDR) and the perceptual evaluation of speech quality (PESQ). The experimental results show its advantage over four baseline algorithms, including three T-F mask-based approaches and one convolutive independent component analysis (ICA) based method. © 2015 Elsevier B.V. All rights reserved.
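
The DOA part of the approach summarized above can be illustrated with a minimal sketch: an EM fit of a von Mises mixture to per-T-F-point DOA angles, whose posterior responsibilities serve as soft T-F masks. This is not the authors' implementation. It assumes the DOA estimates (in radians) have already been computed from the B-format channels, it omits the MV cue and the reliability thresholding, and it uses a common closed-form approximation for the concentration parameter. The function name `fit_von_mises_mixture` and all parameter names are hypothetical.

```python
import numpy as np


def fit_von_mises_mixture(theta, n_sources, n_iter=50, kappa_max=100.0):
    """EM for a von Mises mixture over DOA angles (illustrative sketch only).

    theta: array of DOA estimates in radians, any shape (e.g. freq x time).
    Returns (mu, kappa, weights, masks); masks has shape (n_sources, *theta.shape).
    """
    theta = np.asarray(theta, dtype=float)
    flat = theta.ravel()

    # Assumed initialization: means spread evenly around the circle.
    mu = np.linspace(-np.pi, np.pi, n_sources, endpoint=False)
    kappa = np.full(n_sources, 1.0)
    weights = np.full(n_sources, 1.0 / n_sources)

    for _ in range(n_iter):
        # E-step: posterior probability of each source at each T-F point,
        # computed in the log domain for numerical stability.
        log_norm = np.log(2.0 * np.pi * np.i0(kappa)).reshape(-1, 1)
        log_p = (np.log(weights)[:, None]
                 + kappa[:, None] * np.cos(flat[None, :] - mu[:, None])
                 - log_norm)
        log_p -= log_p.max(axis=0, keepdims=True)
        resp = np.exp(log_p)
        resp /= resp.sum(axis=0, keepdims=True)

        # M-step: weighted circular statistics per source.
        nk = resp.sum(axis=1) + 1e-12
        c = (resp * np.cos(flat)).sum(axis=1)
        s = (resp * np.sin(flat)).sum(axis=1)
        mu = np.arctan2(s, c)
        # Mean resultant length, then an approximate concentration estimate
        # (assumption: closed-form approximation, clipped to avoid overflow).
        r = np.clip(np.sqrt(c ** 2 + s ** 2) / nk, 1e-6, 1.0 - 1e-6)
        kappa = np.clip(r * (2.0 - r ** 2) / (1.0 - r ** 2), 1e-3, kappa_max)
        weights = nk / nk.sum()

    # Soft T-F masks: responsibilities reshaped back to the input grid.
    masks = resp.reshape((n_sources,) + theta.shape)
    return mu, kappa, weights, masks
```

A source estimate would then be obtained by multiplying `masks[k]` element-wise with the mixture spectrogram of matching shape and inverting the transform. In a fuller model, the E-step would also include the MV-cue likelihood for each source.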

Cochleagram-Based Audio Pattern Separation Using Two-Dimensional Non-Negative Matrix Factorization With Automatic Sparsity Adaptation

Non-Negative Matrix Factorization with Sparsity Learning for Single Channel Audio Source Separation

Unsupervised Single-Channel Separation of Nonstationary Signals Using Gammatone Filterbank and Itakura–Saito Nonnegative Matrix Two-Dimensional Factorizations

Single-Channel Signal Separation Using Spectral Basis Correlation with Sparse Nonnegative Tensor Factorization

Monaural Singing Voice Separation By Non-Negative Matrix Partial Co-Factorization With Temporal Continuity And Sparsity Criteria

Single channel informed signal separation using artificial‐stereophonic mixtures and exemplar‐guided matrix factor deconvolution

Single-channel blind separation using L₁-sparse complex non-negative matrix factorization for acoustic signals.

Reverberant Speech Separation with Probabilistic Time-Frequency Masking for B-format Recordings.

Unsupervised Learning For Monaural Source Separation Using Maximization-Minimization Algorithm With Time-Frequency Deconvolution

Multi-channel sound source separation based on time-frequency sparsity constraint

Speech Separation Via Parallel Factor Analysis Of Cross-Frequency Covariance Tensor

An Adaptive Single Channel EMD-TNMF Blind Source Separation Algorithm for Both Instantaneous and Convolutive Mixed Signal

Deep Neural Network Based Audio Source Separation

Single-Channel Speech Separation Based on Non-negative Matrix Factorization and Factorial Conditional Random Field

Reverberant Signal Separation Using Optimized Complex Sparse Nonnegative Tensor Deconvolution on Spectral Covariance Matrix

Single Channel Audio Source Separation

Single Channel Source Separation Using Filterbank and 2D Sparse Matrix Factorization

Single-channel Speech Separation with Non-Negative Matrix Factorization and Factorial Conditional Random Fields

Machine Learning Source Separation Using Maximum a Posteriori Nonnegative Matrix Factorization.

Single-Channel Source Separation Using EMD-Subband Variable Regularized Sparse Features

Separation of Singing Voice Using Nonnegative Matrix Partial Co-Factorization for Singer Identification