Abstract: Existing speech source separation approaches rely overwhelmingly on acoustic pressure information acquired with a microphone array. Little attention has been devoted to B-format microphones, which capture both the acoustic pressure and the pressure gradient, so that direction of arrival (DOA) cues can be estimated from the received signal. In this paper, these DOA cues, together with frequency bin-wise mixing vector (MV) cues, are used to evaluate the contribution of a specific source at each time-frequency (T-F) point of the mixtures in order to separate that source from the mixture. A source separation algorithm is developed that models the DOA cues with a von Mises mixture model and the MV cues with a complex Gaussian mixture model, with the model parameters estimated via an expectation-maximization (EM) algorithm. A T-F mask is then derived from the model parameters to recover the sources. We further improve the separation performance by thresholding the DOA estimates so that only reliable ones are used at each T-F unit. The performance of the proposed method is evaluated in both simulated room environments and a real reverberant studio in terms of signal-to-distortion ratio (SDR) and perceptual evaluation of speech quality (PESQ). The experimental results show that it outperforms four baseline algorithms: three T-F mask based approaches and one convolutive independent component analysis (ICA) based method. © 2015 Elsevier B.V. All rights reserved.
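The abstract describes the DOA half of the pipeline only at a high level: DOA estimates from B-format signals, a von Mises mixture fitted by EM, posterior-based T-F masks, and thresholding to keep reliable DOAs. The following is a minimal sketch of that half under stated assumptions, not the authors' implementation. The intensity-based azimuth estimate atan2(Re(W*·Y), Re(W*·X)) assumes X and Y are scaled so that a single plane wave gives X = W·cos θ and Y = W·sin θ. The "directness" reliability score, the 0.5 threshold, the uniform fallback for unreliable points, and all function names are hypothetical. The concentration update uses the approximation κ ≈ r̄(2 − r̄²)/(1 − r̄²) (Banerjee et al.) rather than an exact inversion, and the complex Gaussian mixture over MV cues is omitted entirely.

```python
import math
import random


def bessel_i0(x):
    """Modified Bessel function I0(x), summed from its power series until terms are negligible."""
    total, term, k = 1.0, 1.0, 0
    while term > 1e-16 * total and k < 500:
        k += 1
        term *= (x / (2.0 * k)) ** 2
        total += term
    return total


def vonmises_pdf(theta, mu, kappa):
    """Von Mises density with mean direction mu and concentration kappa."""
    return math.exp(kappa * math.cos(theta - mu)) / (2.0 * math.pi * bessel_i0(kappa))


def doa_and_directness(w, x, y):
    """Azimuth of one T-F point from B-format STFT coefficients, plus a hypothetical reliability score.

    Assumes X = W*cos(theta), Y = W*sin(theta) for a single plane wave, so the
    score is 1 when one source dominates and 0 when X = Y = 0.
    """
    ix = (w.conjugate() * x).real  # active intensity, x component
    iy = (w.conjugate() * y).real  # active intensity, y component
    energy = abs(w) ** 2
    directness = math.hypot(ix, iy) / energy if energy > 0 else 0.0
    return math.atan2(iy, ix), directness


def responsibilities(theta, mus, kappas, weights):
    """Posterior probability of each source given one DOA angle (the E-step for one point)."""
    p = [wk * vonmises_pdf(theta, mk, kk) for wk, mk, kk in zip(weights, mus, kappas)]
    total = sum(p) or 1e-300
    return [pk / total for pk in p]


def vonmises_em(thetas, init_mus, iters=50):
    """Fit a K-component von Mises mixture to DOA angles with EM."""
    if not thetas:
        raise ValueError("no reliable DOA estimates to fit")
    K = len(init_mus)
    mus, kappas, weights = list(init_mus), [1.0] * K, [1.0 / K] * K
    for _ in range(iters):
        post = [responsibilities(t, mus, kappas, weights) for t in thetas]
        for k in range(K):  # M-step: circular mean, concentration, mixing weight
            nk = max(sum(r[k] for r in post), 1e-12)
            c = sum(r[k] * math.cos(t) for r, t in zip(post, thetas))
            s = sum(r[k] * math.sin(t) for r, t in zip(post, thetas))
            mus[k] = math.atan2(s, c)
            rbar = min(math.hypot(c, s) / nk, 0.99)  # mean resultant length, capped
            # Approximate inverse of I1(kappa)/I0(kappa) = rbar (Banerjee et al., p = 2).
            kappas[k] = rbar * (2.0 - rbar ** 2) / (1.0 - rbar ** 2)
            weights[k] = nk / len(thetas)
    return mus, kappas, weights


def doa_masks(w, x, y, init_mus, threshold=0.5, iters=50):
    """Soft T-F masks (one list per source) from DOA cues only; MV cues are omitted."""
    cues = [doa_and_directness(wn, xn, yn) for wn, xn, yn in zip(w, x, y)]
    reliable = [theta for theta, d in cues if d >= threshold]
    mus, kappas, weights = vonmises_em(reliable, init_mus, iters)
    K = len(init_mus)
    masks = [[] for _ in range(K)]
    for theta, d in cues:
        if d >= threshold:
            post = responsibilities(theta, mus, kappas, weights)
        else:
            post = [1.0 / K] * K  # hypothetical fallback: no preference
        for k in range(K):
            masks[k].append(post[k])
    return masks, mus


# Synthetic check: two sources at 0.5 and -1.5 rad, each T-F point dominated by
# one source, plus four points with no directional content (X = Y = 0).
rng = random.Random(0)
w, x, y = [], [], []
for n in range(200):
    theta = (0.5, -1.5)[n % 2]
    wn = complex(rng.gauss(0, 1), rng.gauss(0, 1))
    w.append(wn)
    x.append(wn * math.cos(theta) + complex(rng.gauss(0, 0.05), rng.gauss(0, 0.05)))
    y.append(wn * math.sin(theta) + complex(rng.gauss(0, 0.05), rng.gauss(0, 0.05)))
for _ in range(4):
    w.append(1 + 0j)
    x.append(0j)
    y.append(0j)
masks, mus = doa_masks(w, x, y, init_mus=[0.0, -1.0])
print([round(m, 2) for m in mus])
```

Fitting the mixture on reliable points only, and then evaluating posteriors everywhere, is one plausible reading of "choosing only the reliable DOA estimates"; the abstract does not say how reliability is defined or how unreliable units are handled, which in the full method would be informed by the MV cues this sketch leaves out.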

Cepstral smoothing of masks for single-channel speech segregation

Cepstral Smoothing of Spectral Masks for Acoustic Vector-Sensor Based Convolutive Speech Separation

Using an Adjustment Training and a Smoothing Mask for Speech Segregation

Speech Enhancement Algorithm Based on Auditory Masking Effect and Optimal Smoothing

A Dual Microphone Speech Enhancement Method With A Smoothing Parameter Mask

Speech enhancement based on improved spectral subtraction algorithm

A Speech Enhancement Algorithm Using Computational Auditory Scene Analysis with Spectral Subtraction

Reverberant Speech Separation with Probabilistic Time-Frequency Masking for B-format Recordings

Incorporation of a modified temporal cepstrum smoothing in both signal-to-noise ratio and speech presence probability estimation for speech enhancement

Speech Enhancement for Nonstationary Noise Environments

Improved minima controlled recursive averaging algorithm based on improved spectrum smoothing strategy and speech enhancement

Multi-resolution Auditory Cepstral Coefficient and Adaptive Mask for Speech Enhancement with Deep Neural Network

Noise Reduction in Whisper Speech Based on the Auditory Masking Model

CASA Based Speech Separation for Robust Speech Recognition

A Modified Spectral Subtraction Method For Speech Enhancement Based On Masking Property Of Human Auditory System

Robust Front-End for Speech Recognition Based on Computational Auditory Scene Analysis and Speaker Model

A Speech Enhancement Algorithm Based on Computational Auditory Scene Analysis

Speech Enhancement Based on Modified a Priori SNR Estimation

Speech enhancement algorithm based on noise estimation of binary masking

Speech Enhancement Based on Masking Properties and Short-Time Spectral Amplitude Estimation