Abstract: Target speaker extraction (TSE) has become an attractive research topic in recent years. However, TSE under underdetermined conditions remains a challenge. In this paper, we address a dual-channel TSE problem under underdetermined conditions. Geometric source separation (GSS) has been used as a solution to the TSE problem, but the performance of conventional GSS methods is limited under underdetermined conditions because they lack a powerful source model. We propose a dual-channel TSE method that combines target selection based on geometric constraints, more powerful source modeling, and nonlinear postprocessing. A geometric constraint (GC) on the target direction of arrival (DOA) is applied to select the target, and two conditional variational autoencoders (CVAEs) are used to model a single speaker's speech and the interference mixture speech, respectively. For postprocessing, an ideal ratio time-frequency (TF) mask estimated from the separated interference mixture speech is used to extract the target speaker's speech. Moreover, to reduce the impact of DOA estimation errors, we improve the objective function so that the target DOA information can be adjusted. The experimental results demonstrate that the proposed method achieves improvements of 6.24 dB and 8.37 dB over the baseline method in terms of signal-to-distortion ratio (SDR) and source-to-interference ratio (SIR), respectively, under medium reverberation with a reverberation time of 470 ms. Furthermore, analysis of the experimental results shows that the improved method is robust against DOA estimation errors.
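The masking step mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not give the exact mask formula, so the code below assumes one common form of ratio mask, the target power divided by the sum of target and interference power in each TF bin. The function names `ratio_mask` and `apply_mask` and the toy values are hypothetical and are not taken from the paper.

```python
# Illustrative sketch only: assumes a power-ratio mask, which may differ
# from the formulation used in the paper. All names are hypothetical.

def ratio_mask(target_est, interf_est, eps=1e-12):
    """Per-bin ratio mask from two separated estimates.

    target_est, interf_est: 2-D lists (frames x bins) of complex STFT values.
    Returns a 2-D list of floats in [0, 1].
    """
    mask = []
    for t_row, i_row in zip(target_est, interf_est):
        row = []
        for s, v in zip(t_row, i_row):
            p_s = abs(s) ** 2  # estimated target power in this bin
            p_v = abs(v) ** 2  # estimated interference power in this bin
            row.append(p_s / (p_s + p_v + eps))
        mask.append(row)
    return mask


def apply_mask(mixture, mask):
    """Scale each mixture TF bin by its mask value (mixture phase is kept)."""
    return [[m * x for m, x in zip(m_row, x_row)]
            for m_row, x_row in zip(mask, mixture)]


# Toy data: 2 frames x 2 bins of made-up STFT coefficients.
target_est = [[1 + 0j, 0j], [2j, 1 + 1j]]
interf_est = [[0j, 3 + 0j], [2 + 0j, 0j]]
mixture = [[t + v for t, v in zip(t_row, v_row)]
           for t_row, v_row in zip(target_est, interf_est)]

mask = ratio_mask(target_est, interf_est)
extracted = apply_mask(mixture, mask)
```

Bins dominated by the interference estimate receive a mask value near 0 and are suppressed, while bins dominated by the target estimate pass nearly unchanged. This is the nonlinear part of the postprocessing: the mask depends on the signals themselves, unlike a fixed linear demixing filter.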

Dual-Channel Speech Separation Using Interaural Time Difference with Generalized Gaussian Mixture Model

Two-Microphones Speech Separation Using Generalized Gaussian Mixture Model

Dual-Channel Speech Separation by Sub-Segmental Directional Statistics

Dual-Channel Cosine Function Based ITD Estimation For Robust Speech Separation

Speech Separation Using Independent Vector Analysis with an Amplitude Variable Gaussian Mixture Model

Adaptive Beamforming Based on Interference-Plus-Noise Covariance Matrix Reconstruction for Speech Separation

Speaker and Direction Inferred Dual-channel Speech Separation

Simultaneous Diarization and Separation of Meetings through the Integration of Statistical Mixture Models

Using Energy Difference for Speech Separation of Dual-microphone Close-talk System

Mixture to Mixture: Leveraging Close-talk Mixtures as Weak-supervision for Speech Separation

A Two Microphone-Based Approach For Source Localization Of Multiple Speech Sources

A Two-Stage Approach for the Estimation of Doubly Spread Acoustic Channels

Two microphone based direction of arrival estimation for multiple speech sources using spectral properties of speech

A Deep Analysis of Speech Separation Guided Diarization Under Realistic Conditions

Dual-Channel Target Speaker Extraction Based on Conditional Variational Autoencoder and Directional Information

PGSS: Pitch-Guided Speech Separation

A Multi-channel Speech Separation System for Unknown Number of Multiple Speakers

Speaker Identification based on LSP and Gaussian Mixture Model

A Gender Mixture Detection Approach to Unsupervised Single-Channel Speech Separation Based on Deep Neural Networks

Binaural Angular Separation Network

Gated Recurrent Fusion of Spatial and Spectral Features for Multi-Channel Speech Separation with Deep Embedding Representations