Abstract: This paper addresses the problem of multiple-speaker localization in noisy and reverberant environments, using binaural recordings of an acoustic scene. A complex-valued Gaussian mixture model (CGMM) is adopted, whose components correspond to all the possible candidate source locations defined on a grid. After optimizing the CGMM-based objective function, given an observed set of complex-valued binaural features, both the number of sources and their locations are estimated by selecting the CGMM components with the largest weights. An entropy-based penalty term is added to the likelihood to impose sparsity over the set of CGMM component weights. This favors a small number of detected speakers with respect to the large number of initial candidate source locations. In addition, the direct-path relative transfer function (DP-RTF) is used to build robust binaural features. The DP-RTF, recently proposed for single-source localization, encodes interchannel information corresponding to the direct path of sound propagation and is thus robust to reverberation. In this paper, we extend the DP-RTF estimation to the case of multiple sources. In the short-time Fourier transform domain, a consistency test is proposed to check whether a set of consecutive frames is associated with the same source or not. Reliable DP-RTF features are selected from the frames that pass the consistency test to be used for source localization. Experiments carried out using both simulated data and real data recorded with a robotic head confirm the efficiency of the proposed multisource localization method.
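The weight-selection idea described in the abstract can be illustrated with a small toy sketch. It is not the authors' implementation: the abstract does not specify the optimizer, so an exponentiated-gradient update on the probability simplex is used here as a stand-in, and the objective (mean log-likelihood minus `gamma` times the entropy of the weights) is only one plausible reading of the "entropy-based penalty". The candidate features `exp(1j * angle)`, the variance `var`, the values of `gamma` and `step`, and the detection `threshold` are all hypothetical choices made for illustration; they are not DP-RTF features and are not taken from the paper.

```python
import cmath
import math


def complex_gaussian_pdf(x, mean, var):
    """Circular complex Gaussian density with scalar variance `var`."""
    return math.exp(-abs(x - mean) ** 2 / var) / (math.pi * var)


def estimate_weights(features, candidates, var=0.1, gamma=0.2, step=0.5, iters=200):
    """Estimate mixture weights over fixed candidate components.

    Maximises  mean_n log(sum_k w_k p_nk) - gamma * H(w)  on the simplex,
    where H(w) = -sum_k w_k log w_k.  The optimizer is an
    exponentiated-gradient ascent (an assumption, not the paper's method).
    """
    n_comp = len(candidates)
    w = [1.0 / n_comp] * n_comp
    # Likelihood of every observation under every candidate (fixed components).
    lik = [[complex_gaussian_pdf(x, c, var) for c in candidates] for x in features]
    for _ in range(iters):
        grad = [0.0] * n_comp
        for row in lik:
            mix = sum(wk * pk for wk, pk in zip(w, row)) + 1e-300
            for k in range(n_comp):
                grad[k] += row[k] / mix / len(lik)
        # Gradient of -gamma * H(w) is gamma * (log w_k + 1); floor avoids log(0).
        grad = [g + gamma * (math.log(max(wk, 1e-12)) + 1.0) for g, wk in zip(grad, w)]
        # Multiplicative update; subtracting the max keeps exp() from overflowing.
        g_max = max(grad)
        w = [wk * math.exp(step * (g - g_max)) for wk, g in zip(w, grad)]
        total = sum(w)
        w = [wk / total for wk in w]
    return w


def detect_sources(weights, threshold=0.1):
    """Return indices of components whose weight exceeds the threshold."""
    return [k for k, wk in enumerate(weights) if wk > threshold]


# Toy grid of candidate directions (degrees) and hypothetical predicted features.
angles_deg = [-60, -30, 0, 30, 60]
candidates = [cmath.exp(1j * math.radians(a)) for a in angles_deg]

# Observations: small fixed perturbations around the candidates at -30 and +30.
offsets = [0.05, -0.05, 0.05j, -0.05j, 0.03 + 0.03j, -0.03 - 0.03j]
features = [candidates[1] + d for d in offsets] + [candidates[3] + d for d in offsets]

weights = estimate_weights(features, candidates)
detected = detect_sources(weights)
print([angles_deg[k] for k in detected])
```

In this toy setting the weights of the unused grid points decay toward zero, so both the number of sources and their directions are read off from the surviving components. Making `gamma` too large can collapse the solution onto a single component, so the penalty strength has to be tuned.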

Two-Microphones Speech Separation Using Generalized Gaussian Mixture Model

Dual-Channel Speech Separation Using Interaural Time Difference with Generalized Gaussian Mixture Model

Speech Separation Using Independent Vector Analysis with an Amplitude Variable Gaussian Mixture Model

Adaptive Beamforming Based on Interference-Plus-Noise Covariance Matrix Reconstruction for Speech Separation

Dual-Channel Speech Separation by Sub-Segmental Directional Statistics

Mixture to Mixture: Leveraging Close-talk Mixtures as Weak-supervision for Speech Separation

A Two Microphone-Based Approach For Source Localization Of Multiple Speech Sources

An MRF-ICA based algorithm for image separation

Simultaneous Diarization and Separation of Meetings through the Integration of Statistical Mixture Models

Dual-Channel Cosine Function Based ITD Estimation for Robust Speech Separation

PGSS: Pitch-Guided Speech Separation

Source Separation by Feature-Based Clustering of Microphones in Ad Hoc Arrays

Binaural Angular Separation Network

Estimation for the Location of Multiple Moving Sound Sources in Small-Distance Dual-Microphone

Two microphone based direction of arrival estimation for multiple speech sources using spectral properties of speech

Speaker Identification based on LSP and Gaussian Mixture Model

Gated Recurrent Fusion of Spatial and Spectral Features for Multi-Channel Speech Separation with Deep Embedding Representations

Multiple-Speaker Localization Based on Direct-Path Features and Likelihood Maximization with Spatial Sparsity Regularization

Jointly Tracking and Separating Speech Sources Using Multiple Features and the generalized labeled multi-Bernoulli Framework

Multi-Microphone Speaker Separation by Spatial Regions

Mixture Encoder Supporting Continuous Speech Separation for Meeting Recognition