Abstract: A dereverberation method based on generalized spectral subtraction (GSS) using multi-channel least mean squares (MCLMS) was proposed previously, and speech recognition experiments showed that it achieved a significant improvement over conventional methods. In this paper, we apply this method to distant-talking (far-field) speaker recognition. For far-field speech, however, GSS-based dereverberation combined with clean speech models degrades speaker recognition performance, possibly because the dereverberation introduces distortion between clean speech and dereverberated speech. We address this problem by training speaker models on dereverberated speech obtained by suppressing the reverberation in arbitrary artificial reverberant speech. Furthermore, to avoid having to determine a single optimal set of compensation parameters for GSS, we propose an efficient method for computing the combined likelihood of dereverberated speech obtained with multiple compensation parameter sets. We report the results of a speaker recognition experiment on large-scale far-field speech in reverberant environments that differ from the training environments. The proposed GSS-based dereverberation method achieves a recognition rate of 92.2%, compared with 49.0% and 88.4% for conventional cepstral mean normalization with delay-and-sum beamforming using a clean speech model and a reverberant speech model, respectively. We also compare the proposed method with another dereverberation technique, multi-step linear prediction-based spectral subtraction (MSLP-GSS), which achieves a recognition rate of 90.6%. Using multiple compensation parameter sets further improves speaker recognition performance, raising the recognition rate of our approach to 93.6%. Finally, we apply the method in a real environment using the optimal compensation parameters estimated in an artificial environment. The results show a recognition rate of 87.8%, compared with 72.5% for delay-and-sum beamforming using a reverberant speech model.
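
As a rough illustration of the two ideas in the abstract, the following minimal Python sketch shows a per-frame generalized spectral subtraction step and one possible way to combine likelihoods over several compensation parameter sets. It is a sketch under stated assumptions, not the authors' implementation: the function names, the parameters `exponent`, `alpha`, and `beta`, and the log-sum-exp combination rule are all illustrative choices, and the late-reverberation estimate (which the paper obtains with MCLMS) is simply passed in as an input.

```python
import math


def generalized_spectral_subtraction(observed_power, late_reverb_power,
                                     exponent=0.1, alpha=1.0, beta=0.15):
    """Suppress an estimated late-reverberation spectrum in one frame.

    observed_power:    per-bin power spectrum of the reverberant frame
    late_reverb_power: per-bin estimate of the late-reverberation power
                       (assumed given; not estimated here)
    exponent:          spectral exponent n (n = 1 is power subtraction,
                       n = 0.5 is magnitude subtraction)
    alpha, beta:       hypothetical over-subtraction factor and spectral floor
    """
    enhanced = []
    for x, r in zip(observed_power, late_reverb_power):
        x_n = x ** exponent
        diff = x_n - alpha * (r ** exponent)
        # Floor the result so that no bin becomes negative or vanishes.
        floored = max(diff, beta * x_n)
        # Map back from the exponent domain to the power domain.
        enhanced.append(floored ** (1.0 / exponent))
    return enhanced


def combine_log_likelihoods(log_likelihoods):
    """Combine per-parameter-set log-likelihoods with a stable log-sum-exp.

    Assumption: the likelihoods are summed with equal weights; the abstract
    does not state the combination rule that the paper actually uses.
    """
    peak = max(log_likelihoods)
    return peak + math.log(sum(math.exp(v - peak) for v in log_likelihoods))


def identify_speaker(scores):
    """Pick the speaker whose combined log-likelihood is highest.

    scores: dict mapping speaker id -> list of log-likelihoods,
            one entry per compensation parameter set.
    """
    return max(scores, key=lambda spk: combine_log_likelihoods(scores[spk]))
```

In this sketch, each compensation parameter set would yield one dereverberated version of the utterance, each version would be scored against every speaker model, and `identify_speaker` would then select the speaker from the combined scores.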

Structured Sparsity Models for Multiparty Speech Recovery from Reverberant Recordings

A Robust Super-Resolution Approach with Sparsity Constraint in Acoustic Imaging

Speech Dereverberation Based on Sparse Matrix Decomposition

Multi-Microphone Speaker Separation by Spatial Regions

Distributed speech separation in spatially unconstrained microphone arrays

Joint Channel Estimation and Data Recovery of Communication Systems with Sub-Nyquist Receiver

RIR-SF: Room Impulse Response Based Spatial Feature for Target Speech Recognition in Multi-Channel Multi-Speaker Scenarios

A modeling and algorithmic framework for (non)social (co)sparse audio restoration

Multichannel Linear Prediction-Based Speech Dereverberation Considering Sparse and Low-Rank Priors

Audio-visual End-to-end Multi-channel Speech Separation, Dereverberation and Recognition

Distant-talking Speaker Identification by Generalized Spectral Subtraction-Based Dereverberation and Its Efficient Computation

Mixture Encoder for Joint Speech Separation and Recognition

Hidden Markov Acoustic Modeling with Bootstrap and Restructuring for Low-Resourced Languages

Joint sparse representation based cepstral-domain dereverberation for distant-talking speech recognition

A two-stage speaker extraction algorithm under adverse acoustic conditions using a single-microphone

Dereverberation Based on Generalized Spectral Subtraction for Distant-Talking Speaker Recognition

Dereverberation using joint estimation of dry speech signal and acoustic system

Dereverberation for Speaker Identification in Meeting

Modeling of reverberant room responses for two-dimensional spatial sound field analysis and synthesis

Multiple-Speaker Localization Based on Direct-Path Features and Likelihood Maximization with Spatial Sparsity Regularization

Sparse Modeling of The Early Part of Noisy Room Impulse Responses with Sparse Bayesian Learning