Abstract: In conventional microphone array speech recognition, the array processor and the speech recognizer are loosely coupled. The only connection between the two modules is the enhanced target signal output from the array processor, which is then treated as a single input to the recognizer. In this approach, useful environmental information, which the array processor can provide and the recognizer needs to exploit, is ignored. Inherently, the array processor can generate multiple outputs of spatially filtered signals, acting as a multi-input-multi-output (MIMO) module. In this paper, a closely coupled approach is proposed, in which a recognizer with model-based noise compensation exploits the reference noise outputs from a MIMO array processor. Specifically, a multichannel model-based noise compensation method is presented, including the compensation procedure using the vector Taylor series (VTS) expansion and parameter estimation using the expectation-maximization (EM) algorithm. It is also shown how to construct MIMO array processors from conventional beamformers. A number of practical implementations of the conventional loosely coupled approach and the proposed closely coupled approach were tested on a publicly available database, the Multichannel Overlapping Number Corpus (MONC). Experimental results showed that the proposed closely coupled approach significantly improved speech recognition performance in overlapping speech situations.

Combining Eigenvoice Speaker Modeling And Vts-Based Environment Compensation For Robust Speech Recognition

Learning Virtual HD Model for Bi-model Emotional Speaker Recognition

Maximum Likelihood I-Vector Space Using PCA for Speaker Verification.

Cross-modal Mask Fusion and Modality-Balanced Audio-Visual Speech Recognition

An Improved VTS Feature Compensation Using Mixture Models of Distortion and IVN Training for Noisy Speech Recognition

Using vector Taylor series with noise clustering for speech recognition in non-stationary noisy environments

VTS-based Robust Speech Recognition

A Feature Compensation Approach Using High-Order Vector Taylor Series Approximation of an Explicit Distortion Model for Noisy Speech Recognition

Combining Noise Compensation and Missing-Feature Decoding for Large Vocabulary Speech Recognition in Noise

Application of VTS Approximation Based Feature Compensation Approach to Speech Recognition

A VTS-based Feature Compensation Approach to Noisy Speech Recognition Using Mixture Models of Distortion

Ivn-Based Joint Training of Gmm and Hmms Using an Improved Vts-Based Feature Compensation for Noisy Speech Recognition

Partially Adaptive Multichannel Joint Reduction of Ego-noise and Environmental Noise

DART: Disentanglement of Accent and Speaker Representation in Multispeaker Text-to-Speech

Residual Noise Compensation For Robust Speech Recognition In Nonstationary Noise

Enhancing CTC-based speech recognition with diverse modeling units

Eigenspace Estimation With Missing Values And Its Application To Eigenvoice Adaptation For Speech Recognition

Closely Coupled Array Processing and Model-Based Compensation for Microphone Array Speech Recognition

Phonetic-aware speaker embedding for far-field speaker verification

An Algorithm of Model Compensation Based on the Estimation of Additive Noise and Channel Function for Speech Recognition