Abstract: In our previous work, we proposed a feature compensation approach using high-order vector Taylor series (VTS) approximation for noisy speech recognition. In this paper, we report new progress toward making it more powerful and practical for real applications. First, mixtures of densities are used to enhance the distortion models of both additive noise and convolutional distortion. New formulations are derived for maximum likelihood (ML) estimation of the distortion model parameters and for minimum mean squared error (MMSE) estimation of clean speech. Second, we improve both the efficiency and the accuracy of feature compensation by applying the higher-order information of the VTS approximation only to the noisy speech mean parameters, and by applying a temporal smoothing operation to the posterior probabilities of the Gaussian mixture components in clean speech estimation. Finally, we design a procedure for irrelevant variability normalization (IVN) based joint training of a reference Gaussian mixture model (GMM) for feature compensation and hidden Markov models (HMMs) for acoustic modeling, using VTS-based feature compensation. The effectiveness of the proposed approach is confirmed by experiments on the Aurora3 benchmark database for a real-world in-vehicle connected digits recognition task. Compared with the ETSI advanced front-end, our approach achieves significant recognition accuracy improvements across three “training-testing” conditions for four languages.
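The MMSE clean speech estimation step summarized above can be illustrated with a minimal Python/NumPy sketch. It is a simplified sketch under stated assumptions, not the authors' implementation: it works in the log-filterbank domain with diagonal covariances; it uses a single Gaussian for additive noise and a fixed channel bias `h` instead of the paper's mixture distortion models; it uses only a first-order VTS expansion, whereas the paper applies higher-order terms to the noisy speech means; and it takes the distortion parameters as given rather than ML-estimated. It also approximates E[x | y, k] by y - (mu_y[k] - mu_x[k]), a common VTS choice. The moving-average posterior smoother is a hypothetical stand-in, because the abstract does not specify the smoothing operation. All function names are illustrative.

```python
import numpy as np


def vts_noisy_stats(mu_x, var_x, mu_n, var_n, h):
    """First-order VTS mean/variance of noisy speech for every clean-speech Gaussian.

    Assumed log-filterbank mismatch function: y = x + h + log(1 + exp(n - x - h)).
    Single-Gaussian noise and fixed channel h (simplification of mixture models).
    Shapes: mu_x, var_x -> (K, D); mu_n, var_n, h -> (D,). Diagonal covariances.
    """
    diff = mu_n - mu_x - h                   # (K, D)
    g = np.logaddexp(0.0, diff)              # log(1 + exp(diff)), overflow-safe
    mu_y = mu_x + h + g                      # expansion at (mu_x, mu_n, h)
    G = np.exp(-g)                           # dy/dx = 1 / (1 + exp(diff))
    F = 1.0 - G                              # dy/dn
    var_y = G**2 * var_x + F**2 * var_n      # diagonal variance approximation
    return mu_y, var_y


def log_posteriors(y, weights, mu_y, var_y):
    """log P(k | y_t) under the compensated (noisy-speech) GMM; y is (T, D)."""
    d = y[:, None, :] - mu_y[None, :, :]     # (T, K, D)
    ll = -0.5 * np.sum(d**2 / var_y + np.log(2.0 * np.pi * var_y), axis=2)
    ll += np.log(weights)[None, :]
    m = ll.max(axis=1, keepdims=True)        # log-sum-exp normalization
    return ll - (m + np.log(np.exp(ll - m).sum(axis=1, keepdims=True)))


def smooth_posteriors(post, half_window=2):
    """Hypothetical smoother: moving average over +/- half_window frames."""
    T = post.shape[0]
    out = np.empty_like(post)
    for t in range(T):
        lo, hi = max(0, t - half_window), min(T, t + half_window + 1)
        out[t] = post[lo:hi].mean(axis=0)
    return out / out.sum(axis=1, keepdims=True)


def vts_mmse_enhance(y, weights, mu_x, var_x, mu_n, var_n, h, half_window=2):
    """MMSE estimate x_hat_t = y_t - sum_k P(k|y_t) (mu_y_k - mu_x_k)."""
    mu_y, var_y = vts_noisy_stats(mu_x, var_x, mu_n, var_n, h)
    post = np.exp(log_posteriors(y, weights, mu_y, var_y))
    post = smooth_posteriors(post, half_window)
    bias = mu_y - mu_x                       # (K, D): per-Gaussian noise/channel offset
    return y - post @ bias                   # (T, D)


# Toy usage with synthetic parameters (not real Aurora3 data).
rng = np.random.default_rng(0)
K, D, T = 4, 3, 10
weights = np.full(K, 1.0 / K)
mu_x = rng.normal(5.0, 1.0, size=(K, D))
var_x = np.ones((K, D))
mu_n, var_n, h = np.full(D, 2.0), np.full(D, 0.1), np.zeros(D)
y = rng.normal(5.0, 1.0, size=(T, D))
x_hat = vts_mmse_enhance(y, weights, mu_x, var_x, mu_n, var_n, h)
print(x_hat.shape)  # (10, 3)
```

Setting `half_window=0` disables smoothing. Larger windows should give more stable component posteriors at the cost of blurring fast acoustic transitions.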

The Study of Vocal Tract Length Normalization Based on Single Mixture in Noisy Environment

Speaker normalization and adaptation techniques in automatic pronunciation evaluation

Auditory Features with Vocal Tract Length Normalization for Language Identification

Speaker Normalization and Novel Robust Speech Feature Based on Mellin Transform

Attentive batch normalization for LSTM-based acoustic modeling of speech recognition

Applying Batch Normalization to Hybrid NN-HMM Model For Speech Recognition

A VTS-based Feature Compensation Approach to Noisy Speech Recognition Using Mixture Models of Distortion

Toward On-Line Learning of Chinese Continuous Speech Recognition System

Adaptive Speaker Normalization for CTC-Based Speech Recognition

Modified Mean and Variance Normalization: Transforming to Utterance-Specific Estimates

A study on speech feature extraction and application in Mandarin LVCSR

Vocal Tract Normalization in Articulatory Space Using Thin-Plate Spline Method

Speaker Normalization Training and Adaptation for Speech Recognition

An Improved VTS Feature Compensation Using Mixture Models of Distortion and IVN Training for Noisy Speech Recognition

Morphological normalization of vocal tract shape

Analysis of Length Normalization in End-to-End Speaker Verification System

Noise adaptive front-end normalization based on Vector Taylor Series for Deep Neural Networks in robust speech recognition

Using Data Augmentations and VTLN to Reduce Bias in Dutch End-to-End Speech Recognition Systems

A new score normalization algorithm based on EMD-Tnorm for speaker verification

Improving RNN transducer with normalized jointer network

Comparison of Non-native Speaker Adaptations for Large Vocabulary Continuous Mandarin Speech Recognition