Abstract: In our previous work, we proposed a feature compensation approach using high-order vector Taylor series (VTS) approximation for noisy speech recognition. In this paper, we report new progress on making it more powerful and practical in real applications. First, mixtures of densities are used to enhance the distortion models of both additive noise and convolutional distortion. New formulations are derived and presented for maximum likelihood (ML) estimation of the distortion model parameters and for minimum mean squared error (MMSE) estimation of clean speech. Second, we improve both the efficiency and the accuracy of feature compensation by applying the higher-order information of the VTS approximation only to the noisy speech mean parameters, and by temporally smoothing the posterior probabilities of the Gaussian mixture components used in clean speech estimation. Finally, we design a procedure to perform irrelevant variability normalization (IVN) based joint training of a reference Gaussian mixture model (GMM) for feature compensation and hidden Markov models (HMMs) for acoustic modeling using VTS-based feature compensation. The effectiveness of the proposed approach is confirmed by experiments on the Aurora3 benchmark database for a real-world in-vehicle connected digits recognition task. Compared with the ETSI advanced front-end, our approach achieves significant improvements in recognition accuracy across three “training-testing” conditions for four languages.
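To make the second point concrete, the following is a minimal sketch of how temporally smoothed component posteriors could feed an MMSE-style clean speech estimate. It is not the paper's formulation: the abstract does not specify the smoothing operation, so a simple moving average over neighbouring frames is assumed here, and the per-component correction is reduced to a fixed bias vector (for example, the difference between the noisy and clean mean of each Gaussian component). The names `smooth_posteriors`, `mmse_clean_estimate`, `half_window`, and `biases` are hypothetical, as are the toy values.

```python
def smooth_posteriors(posteriors, half_window=1):
    """Moving-average smoothing of component posteriors over time.

    posteriors[t][k] is P(component k | noisy frame t). The moving average
    is an assumed stand-in for the smoothing operation, not the paper's.
    """
    num_frames = len(posteriors)
    smoothed = []
    for t in range(num_frames):
        lo = max(0, t - half_window)
        hi = min(num_frames, t + half_window + 1)
        window = posteriors[lo:hi]
        num_components = len(posteriors[t])
        avg = [sum(frame[k] for frame in window) / len(window)
               for k in range(num_components)]
        total = sum(avg)  # renormalize to guard against rounding drift
        smoothed.append([p / total for p in avg])
    return smoothed


def mmse_clean_estimate(noisy, posteriors, biases):
    """Simplified MMSE estimate: x_hat[t] = y[t] - sum_k P(k | y[t]) * bias[k].

    biases[k] is an assumed per-component correction vector.
    """
    estimates = []
    for y_t, post_t in zip(noisy, posteriors):
        x_t = []
        for d, y_td in enumerate(y_t):
            expected_bias = sum(p * biases[k][d] for k, p in enumerate(post_t))
            x_t.append(y_td - expected_bias)
        estimates.append(x_t)
    return estimates


# Toy example: 3 frames, 1 feature dimension, 2 Gaussian components.
noisy = [[1.0], [1.2], [0.9]]
posteriors = [[0.9, 0.1], [0.2, 0.8], [0.9, 0.1]]
biases = [[0.5], [0.1]]

smoothed = smooth_posteriors(posteriors, half_window=1)
clean = mmse_clean_estimate(noisy, smoothed, biases)
```

In this toy example the middle frame's posterior disagrees with both neighbours. Smoothing pulls it toward them, so the correction applied to that frame changes less abruptly from frame to frame.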

The Improved VQ Algorithm for Speaker Recognition

A Novel and Efficient Voice Activity Detector Using Shape Features of Speech Wave.

Applying Support Vector Machines to Voice Activity Detection

Maximum Likelihood I-Vector Space Using PCA for Speaker Verification.

Design and implementation of a speaker recognition system

Speaker Identification Using the VQ-Based Discriminative Kernels

A New Similarity Measure Of Random Variables And Its Application To Vq For Speech Recognition

Improved Prosody from Learned F0 Codebook Representations for VQ-VAE Speech Waveform Reconstruction

An Improved VTS Feature Compensation Using Mixture Models of Distortion and IVN Training for Noisy Speech Recognition

Improved speech recognition algorithm based on MFCC feature

VB-HMM Speaker Diarization with Enhanced and Refined Segment Representation.

Speech recognition based on relevance vector machine

Improved HMM Model Using Spatial Correlation

Improved Algorithm for Speaker Verification

A Novel I-Vector Framework Using Multiple Features and PCA for Speaker Recognition in Short Speech Condition

Exploring Universal Speech Attributes for Speaker Verification with an Improved Cross-stitch Network

An algorithm for efficient speaker identification using reference speaker model based two-layer structure

Classifier Selection in Speaker Verification Technology

Mixture of Support Vector Machines for Text-Independent Speaker Recognition

Ensemble of Support Vector Machine for Text-Independent Speaker Recognition

Noise Robust Speaker Recognition Based on Adaptive Frame Weighting in GMM for i-Vector Extraction.