Abstract: Nonlinear compensation models use a nonlinear mismatch function, which characterizes the joint effects of additive and convolutional noise, to achieve noise-robust speech recognition. Representative compensation models include vector Taylor series (VTS), data-driven parallel model combination (DPMC), and the unscented transform (UT). The noise parameters of these compensation models are often estimated in the maximum likelihood (ML) sense, and they are known to play an important role in system performance under noisy conditions. In this paper, we conduct a systematic comparison between two popular approaches for estimating the noise parameters. The first approach employs the Gauss-Newton method within a generalized EM framework to iteratively maximize the EM auxiliary function. The second approach views the compensation models from a generative perspective, which gives rise to an EM algorithm analogous to ML estimation for factor analysis (EM-FA). We demonstrate a close connection between these two approaches: both belong to the family of gradient-based methods, differing in their convergence rates. Convergence behavior can be crucial for noise estimation, since model compensation may need to be carried out frequently in changing noisy environments to retain the desired performance. Furthermore, we present an in-depth discussion of the advantages and limitations of the two approaches and illustrate how to extend them to allow for adaptive training. The investigated noise estimation approaches are evaluated on several tasks: the first is to fit a GMM to artificially corrupted samples, and speech recognition experiments are then performed on the Aurora 2 and Aurora 4 tasks.
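As a rough illustration of the first approach, the sketch below runs a generalized EM loop. In each iteration, component posteriors are computed under the current compensated GMM, and then a single Gauss-Newton step is taken on the additive-noise mean. The abstract does not give the mismatch function, so the code assumes a commonly used log-spectral form, y = x + h + log(1 + exp(n - x - h)). It applies this per dimension to scalar features, uses first-order VTS compensation, treats the channel h as deterministic, and works with a one-dimensional clean-speech GMM. All function names and the (weight, mean, variance) GMM format are hypothetical. The sketch does not reproduce the cepstral-domain formulation, dynamic features, noise-variance and channel updates, or the EM-FA approach.

```python
import math
import random


def softplus(z):
    # Numerically stable log(1 + exp(z)).
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))


def sigmoid(z):
    # Numerically stable 1 / (1 + exp(-z)).
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def mismatch(x, n, h):
    # Assumed log-spectral mismatch function: y = x + h + log(1 + exp(n - x - h)).
    return x + h + softplus(n - x - h)


def vts_compensate(mu_x, var_x, mu_n, var_n, mu_h):
    # First-order VTS expansion around (mu_x, mu_n, mu_h); h has zero variance here.
    mu_y = mismatch(mu_x, mu_n, mu_h)
    f = sigmoid(mu_n - mu_x - mu_h)  # dy/dn at the expansion point
    g = 1.0 - f                      # dy/dx (equal to dy/dh)
    var_y = g * g * var_x + f * f * var_n
    return mu_y, var_y, f


def log_gauss(y, mu, var):
    return -0.5 * (math.log(2.0 * math.pi * var) + (y - mu) ** 2 / var)


def estimate_noise_mean(ys, gmm, mu_n, var_n, mu_h, iters=20):
    """Hypothetical GEM loop: E-step posteriors, then one Gauss-Newton step on mu_n.

    gmm is a list of (weight, mu_x, var_x) clean-speech components.
    """
    for _ in range(iters):
        comps = [(w, *vts_compensate(mx, vx, mu_n, var_n, mu_h)) for w, mx, vx in gmm]
        num = den = 0.0
        for y in ys:
            # E-step: posteriors under the current compensated GMM (log-sum-exp).
            logs = [math.log(w) + log_gauss(y, my, vy) for w, my, vy, _f in comps]
            top = max(logs)
            post = [math.exp(l - top) for l in logs]
            total = sum(post)
            for (_w, my, vy, f), p in zip(comps, post):
                gamma = p / total
                # Linearize mu_y(mu_n) ~= my + f * delta, holding vy fixed.
                num += gamma * f * (y - my) / vy
                den += gamma * f * f / vy
        # Maximizer of the linearized auxiliary function in delta.
        mu_n += num / den
    return mu_n


# Toy check: corrupt samples from a known clean GMM with noise of mean 3.0.
random.seed(0)
gmm = [(0.5, 2.0, 0.25), (0.5, 5.0, 0.25)]
ys = []
for _ in range(2000):
    _w, mx, vx = random.choice(gmm)
    x = random.gauss(mx, math.sqrt(vx))
    n = random.gauss(3.0, math.sqrt(0.1))
    ys.append(mismatch(x, n, 0.0))
est = estimate_noise_mean(ys, gmm, mu_n=1.0, var_n=0.1, mu_h=0.0)
print(round(est, 2))  # near 3.0; first-order VTS leaves a small bias
```

The toy check mirrors, on a small scale, the abstract's first task of fitting a GMM to artificially corrupted samples. The posteriors are recomputed each iteration, and the compensated variances are held fixed inside the Gauss-Newton step. As a result, a full step is not guaranteed to increase the auxiliary function. A common safeguard is to shrink the step whenever the auxiliary function decreases.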

A VTS-Based Feature Compensation Approach to Noisy Speech Recognition Using Mixture Models of Distortion

An Improved VTS Feature Compensation Using Mixture Models of Distortion and IVN Training for Noisy Speech Recognition

IVN-Based Joint Training of GMM and HMMs Using an Improved VTS-Based Feature Compensation for Noisy Speech Recognition

A Feature Compensation Approach Using High-Order Vector Taylor Series Approximation of an Explicit Distortion Model for Noisy Speech Recognition

Irrelevant Variability Normalization Based HMM Training Using VTS Approximation of an Explicit Model of Environmental Distortions

A Feature Compensation Approach Using Piecewise Linear Approximation of an Explicit Distortion Model for Noisy Speech Recognition

VTS Feature Compensation Based on Two-Layer GMM Structure for Robust Speech Recognition

Application of VTS Approximation Based Feature Compensation Approach to Speech Recognition

Evaluation of a Feature Compensation Approach Using High-Order Vector Taylor Series Approximation of an Explicit Distortion Model on Aurora2, Aurora3, and Aurora4 Tasks

HMM Compensation Based on Non-Uniform Spectral Compression for Noisy Speech Recognition

Noise Adaptive Front-End Normalization Based on Vector Taylor Series for Deep Neural Networks in Robust Speech Recognition

An HMM Compensation Approach for Dynamic Features Using Unscented Transformation and Its Application to Noisy Speech Recognition

VTS-based Robust Speech Recognition

Gaussian Specific Compensation for Channel Distortion in Speech Recognition

Two-Stage Model-Based Feature Compensation for Robust Speech Recognition

A Comparative Study of Noise Estimation Algorithms for Nonlinear Compensation in Robust Speech Recognition

Feature Compensation Algorithm Based on Vector Taylor Series for Speaker Recognition

Robust Speech Recognition Based on Vector Taylor Series

A Speech Enhancement Approach Using Piecewise Linear Approximation of an Explicit Model of Environmental Distortions

Intersession Variability Compensation for Language Detection

Voice Conversion Based on Gaussian Mixture Modules with Minimum Distance Spectral Mapping