Abstract: Channel distortion is one of the major factors that degrade the performance of automatic speech recognition (ASR) systems. Current compensation methods generally assume that the channel distortion is a constant or slowly varying bias, either within an utterance or globally. However, this assumption does not hold in more complex conditions, where the speech recordings to be recognized come from many different unknown channels and have parts of the spectrum completely removed (e.g., band-limited speech). On the one hand, different channels may cause different distortions; on the other hand, the distortion caused by a given channel varies across speech frames when parts of the speech spectrum are completely removed. As a result, the performance of current methods is limited in complex environments. To solve this problem, we propose a unified framework in which channel distortion is first divided into two subproblems, namely spectrum missing and magnitude changing. The two types of distortion are then compensated with different techniques in two steps. In the first step, the speech bandwidth is detected for each utterance and the acoustic models are synthesized with clean models to compensate for spectrum missing. In the second step, the constant term of the distortion is estimated via the expectation-maximization (EM) algorithm and subtracted from the means of the synthesized model to further compensate for magnitude changing. The proposed framework is evaluated on several databases whose speech was recorded over different channels, including various microphones and band-limited channels. Moreover, to simulate more types of spectrum missing, the speech from these databases is also processed with various low-pass and band-pass filters. Although these databases and their filtered versions make the channel conditions more challenging for recognition, experimental results show that the proposed framework can substantially improve the performance of ASR systems in complex channel environments.
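The second step described above, estimating a constant distortion term with EM and shifting the model means by it, can be illustrated with a minimal sketch. This is not the paper's implementation. It assumes a one-dimensional feature, a Gaussian mixture standing in for the synthesized acoustic model, and a bias defined as observed minus clean (so it is added to the means; under the opposite sign convention it would be subtracted, as in the abstract). The names `estimate_bias` and `compensate_means` are hypothetical.

```python
import math


def log_gauss(y, mean, var):
    """Log density of a one-dimensional Gaussian."""
    return -0.5 * (math.log(2.0 * math.pi * var) + (y - mean) ** 2 / var)


def estimate_bias(frames, weights, means, variances, n_iter=20):
    """EM estimate of a constant bias b, assuming y_t = x_t + b.

    x_t is assumed to follow the given Gaussian mixture (hypothetical
    stand-in for the synthesized model); all weights must be positive.
    """
    b = 0.0
    for _ in range(n_iter):
        num = 0.0
        den = 0.0
        for y in frames:
            # E-step: component posteriors under the currently shifted model.
            logp = [math.log(w) + log_gauss(y, m + b, v)
                    for w, m, v in zip(weights, means, variances)]
            mx = max(logp)
            post = [math.exp(lp - mx) for lp in logp]
            total = sum(post)
            # M-step accumulators: precision-weighted residuals y - mean.
            for g, m, v in zip(post, means, variances):
                g /= total
                num += g * (y - m) / v
                den += g / v
        b = num / den
    return b


def compensate_means(means, b):
    """Shift every model mean by the estimated bias (assumed sign convention)."""
    return [m + b for m in means]


# Toy data: clean frames drawn around two well-separated means, shifted by 0.8.
clean = [-5.5, -4.5, 4.5, 5.5]
observed = [x + 0.8 for x in clean]
bias = estimate_bias(observed, [0.5, 0.5], [-5.0, 5.0], [1.0, 1.0])
shifted_means = compensate_means([-5.0, 5.0], bias)
```

In a real recognizer the same update would typically run per feature dimension with diagonal covariances, and the posteriors would come from HMM state occupancies rather than a single mixture; both are simplifications made here for brevity.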

An Overview of Compensation Methods for Environment Mismatch in Speech Recognition

Research on Bandwidth Mismatch Compensation in Speech Recognition

Simplified Deformation Compensation for Emotional Speaker Recognition

An Algorithm of Model Compensation Based on the Estimation of Additive Noise and Channel Function for Speech Recognition

Toward emotional speaker recognition: framework and preliminary results

A Comparative Study of Noise Estimation Algorithms for Nonlinear Compensation in Robust Speech Recognition

Speech Recognition Algorithm in Complex Noisy Environments Based on Multi-Space Compensation

Unified adaptation approach for robust speech recognition

A New Framework for Robust Speech Recognition in Complex Channel Environments

Compensation of Speech Enhancement Distortion for Robust Speech Recognition

Model Compensation Approach Based on Nonuniform Spectral Compression Features for Noisy Speech Recognition

Gaussian Specific Compensation for Channel Distortion in Speech Recognition

Combining Log-Spectral Domain Compensation with MVA Feature Post-Processing for Robust Speech Recognition

An Overview of Speech Feature Enhancement Method

A Robust Speaker Recognition Approach Based On Model Compensation

Effect of Environmental Parameters Mismatch to Matched Field Processing in Shallow Water

A Principle Solution for Enroll-Test Mismatch in Speaker Recognition

Joint compensation of noise and channel in speech recognition

A Feature Compensation Approach Using Piecewise Linear Approximation of an Explicit Distortion Model for Noisy Speech Recognition

An Approach To Robust Speaker Recognition Using Stochastic Matching

An Environment Adaptation Method for Robust Speech Recognition