Abstract: Speech recognition performance degrades in noisy environments. If the acoustic models are built from features of clean utterances, the features of a noisy test utterance are acoustically mismatched with the trained model, which yields poor likelihoods and poor recognition accuracy. Model adaptation and feature normalisation are two broad approaches to this problem. While the former often gives better performance, the latter requires estimating fewer parameters, which makes it more feasible for practical implementation. This research focuses on the efficacy of various subspace, statistical and stereo-based feature normalisation techniques. A subspace projection based method is investigated as both a standalone and an adjunct technique; it reconstructs noisy speech features from a precomputed set of clean speech building blocks. The building blocks are learned by applying non-negative matrix factorisation (NMF) to log-Mel filter bank coefficients, and they form a basis for the clean speech subspace. The work provides a detailed study of how the method can be incorporated into the extraction process of Mel-frequency cepstral coefficients. Experimental results show that the new features are robust to noise and achieve better results when combined with existing techniques. The work also proposes a modification to the training process of the SPLICE algorithm for noise-robust speech recognition. The modification is based on feature correlations and enables this stereo-based algorithm to improve performance in all noise conditions, especially unseen ones. Further, the modified framework is extended to non-stereo datasets, where clean and noisy training utterances are required but need not be stereo counterparts of each other. Finally, a computationally efficient MLLR-based run-time noise adaptation method within the SPLICE framework is proposed.
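The subspace projection idea in the abstract can be illustrated with a minimal sketch. It is not the author's implementation: it assumes the standard multiplicative-update form of NMF with a Frobenius-norm objective, non-negative inputs (for example log(1 + energy) rather than raw log energies), and synthetic random matrices in place of real log-Mel features. The function names `learn_basis` and `reconstruct`, the rank of 8, and the 23 filter bank channels are all hypothetical choices made for illustration.

```python
import numpy as np

def learn_basis(V, rank, n_iter=200, eps=1e-9, seed=0):
    """Factorise non-negative V (n_bands x n_frames) as W @ H and return W.

    Uses multiplicative updates for the Frobenius-norm NMF objective
    (an assumed variant; the abstract does not specify the update rule).
    """
    rng = np.random.default_rng(seed)
    n_bands, n_frames = V.shape
    W = rng.random((n_bands, rank)) + eps
    H = rng.random((rank, n_frames)) + eps
    for _ in range(n_iter):
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W

def reconstruct(W, V_noisy, n_iter=200, eps=1e-9, seed=0):
    """Hold the clean basis W fixed, fit activations H to the noisy frames,
    and return the projection W @ H onto the clean-speech subspace."""
    rng = np.random.default_rng(seed)
    H = rng.random((W.shape[1], V_noisy.shape[1])) + eps
    for _ in range(n_iter):
        H *= (W.T @ V_noisy) / (W.T @ W @ H + eps)
    return W @ H

# Synthetic stand-in for clean features: 23 bands, 300 frames (assumed sizes).
rng = np.random.default_rng(1)
clean = rng.random((23, 8)) @ rng.random((8, 300))
W = learn_basis(clean, rank=8)

# Additive non-negative perturbation as a toy "noisy" test utterance.
noisy = clean[:, :50] + 0.1 * rng.random((23, 50))
denoised = reconstruct(W, noisy)
```

In a feature extraction pipeline of the kind the abstract describes, the reconstructed filter bank matrix would then go through the usual DCT step to give cepstral coefficients; that step is omitted here.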

SPEAKER NORMALIZATION AND NOVEL ROBUST SPEECH FEATURE BASED ON MELLIN TRANSFORM

A Novel Robust Feature Of Speech Signal Based On The Mellin Transform For Speaker-Independent Speech Recognition

A Novel Speaker Normalization Method Based on Formant Recovery and Mellin Transform

Speaker Normalization Based on the Generalized Time-Frequency Representation and Mellin Transform

A new speech feature insensitive to the variation of different speakers

Affect-Insensitive Speaker Recognition by Feature Variety Training

Speaker Normalization Training and Adaptation for Speech Recognition

Speaker normalization and adaptation techniques in automatic pronunciation evaluation

Feature Normalisation for Robust Speech Recognition

The Study of Vocal Tract Length Normalization Based on Single Mixture in Noisy Environment

Discussion On Score Normalization And Language Robustness In Text-Independent Multi-Language Speaker Verification

A new score normalization algorithm based on EMD-Tnorm for speaker verification

A VTS-based Feature Compensation Approach to Noisy Speech Recognition Using Mixture Models of Distortion

Deep Speaker Vector Normalization with Maximum Gaussianality Training

Novel Non-parametric Model for Robust Speaker Recognition

Deep Normalization for Speaker Vectors

Discriminatively Trained Joint Speaker and Environment Representations for Adaptation of Deep Neural Network Acoustic Models

Speaker Recognition Based on Weighted Features Compensation Transformation and Its Simulation Study

A Robust Feature Normalization Algorithm for Automatic Speech Recognition

A New Method of Score Normalization for Text-Independent Speaker Verification

Score Normalization-Based Speaking-Style Variation Robust Speaker Recognition