Abstract: Speech recognition system performance degrades in noisy environments. If the acoustic models are built using features of clean utterances, the features of a noisy test utterance are acoustically mismatched with the trained model. This mismatch yields poor likelihoods and poor recognition accuracy. Model adaptation and feature normalisation are two broad areas that address this problem. While the former often gives better performance, the latter involves estimating fewer parameters, which makes the system feasible for practical implementations. This research focuses on the efficacy of various subspace, statistical and stereo-based feature normalisation techniques. A subspace-projection-based method has been investigated both as a standalone and as an adjunct technique; it reconstructs noisy speech features from a precomputed set of clean speech building blocks. The building blocks, which form a basis for the clean speech subspace, are learned by applying non-negative matrix factorisation (NMF) to log-Mel filter bank coefficients. The work provides a detailed study of how the method can be incorporated into the extraction process of Mel-frequency cepstral coefficients. Experimental results show that the new features are robust to noise and achieve better results when combined with existing techniques. The work also proposes a modification to the training process of the SPLICE algorithm for noise-robust speech recognition. The modification is based on feature correlations and enables this stereo-based algorithm to improve performance in all noise conditions, especially in unseen cases. Further, the modified framework is extended to work with non-stereo datasets, where clean and noisy training utterances, but not stereo counterparts, are required. Finally, a computationally efficient MLLR-based run-time noise adaptation method within the SPLICE framework is proposed.
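The subspace projection idea described in the abstract can be illustrated with a minimal sketch. It is not the implementation used in this work: it assumes the log-Mel filter bank features have been shifted or floored so that they are non-negative, it uses the standard multiplicative updates for the Frobenius-norm NMF objective, and the function names (`learn_clean_basis`, `reconstruct`, `update_activations`) and the synthetic data are hypothetical stand-ins chosen for illustration.

```python
import numpy as np

EPS = 1e-9  # guards against division by zero in the multiplicative updates


def update_activations(V, W, H):
    """One multiplicative update of H for the Frobenius-norm NMF objective."""
    return H * (W.T @ V) / (W.T @ W @ H + EPS)


def learn_clean_basis(V_clean, n_basis, n_iter=200, seed=0):
    """Factorise non-negative clean features V_clean (bands x frames) as W @ H.

    Returns only W, whose columns play the role of clean speech building blocks.
    """
    rng = np.random.default_rng(seed)
    n_bands, n_frames = V_clean.shape
    W = rng.random((n_bands, n_basis)) + EPS
    H = rng.random((n_basis, n_frames)) + EPS
    for _ in range(n_iter):
        H = update_activations(V_clean, W, H)
        W = W * (V_clean @ H.T) / (W @ H @ H.T + EPS)
    return W


def reconstruct(V_noisy, W, n_iter=200, seed=0):
    """Estimate activations for noisy features with W held fixed, then rebuild."""
    rng = np.random.default_rng(seed)
    H = rng.random((W.shape[1], V_noisy.shape[1])) + EPS
    for _ in range(n_iter):
        H = update_activations(V_noisy, W, H)
    return W @ H


# Synthetic stand-in for non-negative filter bank features (8 bands, 50 frames).
rng = np.random.default_rng(1)
clean = rng.random((8, 3)) @ rng.random((3, 50))
noisy = clean + 0.1 * rng.random(clean.shape)

W = learn_clean_basis(clean, n_basis=3)
restored = reconstruct(noisy, W)
```

In an MFCC pipeline of the kind the abstract mentions, a reconstruction such as `restored` would presumably take the place of the noisy filter bank outputs before the cepstral transform; the exact insertion point is studied in the work itself and is not fixed by this sketch.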

Feature Normalisation for Robust Speech Recognition