Abstract: Speech recognition performance degrades in noisy environments. If the acoustic models are trained on features of clean utterances, the features of a noisy test utterance are acoustically mismatched with the trained models, which yields poor likelihoods and poor recognition accuracy. Model adaptation and feature normalisation are two broad approaches to this problem. While the former often gives better performance, the latter requires estimating fewer parameters, which makes it more feasible for practical implementations. This research examines the efficacy of various subspace-based, statistical and stereo-based feature normalisation techniques. A subspace projection method, which reconstructs noisy speech features from a precomputed set of clean speech building blocks, is investigated both as a standalone and as an adjunct technique. The building blocks, which form a basis for the clean speech subspace, are learned by applying non-negative matrix factorisation (NMF) to log-Mel filter bank coefficients. The work provides a detailed study of how the method can be incorporated into the extraction of Mel-frequency cepstral coefficients (MFCCs). Experimental results show that the new features are robust to noise and achieve better results when combined with existing techniques. The work also proposes a modification to the training process of the SPLICE algorithm for noise-robust speech recognition. The modification is based on feature correlations and enables this stereo-based algorithm to improve performance in all noise conditions, especially unseen ones. The modified framework is further extended to non-stereo datasets, which require clean and noisy training utterances but not their stereo counterparts. Finally, a computationally efficient, MLLR-based run-time noise adaptation method is proposed within the SPLICE framework.
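The subspace projection idea can be illustrated with a minimal NumPy sketch under stated assumptions; it is not the thesis's exact procedure. It assumes Euclidean-cost NMF with Lee and Seung multiplicative updates (the abstract does not name the cost or update rule), and it compresses Mel energies with log(1 + E) so the features stay non-negative, since raw log-Mel values can be negative while NMF requires non-negative input. The function names `nmf`, `project_onto_basis` and `dct_ii_ortho`, the rank, the iteration counts, the 13 cepstra and the synthetic data are all hypothetical choices for illustration. A clean basis `W` is learned once; for a noisy utterance only the activations are estimated with `W` held fixed, and the reconstruction is passed through a DCT-II, as in standard MFCC extraction.

```python
import numpy as np

EPS = 1e-10  # guards against division by zero in the multiplicative updates


def nmf(V, rank, n_iter=200, seed=0):
    """Factorise non-negative V (filters x frames) as W @ H by minimising
    ||V - W H||_F^2 with Lee-Seung multiplicative updates."""
    rng = np.random.default_rng(seed)
    n_filters, n_frames = V.shape
    W = rng.random((n_filters, rank)) + EPS
    H = rng.random((rank, n_frames)) + EPS
    for _ in range(n_iter):
        H *= (W.T @ V) / (W.T @ W @ H + EPS)
        W *= (V @ H.T) / (W @ H @ H.T + EPS)
    return W, H


def project_onto_basis(V, W, n_iter=200, seed=0):
    """Hold the clean basis W fixed, estimate non-negative activations H for V,
    and return the reconstruction W @ H, which lies in the clean subspace."""
    rng = np.random.default_rng(seed)
    H = rng.random((W.shape[1], V.shape[1])) + EPS
    for _ in range(n_iter):
        H *= (W.T @ V) / (W.T @ W @ H + EPS)
    return W @ H


def dct_ii_ortho(X, n_ceps):
    """Orthonormal DCT-II along axis 0 (the filter axis); keep n_ceps rows."""
    N = X.shape[0]
    n = np.arange(N)
    k = np.arange(n_ceps)[:, None]
    basis = np.cos(np.pi * k * (2 * n + 1) / (2 * N))
    scale = np.full((n_ceps, 1), np.sqrt(2.0 / N))
    scale[0, 0] = np.sqrt(1.0 / N)
    return scale * (basis @ X)


# Usage on synthetic data (hypothetical, not from the thesis).
rng = np.random.default_rng(1)
n_filters, n_frames = 23, 400
clean_mel = rng.random((n_filters, 5)) @ rng.random((5, n_frames))  # low-rank "clean" Mel energies
clean_feat = np.log1p(clean_mel)  # assumption: log(1 + E) keeps features non-negative
W, H_clean = nmf(clean_feat, rank=10)  # clean speech building blocks

noisy_mel = clean_mel + 0.5 * rng.random((n_filters, n_frames))  # additive noise in the Mel domain
noisy_feat = np.log1p(noisy_mel)
recon = project_onto_basis(noisy_feat, W)  # noisy features re-expressed with clean blocks
mfcc = dct_ii_ortho(recon, n_ceps=13)  # cepstra from the reconstructed features
```

Keeping `W` fixed at test time is what constrains the reconstruction to the span of clean speech building blocks; only the per-frame activations adapt to the noisy input.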