Abstract: Speech recognition systems have been used in real-world applications for several decades, yet their recognition performance remains unsatisfactory under various noise conditions, particularly at low signal-to-noise ratios (SNR). In this paper, we propose a statistical thresholding method for the mean and variance normalization technique that further reduces the mismatch between training and testing environments, making an automatic speech recognition system more robust to environmental changes. Mel-frequency cepstral coefficient (MFCC) features are extracted as acoustic features and then normalized with the mean and variance normalization method to obtain cepstral mean and variance normalization (CMVN) features. The proposed statistical thresholding method is then applied. The viability of the proposed approach was verified in experiments with different types of background noise at different SNR levels. In an isolated word recognition task, the experimental results show that the proposed approach reduced the error rate by over 40% in some cases compared with the baseline MFCC front-end, and under lower SNR conditions the proposed method also outperforms other robust features such as cepstral mean subtraction (CMS) and CMVN.
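
The two normalization baselines named in the abstract, CMS and CMVN, can be illustrated with a minimal sketch. It assumes per-utterance statistics and a feature matrix of shape (frames, coefficients). The function names, the `eps` guard, and the synthetic input are illustrative assumptions, not taken from the paper. The statistical thresholding step itself is not specified in the abstract, so it is not reproduced here.

```python
import numpy as np


def cms(features):
    """Cepstral mean subtraction: remove each coefficient's mean over frames."""
    return features - features.mean(axis=0, keepdims=True)


def cmvn(features, eps=1e-8):
    """Cepstral mean and variance normalization.

    Each coefficient is shifted to zero mean and scaled to unit variance,
    using statistics computed over the frames of a single utterance
    (an assumption; other schemes use sliding windows or per-speaker stats).
    """
    centered = cms(features)
    std = features.std(axis=0, keepdims=True)
    # eps (hypothetical guard) avoids division by zero for constant coefficients
    return centered / (std + eps)


# Synthetic stand-in for MFCC features: 200 frames, 13 coefficients.
rng = np.random.default_rng(0)
mfcc = rng.normal(loc=3.0, scale=2.0, size=(200, 13))

mean_removed = cms(mfcc)
normalized = cmvn(mfcc)
```

After `cmvn`, every coefficient track has approximately zero mean and unit standard deviation, which is the property that reduces the mismatch between clean training data and noisy test data.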

SPONTANEOUS ORAL SPEAKING AUDIO SEGMENTATION ALGORITHM BASED ON ADAPTIVE THRESHOLD AND PITCH DETECTION

Design and Implementation of End-Point Detection Accelerator for Speech Recognition

De-Noising Method of the EEG Based on Adaptive Threshold

A pitch-based rapid speech segmentation for speaker indexing

Effective Speech Endpoint Detection Algorithm For Voiceprint Recognition

Automatic spoken English test for Chinese learners

Statistical Thresholding for Robust ASR

Exponential Threshold Based Speech Endpoint Detection Method

Research of adaptive speech separation method based on speech status detection

Threshold-Based Noise Detection and Reduction for Automatic Speech Recognition System in Human-Robot Interactions

Assessing Segmental Impact for Objective Speech Quality Evaluation

Using an Adjustment Training and a Smoothing Mask for Speech Segregation

Speech Enhancement Based On Analysis Synthesis Framework With Improved Pitch Estimation And Spectral Envelope Enhancement

A Power Spectrum Reprocessing Algorithm for Pitch Detection of Speech

Improved Voice Activity Detection Based on Long-term Spectral Divergence and Pitch Ratio Features

Robust speech recognition in noisy backgrounds based on Teager energy operator and auditory process

Efficient Identification Of Speakers In News Video Based On Shot Segmentation

A Two-Stage Content-Based Audio Segmentation Algorithm

Assessing Level-Dependent Segmental Contribution to the Intelligibility of Speech Processed by Single-Channel Noise-Suppression Algorithms

A Pitch Period Detection Algorithm Using Time and Frequency Analyses

Speech Enhancement Based on Short-Time Spectral Amplitude Estimates in Low SNR