Statistical Thresholding for Robust ASR

LI Yin-guo, PU Fu-an, Thomas Fang ZHENG
DOI: https://doi.org/10.3979/j.issn.1673-825x.2012.02.001
2012-01-01
Abstract: Speech recognition systems have been applied in real-world applications for several decades, yet their recognition performance remains unsatisfactory under various noise conditions, particularly at low signal-to-noise ratios (SNRs). In this paper, we propose a statistical thresholding method for the mean and variance normalization technique that further reduces the mismatch between training and testing environments, making an automatic speech recognition (ASR) system more robust to environmental changes. Mel-frequency cepstral coefficients (MFCCs) are extracted as acoustic features and then normalized with the mean and variance normalization method to obtain cepstral mean and variance normalization (CMVN) features, to which the proposed statistical thresholding method is applied. The viability of the proposed approach was verified in experiments with different types of background noise at different SNR levels. In an isolated word recognition task, the experimental results show that the proposed approach reduced the error rate by over 40% in some cases compared with the baseline MFCC front-end, and that under low-SNR conditions it also outperforms other robust features such as cepstral mean subtraction (CMS) and CMVN.
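The abstract does not spell out the thresholding rule itself, so the sketch below illustrates only the two standard normalizations it builds on and compares against: CMS, which subtracts the per-dimension mean of the cepstral features, and CMVN, which additionally scales each dimension to unit variance. It is a minimal illustration, not the authors' implementation, and it assumes per-utterance statistics, population variance (dividing by the frame count), and a small `eps` to avoid division by zero on constant dimensions. The function names `cms` and `cmvn` and the `eps` parameter are hypothetical.

```python
from typing import List

Frame = List[float]  # one cepstral feature vector, e.g. the MFCCs of one frame


def _column_means(features: List[Frame]) -> List[float]:
    """Per-dimension mean over all frames of an utterance."""
    if not features:
        raise ValueError("need at least one frame")
    n = len(features)
    return [sum(frame[d] for frame in features) / n for d in range(len(features[0]))]


def cms(features: List[Frame]) -> List[Frame]:
    """Cepstral mean subtraction: remove the utterance mean from each dimension."""
    means = _column_means(features)
    return [[x - m for x, m in zip(frame, means)] for frame in features]


def cmvn(features: List[Frame], eps: float = 1e-8) -> List[Frame]:
    """Cepstral mean and variance normalization over one utterance.

    Each dimension is shifted to zero mean and divided by its standard
    deviation (population variance, i.e. divided by the frame count).
    The hypothetical eps guards against dimensions with zero variance.
    """
    means = _column_means(features)
    n = len(features)
    stds = [
        (sum((frame[d] - means[d]) ** 2 for frame in features) / n) ** 0.5
        for d in range(len(means))
    ]
    return [
        [(x - m) / (s + eps) for x, m, s in zip(frame, means, stds)]
        for frame in features
    ]
```

For two toy frames `[[1.0, 10.0], [3.0, 14.0]]`, `cms` returns `[[-1.0, -2.0], [1.0, 2.0]]`, while `cmvn` returns approximately `[[-1.0, -1.0], [1.0, 1.0]]`, since the second dimension is also divided by its standard deviation of 2. Practical front-ends may instead estimate these statistics over a sliding window or over training data; per-utterance statistics are used here only for simplicity.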