VTS-based Robust Speech Recognition

ZHAO Xianyu, OU Zhijian, WANG Zuoying
DOI: https://doi.org/10.3321/j.issn:1000-0054.2005.07.008
2005-01-01
Abstract: To further improve noise modeling accuracy and acoustic model compensation, this paper presents an unsupervised clustering technique combined with vector Taylor series (VTS) expansions. The method clusters noisy speech frames into classes based on the Kullback-Leibler distance between noise models. A separate VTS expansion is then applied to each class for noise model parameter estimation and acoustic model compensation. In experiments with a digit string recognizer in babble and Gaussian white noise environments, the method gave 27.7% and 17.8% error reductions relative to a conventional VTS algorithm. These results show that combining unsupervised noise clustering with a separate VTS expansion for each class approximates the non-linear speech and noise corruption model in the cepstral domain more effectively.
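The two ingredients named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a single log-spectral channel rather than the cepstral domain, independent speech and noise, diagonal-covariance Gaussian noise models, and a symmetrized form of the Kullback-Leibler distance. The function names `mismatch`, `vts_compensate`, and `symmetric_kl` are hypothetical.

```python
import math

def mismatch(x, n):
    """Log-spectral corruption model (assumed form): y = x + log(1 + exp(n - x))."""
    return x + math.log1p(math.exp(n - x))

def vts_compensate(mu_x, var_x, mu_n, var_n):
    """First-order Taylor expansion of the mismatch function around (mu_x, mu_n).

    Returns the approximate mean and variance of noisy speech for one channel,
    assuming speech and noise are independent.
    """
    g = 1.0 / (1.0 + math.exp(mu_n - mu_x))  # dy/dx at the expansion point
    mu_y = mismatch(mu_x, mu_n)              # zeroth-order term
    var_y = g * g * var_x + (1.0 - g) ** 2 * var_n  # dy/dn = 1 - g
    return mu_y, var_y

def symmetric_kl(mu_p, var_p, mu_q, var_q):
    """Symmetrized KL divergence between two diagonal-covariance Gaussians.

    Inputs are equal-length sequences of per-dimension means and variances.
    The symmetrization is an assumption; the abstract does not specify it.
    """
    total = 0.0
    for mp, vp, mq, vq in zip(mu_p, var_p, mu_q, var_q):
        d2 = (mp - mq) ** 2
        kl_pq = 0.5 * (math.log(vq / vp) + (vp + d2) / vq - 1.0)
        kl_qp = 0.5 * (math.log(vp / vq) + (vq + d2) / vp - 1.0)
        total += kl_pq + kl_qp
    return total
```

In a clustering scheme of the kind the abstract describes, a distance such as `symmetric_kl` would decide which noise models are merged into one class, and a compensation step such as `vts_compensate` would then be run once per class with that class's noise statistics as the expansion point.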