Robust Front-End for Speech Recognition Based on Computational Auditory Scene Analysis and Speaker Model

Yong Guan, Peng Li, Wenju Liu, Bo Xu
2009-01-01
Abstract: Conventional noise-robust speech recognition systems do not work well when competing human speech is present in the background. In this paper, a speech segregation system based on computational auditory scene analysis (CASA) and speaker models is proposed to solve this problem. By using speaker models and factorial-max vector quantization (MAXVQ) to estimate real-valued masks within the CASA framework, a robust front-end for speech recognition is constructed. Evaluations on the Speech Separation Challenge (SSC) showed that the proposed system achieved a 15.68% improvement over the baseline system. The evaluation results also demonstrated the validity of the multi-speaker recognition module and the real-valued mask estimation module.
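MAXVQ, as named in the abstract, is generally built on the log-max approximation: the log-spectrum of a two-speaker mixture is roughly the elementwise maximum of the two speakers' log-spectra. The following is a minimal sketch of that idea, not the authors' implementation. The toy codebooks, the function names `maxvq_decode` and `soft_mask`, the logistic form of the mask, and the `slope` parameter are all illustrative assumptions; the abstract does not specify how the real-valued mask is computed.

```python
import math
from itertools import product


def maxvq_decode(frame, codebook_a, codebook_b):
    """Pick the codeword pair (i, j) whose elementwise max best fits the frame.

    All vectors are log-spectra over the same frequency channels.
    The fit criterion is squared error (an assumption for this sketch).
    """
    best_pair = None
    best_err = math.inf
    for (i, a), (j, b) in product(enumerate(codebook_a), enumerate(codebook_b)):
        # Log-max approximation: mixture ~ max(source_a, source_b) per channel.
        err = sum((x - max(ai, bi)) ** 2 for x, ai, bi in zip(frame, a, b))
        if err < best_err:
            best_err, best_pair = err, (i, j)
    return best_pair


def soft_mask(a, b, slope=1.0):
    """Hypothetical real-valued mask for speaker A, one value in (0, 1) per channel.

    Uses a logistic function of the log-spectral difference; values near 1
    mean speaker A dominates the channel, values near 0 mean speaker B does.
    """
    return [1.0 / (1.0 + math.exp(-slope * (ai - bi))) for ai, bi in zip(a, b)]


# Toy speaker codebooks: two codewords each, three frequency channels.
codebook_a = [[5.0, 1.0, 0.0], [0.0, 4.0, 1.0]]
codebook_b = [[0.0, 0.0, 6.0], [2.0, 2.0, 2.0]]

# One observed mixture frame (log-spectrum).
frame = [5.1, 0.9, 6.2]

i, j = maxvq_decode(frame, codebook_a, codebook_b)
mask = soft_mask(codebook_a[i], codebook_b[j])
```

In this toy case the search selects the first codeword from each codebook, and the resulting mask is close to 1 in the channel dominated by speaker A and close to 0 in the channel dominated by speaker B. A binary mask would instead threshold these values; keeping them real-valued preserves how confident the estimate is in each channel.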