Combining Eigenvoice Speaker Modeling and VTS-Based Environment Compensation for Robust Speech Recognition

Zhijian Ou, Kan Deng
DOI: https://doi.org/10.1109/ICASSP.2012.6288961
2012-01-01
ICASSP
Abstract: Eigenvoice and vector Taylor series (VTS) are effective models for speaker differences and environmental variations, respectively. However, speaker and environmental variations always coexist in real-world speech. In this paper, we propose to combine eigenvoice and VTS. Specifically, we introduce eigenvoice speaker modeling of the clean speech into the nonlinear mismatch function of VTS. In contrast, standard VTS represents the clean speech with a speaker-independent model, regardless of speaker differences. In the new approach, the eigenvoice coefficients and the noise model parameters are estimated jointly. Experimental results on the Aurora2 task show that combining eigenvoice and VTS improves performance and demonstrate its ability to factorize speaker and noise effects.
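The idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes the commonly used log-spectral form of the VTS mismatch function, y = x + h + log(1 + exp(n - x - h)), and the usual eigenvoice form in which a speaker-adapted mean is a base mean plus a weighted sum of eigenvoice vectors. All function and parameter names (eigenvoice_mean, vts_mismatch, noisy_mean, and their arguments) are hypothetical, and the joint estimation of the coefficients and noise parameters is not shown.

```python
import math


def eigenvoice_mean(base_mean, eigenvoices, weights):
    """Speaker-adapted clean-speech mean: base mean plus weighted eigenvoices."""
    mean = list(base_mean)
    for w, voice in zip(weights, eigenvoices):
        for d, v in enumerate(voice):
            mean[d] += w * v
    return mean


def vts_mismatch(clean, noise, channel):
    """Assumed log-spectral mismatch: y = x + h + log(1 + exp(n - x - h))."""
    return [
        x + h + math.log1p(math.exp(n - x - h))
        for x, n, h in zip(clean, noise, channel)
    ]


def noisy_mean(base_mean, eigenvoices, weights, noise_mean, channel_mean):
    """Noisy-speech mean obtained by evaluating the mismatch function at the
    speaker-adapted clean mean and the noise and channel means."""
    clean = eigenvoice_mean(base_mean, eigenvoices, weights)
    return vts_mismatch(clean, noise_mean, channel_mean)


if __name__ == "__main__":
    # Toy two-dimensional example with made-up numbers.
    base = [2.0, 1.0]
    voices = [[1.0, 0.0], [0.0, 1.0]]
    print(noisy_mean(base, voices, [0.5, -0.5], [0.0, 0.0], [0.1, 0.1]))
```

Setting all weights to zero recovers the speaker-independent case that the abstract attributes to standard VTS, since the clean mean then equals the base mean.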