Collaborative Joint Training With Multitask Recurrent Model for Speech and Speaker Recognition.

Zhiyuan Tang, Lantian Li, Dong Wang, Ravichander Vipperla
DOI: https://doi.org/10.1109/TASLP.2016.2639323
2017-01-01
Abstract: Automatic speech and speaker recognition are traditionally treated as two independent tasks and are studied separately. The human brain, in contrast, deciphers the linguistic content and the speaker traits from speech in a collaborative manner. This key observation motivates the work presented in this paper. A collaborative joint training approach based on multitask recurrent neural network models is proposed, where the output of one task is backpropagated to the other tasks. This is a general framework for learning collaborative tasks and fits well with the goal of joint learning of automatic speech and speaker recognition. Through a comprehensive study, it is shown that the multitask recurrent neural network models deliver improved performance on both automatic speech and speaker recognition tasks compared to single-task systems. The strength of such multitask collaborative learning is analyzed, and the impact of various training configurations is investigated.
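The collaborative idea in the abstract can be illustrated with a minimal forward-pass sketch. It is not the authors' implementation: it assumes two plain tanh recurrent branches (one for phone recognition, one for speaker recognition), and it assumes that the feedback between tasks is each branch's previous-frame output fed to the other branch as an extra input. All names (`init_branch`, `branch_step`, `run_multitask`, `W_cross`) and all dimensions are hypothetical, and training with a joint loss is not shown.

```python
import numpy as np

def init_branch(rng, n_in, n_other, n_hidden, n_out):
    """Random weights for one task branch (sizes are illustrative assumptions)."""
    return {
        "W_in": rng.normal(0.0, 0.1, (n_hidden, n_in)),        # acoustic features -> hidden
        "W_rec": rng.normal(0.0, 0.1, (n_hidden, n_hidden)),   # own previous hidden -> hidden
        "W_cross": rng.normal(0.0, 0.1, (n_hidden, n_other)),  # other task's previous output -> hidden
        "W_out": rng.normal(0.0, 0.1, (n_out, n_hidden)),      # hidden -> task posteriors
    }

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def branch_step(p, x, h_prev, other_prev):
    """One time step of a branch, conditioned on the other task's previous output."""
    h = np.tanh(p["W_in"] @ x + p["W_rec"] @ h_prev + p["W_cross"] @ other_prev)
    return h, softmax(p["W_out"] @ h)

def run_multitask(frames, asr, sid):
    """Run both branches over a sequence of feature frames, exchanging outputs each step."""
    n_hidden = asr["W_rec"].shape[0]
    n_phones = asr["W_out"].shape[0]
    n_speakers = sid["W_out"].shape[0]
    h_a, h_s = np.zeros(n_hidden), np.zeros(n_hidden)
    # Start from uniform posteriors, since no previous output exists at the first frame.
    y_a = np.full(n_phones, 1.0 / n_phones)
    y_s = np.full(n_speakers, 1.0 / n_speakers)
    phone_post, spk_post = [], []
    for x in frames:
        # Both branches read the other's output from the previous frame.
        h_a_new, y_a_new = branch_step(asr, x, h_a, y_s)
        h_s_new, y_s_new = branch_step(sid, x, h_s, y_a)
        h_a, y_a, h_s, y_s = h_a_new, y_a_new, h_s_new, y_s_new
        phone_post.append(y_a)
        spk_post.append(y_s)
    return np.array(phone_post), np.array(spk_post)
```

In this sketch the two branches share the same input frames but keep separate hidden states, so each task can still be evaluated on its own; only the `W_cross` connections couple them.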