Knowledge Transfer in Permutation Invariant Training for Single-Channel Multi-Talker Speech Recognition.

Tian Tan,Yanmin Qian,Dong Yu
DOI: https://doi.org/10.1109/icassp.2018.8461883
2018-01-01
Abstract:This paper proposes a framework that combines teacher-student training and permutation invariant training (PIT) for single-channel multi-talker speech recognition. In contrast to most of conventional teacher-student training methods that aim at compressing the model, the proposed method distills knowledge from the single-talker model to improve the multi-talker model in the PIT framework. The inputs to the teacher and student networks are the single-talker clean speech and the multi-talker mixed speech, respectively. The knowledge is transferred to the student through the soft labels generated by the teacher. Furthermore, the ensemble of multiple teachers is exploited with a progressive training scheme to further improve the system. In this framework it is easy to take advantage of data augmentation and perform domain adaptation for multi-talker speech recognition using only untranscribed data. The proposed techniques were evaluated on artificially mixed two-talker AMI speech data. The experimental results show that the teacher-student training can cut the word error rate (WER) by relative 20% against the baseline PIT model. We also evaluated our unsupervised domain adaptation method on an artificially mixed WSJO corpus and achieved relative 30% WER reduction against the AMI PIT model.
What problem does this paper attempt to address?