Unsupervised Regularization-Based Adaptive Training for Speech Recognition

Fenglin Ding, Wu Guo, Bin Gu, Zhen-Hua Ling, Jun Du
DOI: https://doi.org/10.21437/interspeech.2020-1689
2020-01-01
Abstract: In this paper, we propose two novel regularization-based speaker adaptive training approaches for connectionist temporal classification (CTC) based speech recognition. The first method is center loss (CL) regularization, which penalizes the distances between the embeddings of different speakers and a single shared center. The second method is speaker variance loss (SVL) regularization, in which we directly minimize the speaker interclass variance during model training. Both methods train an adaptive model on the fly by adding regularization terms to the training loss function. Our experiments on the AISHELL-1 Mandarin recognition task show that both methods are effective at adapting the CTC model without requiring any specific fine-tuning or additional complexity, achieving character error rate improvements of up to 8.1% and 8.6% over the speaker-independent (SI) model, respectively.
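The abstract does not state the exact formulas, so the following is only a minimal sketch of how the two regularizers might be computed, under stated assumptions: embeddings are fixed-size vectors, distances are squared Euclidean, the SVL term is the spread of per-speaker mean embeddings around their overall mean, and the regularizer is added to the CTC loss with a scalar weight. The function names and the `weight` parameter are hypothetical and do not come from the paper.

```python
import numpy as np


def center_loss(embeddings, center):
    """Mean squared Euclidean distance of every embedding to one shared center.

    Assumed form: the abstract only says distances to a single center are penalized.
    """
    diffs = embeddings - center
    return float(np.mean(np.sum(diffs ** 2, axis=1)))


def speaker_variance_loss(embeddings, speaker_ids):
    """Spread of the per-speaker mean embeddings around their overall mean.

    Assumed form of the "speaker interclass variance" named in the abstract.
    """
    speakers = np.unique(speaker_ids)
    means = np.stack([embeddings[speaker_ids == s].mean(axis=0) for s in speakers])
    global_mean = means.mean(axis=0)
    return float(np.mean(np.sum((means - global_mean) ** 2, axis=1)))


def regularized_loss(ctc_loss, reg_loss, weight=0.1):
    """Training loss with an added regularization term (weight is hypothetical)."""
    return ctc_loss + weight * reg_loss


# Toy batch: four 2-D embeddings from two speakers.
embeddings = np.array([[0.0, 0.0], [2.0, 0.0], [0.0, 2.0], [2.0, 2.0]])
speaker_ids = np.array([0, 0, 1, 1])
cl = center_loss(embeddings, center=np.array([1.0, 1.0]))
svl = speaker_variance_loss(embeddings, speaker_ids)
```

In either case the regularizer is combined with the CTC loss during training, for example `regularized_loss(ctc_loss, cl)`, so no separate adaptation or fine-tuning step is involved in this sketch.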