Multi-task Learning with Auxiliary Cross-attention Transformer for Low-Resource Multi-dialect Speech Recognition

Zhengjia Dan, Yue Zhao, Xiaojun Bi, Licheng Wu, Qiang Ji
DOI: https://doi.org/10.1007/978-3-031-17120-8_9
2022-01-01
Abstract: In this paper, we apply multi-task learning to low-resource multi-dialect speech recognition and propose a method that combines the Transformer with soft-parameter-sharing multi-task learning. Our model has two task streams: a primary task stream that recognizes speech and an auxiliary task stream that identifies the dialect. The auxiliary task stream supplies dialect identification information to the auxiliary cross-attention of the primary task stream, so that the primary task stream gains the ability to discriminate between dialects. Experimental results on Tibetan multi-dialect speech recognition show that our model outperforms the single-dialect model and the hard-parameter-sharing-based multi-dialect model, reducing the average syllable error rate (ASER) by 30.22% and 3.89%, respectively.
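The abstract does not give the layer equations, so the following is only a minimal sketch of how an auxiliary cross-attention between two task streams could look. It assumes standard scaled dot-product attention in which the primary (speech recognition) stream provides the queries and the auxiliary (dialect identification) stream provides the keys and values, and it assumes the attended result is added back to the primary stream as a residual. The function names are hypothetical, and learned projections, multiple heads, and layer normalization are omitted for brevity.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries, keys, values):
    """Scaled dot-product attention: every query attends over all keys/values."""
    d = len(keys[0])
    width = len(values[0])
    out = []
    for q in queries:
        # Similarity of this query to every key, scaled by sqrt(d).
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys]
        weights = softmax(scores)
        # Weighted sum of the value vectors.
        out.append([sum(w * v[j] for w, v in zip(weights, values)) for j in range(width)])
    return out


def primary_with_auxiliary(primary_states, auxiliary_states):
    """Hypothetical helper: inject auxiliary-stream information into the primary stream.

    Assumption (not stated in the abstract): queries come from the primary
    stream, keys and values come from the auxiliary stream, and the result is
    combined with the primary states through a residual sum.
    """
    attended = cross_attention(primary_states, auxiliary_states, auxiliary_states)
    return [[p + a for p, a in zip(ps, at)] for ps, at in zip(primary_states, attended)]


if __name__ == "__main__":
    primary = [[1.0, 0.0], [0.0, 1.0]]    # two speech frames (toy values)
    auxiliary = [[0.5, 0.5]]              # one dialect-stream state (toy values)
    print(primary_with_auxiliary(primary, auxiliary))
```

Because the two streams keep separate states and exchange information only through attention, this arrangement is in the spirit of soft parameter sharing; a hard-parameter-sharing model would instead share one encoder across both tasks.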