End-to-End-Based Tibetan Multitask Speech Recognition.

Yue Zhao, Jianjian Yue, Xiaona Xu, Licheng Wu, Xiali Li
DOI: https://doi.org/10.1109/access.2019.2952406
IF: 3.9
2019-01-01
IEEE Access
Abstract: To date, speech recognition technology for majority languages has been successfully applied in wireless communication devices. However, as a minority language, Tibetan has very limited resources for conventional automatic speech recognition. For some dialects it lacks sufficient data, sub-word units, lexicons, and word inventories. In this paper, we present a multitask end-to-end model that simultaneously performs Tibetan speech content recognition, dialect identification, and speaker recognition. This model avoids the need to build a pronunciation dictionary and perform word segmentation for new dialects, while allowing all three tasks to be trained in a single model. We build the multitask recognition framework on WaveNet-CTC. The dialect information and speaker ID are included in the output sequence for training. The experimental results show that our method performs better than a task-specific model.
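One way to picture "dialect information and speaker ID in the output" is as extra symbols in the CTC label vocabulary, so that a single target sequence carries all three tasks. The sketch below is illustrative only and is not taken from the paper: the tag format, the choice to append the tags after the transcript, the helper names `build_vocab` and `encode_target`, and the placeholder dialect and speaker labels are all assumptions.

```python
from typing import Dict, List


def build_vocab(transcripts: List[str], dialects: List[str],
                speakers: List[str]) -> Dict[str, int]:
    """Map the CTC blank, characters, dialect tags, and speaker tags to IDs.

    Hypothetical layout: index 0 is reserved for the CTC blank, followed by
    the sorted character set, then dialect tags, then speaker tags.
    """
    symbols = ["<blank>"]
    symbols += sorted({ch for text in transcripts for ch in text})
    symbols += [f"<dialect:{d}>" for d in sorted(set(dialects))]
    symbols += [f"<speaker:{s}>" for s in sorted(set(speakers))]
    return {sym: idx for idx, sym in enumerate(symbols)}


def encode_target(transcript: str, dialect: str, speaker: str,
                  vocab: Dict[str, int]) -> List[int]:
    """Encode one utterance as a single multitask label sequence.

    Assumption: the dialect and speaker tags are appended after the
    characters; the abstract does not state where they are placed.
    """
    ids = [vocab[ch] for ch in transcript]
    ids.append(vocab[f"<dialect:{dialect}>"])
    ids.append(vocab[f"<speaker:{speaker}>"])
    return ids


# Placeholder data: Latin letters stand in for Tibetan characters,
# and "d1"/"d2", "s1"/"s2" are made-up dialect and speaker labels.
transcripts = ["ab", "ba"]
dialects = ["d1", "d2"]
speakers = ["s1", "s2"]

vocab = build_vocab(transcripts, dialects, speakers)
target = encode_target("ab", "d1", "s1", vocab)
print(target)
```

A sequence built this way could be passed to any CTC loss as the label sequence, so one network output layer covers content, dialect, and speaker labels without a pronunciation dictionary.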