Transfer learning methods for low-resource speech accent recognition: A case study on Vietnamese language

Bao Thang Ta, Nhat Minh Le, Van Hai Do
DOI: https://doi.org/10.1016/j.engappai.2024.107895
IF: 8
2024-01-25
Engineering Applications of Artificial Intelligence
Abstract: Speech accent recognition (SAR) plays a crucial role in enhancing communication between customers and service providers, enabling personalized interactions based on the geographical, birthplace, and cultural cues that accents carry. However, current approaches predominantly train SAR models from scratch, overlooking the potential of transfer learning from other speech processing tasks even though accent datasets are relatively small. This paper presents the first comprehensive investigation into the effectiveness of transfer learning for SAR from a diverse array of data-rich speech processing tasks. In experiments on a practical Vietnamese telephone dataset provided by Viettel, the largest telecommunications provider in Southeast Asia, our best-performing model outperforms previous state-of-the-art SAR models by 46.7% in relative accuracy.
automation & control systems; computer science, artificial intelligence; engineering, electrical & electronic; engineering, multidisciplinary