A reweighting method for speech recognition with imbalanced data of Mandarin and sub-dialects
Jiaju Wu, Zhengchang Wen, Haitian Huang, Hanjing Su, Fei Liu, Huan Wang, Yi Ding, Qingyao Wu
DOI: https://doi.org/10.1007/s11761-024-00384-0
2024-03-27
Service Oriented Computing and Applications
Abstract: Automatic speech recognition (ASR) is an important technology in many fields, such as video-sharing services, online education, and live broadcasting. Most recent ASR methods are based on deep learning. A dataset containing training samples of standard Mandarin and its sub-dialects can be used to train a neural network-based ASR model that recognizes both. However, because the cost of collecting data differs across sub-dialects, the dataset usually contains far more training samples of standard Mandarin than of the sub-dialects, so the model recognizes standard Mandarin much better than the sub-dialects. In this paper, to improve recognition performance on sub-dialects, we propose reweighting the recognition loss for each sub-dialect based on its similarity to standard Mandarin. The proposed reweighting method makes the model pay more attention to sub-dialects with larger loss weights, alleviating their poor recognition performance. Our model was trained and validated on KeSpeech, an open-source dataset that includes standard Mandarin and its eight sub-dialects. Experimental results show that the proposed model recognizes most sub-dialects better than the baseline and achieves a Character Error Rate about 0.5 lower than the baseline.
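The reweighting idea described in the abstract can be illustrated with a minimal sketch. The abstract does not give the weighting formula, so everything specific here is an assumption made for illustration: the similarity scores, the dialect names `dialect_a` and `dialect_b`, the rule `weight = 1 + alpha * (1 - similarity)` (which gives larger weights to dialects less similar to standard Mandarin), and the helper names `dialect_weights` and `reweighted_loss`. It is not the authors' implementation.

```python
from typing import Dict, List


def dialect_weights(similarity: Dict[str, float], alpha: float = 1.0) -> Dict[str, float]:
    """Map each dialect's similarity to standard Mandarin (in [0, 1]) to a loss weight.

    Assumed rule (not from the paper): weight = 1 + alpha * (1 - similarity),
    so standard Mandarin itself (similarity 1.0) keeps weight 1.0.
    """
    return {name: 1.0 + alpha * (1.0 - sim) for name, sim in similarity.items()}


def reweighted_loss(losses: List[float], dialects: List[str],
                    weights: Dict[str, float]) -> float:
    """Weighted average of per-utterance recognition losses in a batch."""
    if len(losses) != len(dialects):
        raise ValueError("losses and dialects must have the same length")
    total_weight = sum(weights[d] for d in dialects)
    weighted_sum = sum(weights[d] * loss for loss, d in zip(losses, dialects))
    return weighted_sum / total_weight


# Hypothetical similarity scores and per-utterance losses for one batch.
similarity = {"Mandarin": 1.0, "dialect_a": 0.5, "dialect_b": 0.0}
weights = dialect_weights(similarity)

batch_losses = [1.0, 2.0, 4.0]
batch_dialects = ["Mandarin", "dialect_a", "dialect_b"]
loss = reweighted_loss(batch_losses, batch_dialects, weights)
```

Under these assumed values, the reweighted batch loss is larger than the plain mean of the three losses, because the utterances from the less similar dialects carry both higher losses and higher weights. In a real training loop, the per-utterance losses would come from the ASR model's recognition loss computed without batch reduction.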