Deep Neural Network Based Chinese Dialect Classification
Miao Wan, Jie Ren, Miao Ma, Zhiqiang Li, Rui Cao, Quanli Gao
DOI: https://doi.org/10.1109/CBD54617.2021.00043
2022-01-01
Abstract: With recent advances in neural networks for automatic speech recognition (ASR), deep neural network based ASR has been widely used in application scenarios such as smart homes, intelligent customer service, meeting minutes, and real-time translation. However, because of China's ethnic diversity, the same word is pronounced differently across dialects, which poses a major challenge for speech recognition systems, especially on short utterances. This paper analyzes three commonly used speech features: the spectrogram, MFCC (Mel-scale Frequency Cepstral Coefficients), and Fbank (filter bank), and builds a deep neural network based dialect classification model on a dataset of 10 Chinese dialects. Specifically, we study the geographical features of dialects and propose a multi-task learning model with hard parameter sharing that uses the area of the dialect as an auxiliary task. The results show that this model achieves an accuracy of up to 79.96%. Furthermore, because the hard parameter sharing model cannot effectively learn the correlation between sub-tasks, we propose a multi-task learning model based on sparse parameter sharing. This model uses joint training to automatically learn the correlation between sub-tasks, prune redundant networks, and share network parameters. The experimental results show that the sparse parameter sharing multi-task classification model achieves the best accuracy, with an average of 83.59%.
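To make the hard parameter sharing idea concrete, the following is a minimal NumPy sketch, not the authors' implementation. It assumes one fixed-length feature vector per utterance (for example, time-averaged Fbank values), a single shared hidden layer, and two softmax heads: one for the 10 dialects and one for the dialect area. The class name `HardSharingMTL`, the layer sizes, the number of areas (`n_areas=4`), and the auxiliary loss weight (`aux_weight`) are all hypothetical choices made for illustration; the abstract does not specify them.

```python
import numpy as np


def softmax(z):
    """Row-wise softmax with the usual max-shift for numerical stability."""
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)


def cross_entropy(probs, labels):
    """Mean negative log-likelihood of the integer class labels."""
    return -np.log(probs[np.arange(len(labels)), labels] + 1e-12).mean()


class HardSharingMTL:
    """Shared trunk plus two task heads (all sizes are illustrative assumptions)."""

    def __init__(self, n_features=40, n_hidden=64, n_dialects=10, n_areas=4, seed=0):
        rng = np.random.default_rng(seed)
        # Shared layer: its parameters receive gradients from BOTH tasks.
        self.W_shared = rng.normal(0.0, 0.1, (n_features, n_hidden))
        self.b_shared = np.zeros(n_hidden)
        # Task-specific heads: main task (dialect) and auxiliary task (area).
        self.W_dialect = rng.normal(0.0, 0.1, (n_hidden, n_dialects))
        self.b_dialect = np.zeros(n_dialects)
        self.W_area = rng.normal(0.0, 0.1, (n_hidden, n_areas))
        self.b_area = np.zeros(n_areas)

    def forward(self, x):
        h = np.maximum(0.0, x @ self.W_shared + self.b_shared)  # ReLU trunk
        p_dialect = softmax(h @ self.W_dialect + self.b_dialect)
        p_area = softmax(h @ self.W_area + self.b_area)
        return h, p_dialect, p_area

    def joint_loss(self, x, y_dialect, y_area, aux_weight=0.5):
        _, p_d, p_a = self.forward(x)
        return cross_entropy(p_d, y_dialect) + aux_weight * cross_entropy(p_a, y_area)

    def train_step(self, x, y_dialect, y_area, aux_weight=0.5, lr=0.1):
        """One full-batch gradient descent step on the joint loss."""
        n = len(x)
        h, p_d, p_a = self.forward(x)
        # Gradient of mean cross-entropy w.r.t. logits is (p - one_hot) / n.
        g_d = p_d.copy()
        g_d[np.arange(n), y_dialect] -= 1.0
        g_d /= n
        g_a = p_a.copy()
        g_a[np.arange(n), y_area] -= 1.0
        g_a *= aux_weight / n
        # Both heads back-propagate into the shared hidden representation.
        g_h = g_d @ self.W_dialect.T + g_a @ self.W_area.T
        g_h[h <= 0.0] = 0.0  # ReLU derivative
        self.W_dialect -= lr * (h.T @ g_d)
        self.b_dialect -= lr * g_d.sum(axis=0)
        self.W_area -= lr * (h.T @ g_a)
        self.b_area -= lr * g_a.sum(axis=0)
        self.W_shared -= lr * (x.T @ g_h)
        self.b_shared -= lr * g_h.sum(axis=0)
```

In this sketch, calling `train_step` repeatedly on a batch of feature vectors with dialect and area labels updates the shared layer from both losses, which is the defining property of hard parameter sharing. The sparse parameter sharing model described in the abstract goes further by learning which parameters each task uses; that pruning step is not shown here.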