Research on Acoustic Model of Putian Dialect Speech Recognition Based on Deep Learning

Huilin Fang, Zhiguo Zhou
DOI: https://doi.org/10.1145/3665348.3665366
2024-05-10
Abstract: The Putian dialect is an ancient language with a rich historical and cultural background. Little research has been published on automatic speech recognition (ASR) for the Putian dialect, for several reasons. One is the absence of an audio corpus for the dialect; another is that its pronunciation differs significantly from that of everyday Mandarin. This paper addresses these challenges by collecting a speech corpus of 8000 isolated words dedicated to the Putian dialect. On this self-built database, we attempt speech recognition for the Putian dialect with a series of acoustic models: a monophone model, a triphone model, linear discriminant analysis (LDA), maximum likelihood linear transform (MLLT), speaker adaptive training (SAT), a quick model, and a DNN-HMM model. The DNN-HMM-2 model yields the best results, with a word error rate of 20.56%. In this experiment, we also introduce an additional mapping layer to construct a two-step mapping method (phoneme to Putian dialect to Mandarin) for alignment. This approach improves training alignment accuracy and reduces the character error rate by approximately 3% compared with one-level mapping.
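The abstract reports results as word error rate and character error rate. As a minimal sketch of how such a metric is conventionally computed (the standard edit-distance definition, not necessarily the authors' scoring script), the function below counts substitutions, deletions, and insertions between a reference and a hypothesis token sequence. The function name and the example tokens are hypothetical.

```python
def word_error_rate(reference, hypothesis):
    """Return (substitutions + deletions + insertions) / len(reference).

    Both arguments are sequences of tokens. Passing lists of words gives a
    word error rate; passing lists of characters gives a character error rate.
    """
    if not reference:
        raise ValueError("reference must contain at least one token")

    # prev[j] is the edit distance between the reference prefix processed
    # so far and the first j tokens of the hypothesis.
    prev = list(range(len(hypothesis) + 1))
    for i, ref_tok in enumerate(reference, start=1):
        curr = [i]  # distance to an empty hypothesis: i deletions
        for j, hyp_tok in enumerate(hypothesis, start=1):
            cost = 0 if ref_tok == hyp_tok else 1
            curr.append(min(
                prev[j] + 1,         # deletion
                curr[j - 1] + 1,     # insertion
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(reference)


# Hypothetical tokens: one substitution and one deletion over four words.
ref = ["w1", "w2", "w3", "w4"]
hyp = ["w1", "wX", "w3"]
print(word_error_rate(ref, hyp))
```

For an isolated-word corpus each utterance holds a single word, so the word error rate over the test set reduces to the fraction of misrecognized utterances; the character error rate is obtained by passing character lists instead of word lists.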
Computer Science, Linguistics