ASKCC-DCNN-CTC: A Multi-Core Two Dimensional Causal Convolution Fusion Network with Attention Mechanism for End-to-End Speech Recognition

Rongchuang Lv, Niansheng Chen, Songlin Cheng, Guangyu Fan, Lei Rao, Xiaoyong Song, Dingyu Yang
DOI: https://doi.org/10.1109/cscwd57460.2023.10151993
2023-01-01
Abstract: Traditional convolutional neural networks have difficulty extracting key features and achieve low prediction accuracy in Chinese speech recognition. To address these problems, we analyze how information leakage and unstandardized phoneme features affect the performance of the deep convolutional neural network (DCNN)-connectionist temporal classification (CTC) model. We then construct an SKNet-based multi-core two-dimensional (2D) causal convolution fusion layer and propose ASKCC-DCNN-CTC, a DCNN-CTC model that fuses an attention mechanism with the SKNet multi-core 2D causal convolution network, which improves both the accuracy and the training speed of Chinese speech recognition. Simulation results show that the error rate of our model on the ST-CMDS dataset is 12.201% lower than that of the DCNN-CTC model. Performance on the THCHS30 dataset also improves, which indicates good generalization ability.
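The abstract does not spell out the layer itself, so the following is only a minimal sketch of the general idea behind a 2D causal convolution: padding the time axis on the past side only, so that an output frame cannot draw on future frames (one common reading of "information leakage"). The function name `causal_conv2d`, the single-channel (time, frequency) layout, and the odd frequency-kernel width are illustrative assumptions, not details taken from the paper, and the SKNet fusion and attention parts are not modeled here.

```python
import numpy as np


def causal_conv2d(x, kernel):
    """Single-channel 2D cross-correlation that is causal along the time axis.

    Hypothetical helper for illustration only (not the paper's layer).
    x:      (T, F) array of T time frames by F frequency bins.
    kernel: (kt, kf) array; kf is assumed to be odd.
    Output frame t depends only on input frames t-kt+1 .. t.
    """
    kt, kf = kernel.shape
    T, F = x.shape
    pad_f = kf // 2
    # Zero-pad kt-1 frames before the first frame and none after the last,
    # so no window reaches into the future; pad frequency symmetrically.
    xp = np.pad(x, ((kt - 1, 0), (pad_f, pad_f)))
    out = np.zeros((T, F))
    for t in range(T):
        for f in range(F):
            out[t, f] = np.sum(xp[t:t + kt, f:f + kf] * kernel)
    return out


# Usage: changing frames 3 and later must leave outputs for frames 0..2 unchanged.
x = np.arange(20.0).reshape(5, 4)
k = np.ones((3, 3))
y = causal_conv2d(x, k)
x_future = x.copy()
x_future[3:] = 0.0
assert np.allclose(causal_conv2d(x_future, k)[:3], y[:3])
```

A multi-kernel variant in the spirit of SKNet would run several such convolutions with different kernel sizes and combine their outputs with learned weights; how the paper does this is not stated in the abstract.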