Language recognition by convolutional neural networks

L. Khosravani Pour, A. Farrokhi
DOI: https://doi.org/10.24200/sci.2022.59110.6064
2024-01-21
Scientia Iranica
Abstract: Speech recognition, in other words communication between computers and humans, has a long history as a subfield of computational linguistics and Natural Language Processing (NLP). Automatic Speech Recognition (ASR), Text to Speech (TTS), Speech to Text (STT), Continuous Speech Recognition (CSR), and Interactive Voice Response (IVR) systems are different approaches to problems in this area. The hybrid deep neural network (DNN)-hidden Markov model (HMM) has been shown to significantly improve speech recognition performance over the conventional Gaussian mixture model (GMM)-HMM. The improvement is partially attributed to the ability of the DNN to model complex correlations in speech features. In this paper, we show that prosodic features for the Persian language (Farsi) can be extracted by using CNNs to segment and label speech for short texts. Using 128 and 200 filters for the CNN and a special architecture, we reach an error of 19.46 in detection rate and lower time consumption than RNNs. Another advantage of using a CNN is a simpler learning procedure. Experimental results show that CNNs can be good feature extractors for speech recognition in Farsi or other languages.
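To make the idea of a CNN as a speech feature extractor concrete, the following is a minimal sketch and not the architecture used in the paper, which the abstract does not describe. It assumes two stacked 1D convolutional layers with 128 and then 200 filters (the abstract does not say whether these counts belong to successive layers or to alternative configurations), a kernel width of 3, 13 coefficients per input frame, ReLU activations, global max pooling over time, and random untrained weights. All function names are hypothetical.

```python
import random


def conv1d_relu(frames, weights, biases):
    """Valid 1D convolution over time, followed by ReLU.

    frames:  list of T frames, each a list of C_in floats
    weights: list of F filters, each a flat list of K * C_in floats
    biases:  list of F floats
    returns: list of T - K + 1 frames, each a list of F floats
    """
    c_in = len(frames[0])
    k = len(weights[0]) // c_in
    out = []
    for t in range(len(frames) - k + 1):
        # Flatten the K-frame window so each filter is a single dot product.
        window = [x for frame in frames[t:t + k] for x in frame]
        out.append([
            max(0.0, b + sum(w_i * x_i for w_i, x_i in zip(w, window)))
            for w, b in zip(weights, biases)
        ])
    return out


def global_max_pool(frames):
    """Keep the maximum activation over time for each filter."""
    return [max(channel) for channel in zip(*frames)]


def init_layer(rng, n_filters, kernel, c_in):
    """Random (untrained) weights and zero biases for one layer."""
    scale = (kernel * c_in) ** -0.5
    weights = [
        [rng.uniform(-scale, scale) for _ in range(kernel * c_in)]
        for _ in range(n_filters)
    ]
    return weights, [0.0] * n_filters


def extract_features(frames, layers):
    """Apply each convolutional layer in turn, then pool over time."""
    for weights, biases in layers:
        frames = conv1d_relu(frames, weights, biases)
    return global_max_pool(frames)


rng = random.Random(0)
N_COEFFS, KERNEL = 13, 3  # assumed values, not taken from the paper
layers = [
    init_layer(rng, 128, KERNEL, N_COEFFS),  # first layer: 128 filters
    init_layer(rng, 200, KERNEL, 128),       # second layer: 200 filters
]
# Stand-in for 20 frames of acoustic features from one short utterance.
utterance = [[rng.gauss(0.0, 1.0) for _ in range(N_COEFFS)] for _ in range(20)]
features = extract_features(utterance, layers)
print(len(features))  # 200
```

The pooled vector has one value per filter in the last layer, so its length does not depend on the utterance duration. In a trained system, a vector like this would feed a classifier or labeling stage; that stage is outside the scope of this sketch.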
engineering, multidisciplinary