Amazigh CNN speech recognition system based on Mel spectrogram feature extraction method

Hossam Boulal, Mohamed Hamidi, Mustapha Abarkan, Jamal Barkani
DOI: https://doi.org/10.1007/s10772-024-10100-0
2024-04-16
International Journal of Speech Technology
Abstract: Speech recognition makes it simpler for humans and machines to interact through speech. Number-oriented communication, such as giving a registration code, mobile number, score, or account number, can benefit from speech recognition for digits. This paper presents our Amazigh automatic speech recognition (ASR) experience based on the deep learning approach. As part of the deep learning strategy, audio samples are converted to Mel spectrograms and evaluated with a convolutional neural network (CNN). To recognize Amazigh numerals, we use a database of the digits zero to nine collected from 42 native speakers in total, men and women between the ages of 20 and 40. Our experimental results show that spoken digits in Amazigh can be identified with a maximum accuracy of 93.62%, 94% precision, and 94% recall.
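The Mel spectrogram step named in the abstract can be illustrated with a minimal sketch. It is not the authors' pipeline: the frame size, hop length, number of Mel bands, the HTK-style Mel formula, and all function names below are assumptions chosen for illustration, and the input is a synthetic tone rather than Amazigh speech. It uses only the Python standard library so that it stays self-contained; in practice a library routine such as `librosa.feature.melspectrogram` would normally be used.

```python
import cmath
import math


def hz_to_mel(hz):
    """Convert Hz to Mel (HTK-style formula; an assumption, not from the paper)."""
    return 2595.0 * math.log10(1.0 + hz / 700.0)


def mel_to_hz(mel):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def mel_filterbank(n_mels, n_fft, sample_rate):
    """Build triangular filters whose edges are evenly spaced on the Mel scale."""
    n_bins = n_fft // 2 + 1
    top = hz_to_mel(sample_rate / 2.0)
    # n_mels + 2 edge frequencies, evenly spaced in Mel, converted back to Hz
    edges = [mel_to_hz(top * i / (n_mels + 1)) for i in range(n_mels + 2)]
    bin_hz = [k * sample_rate / n_fft for k in range(n_bins)]
    bank = []
    for m in range(1, n_mels + 1):
        lo, mid, hi = edges[m - 1], edges[m], edges[m + 1]
        row = []
        for f in bin_hz:
            rising = (f - lo) / (mid - lo)
            falling = (hi - f) / (hi - mid)
            row.append(max(0.0, min(rising, falling)))
        bank.append(row)
    return bank


def power_spectrum(frame):
    """Power spectrum of one Hann-windowed frame via a naive DFT."""
    n = len(frame)
    windowed = [x * (0.5 - 0.5 * math.cos(2 * math.pi * i / n))
                for i, x in enumerate(frame)]
    spectrum = []
    for k in range(n // 2 + 1):
        acc = sum(x * cmath.exp(-2j * math.pi * k * i / n)
                  for i, x in enumerate(windowed))
        spectrum.append(abs(acc) ** 2)
    return spectrum


def mel_spectrogram(signal, sample_rate, n_fft=64, hop=32, n_mels=8):
    """Return a list of frames, each a list of log Mel-band energies in dB."""
    bank = mel_filterbank(n_mels, n_fft, sample_rate)
    frames = []
    for start in range(0, len(signal) - n_fft + 1, hop):
        power = power_spectrum(signal[start:start + n_fft])
        energies = [sum(w * p for w, p in zip(row, power)) for row in bank]
        frames.append([10.0 * math.log10(e + 1e-10) for e in energies])
    return frames


# Synthetic stand-in for a recording: a 1 kHz tone sampled at 8 kHz.
sample_rate = 8000
signal = [math.sin(2 * math.pi * 1000 * t / sample_rate) for t in range(256)]
frames = mel_spectrogram(signal, sample_rate)
print(len(frames), len(frames[0]))  # 7 frames, 8 Mel bands each
```

The resulting time-by-band matrix is the kind of image-like input a CNN classifier could consume. Real speech front ends typically use much larger frames and more Mel bands than the toy values here, and an FFT rather than a naive DFT.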