Abstract: Conventional Hidden Markov Model (HMM) based Automatic Speech Recognition (ASR) systems generally use cepstral features as acoustic observations and phonemes as basic linguistic units. Mel-Frequency Cepstral Coefficients (MFCCs) are among the most powerful features currently used in ASR systems. Speech recognition is inherently complicated by the variability of the speech signal, both within and across speakers. This variability leads to several kinds of mismatch between acoustic features and acoustic models, and hence degrades system performance. The sensitivity of MFCCs to speech signal variability has motivated many researchers to investigate new sets of speech feature parameters that make acoustic models more robust to this variability and thus improve system performance. The combination of diverse acoustic feature sets has great potential to enhance the performance of ASR systems. This paper is part of ongoing research efforts to build an accurate Arabic ASR system for teaching and learning purposes. It addresses the integration of complementary features into standard HMMs in order to make them more robust and thus improve their recognition accuracy. The complementary features investigated in this work are voiced formants and pitch, used in combination with conventional MFCC features. A series of experiments under various combination strategies was performed to determine which of these integrated features significantly improve system performance. The Cambridge HTK tools were used as the development environment for the system. Experimental results showed that the error rate decreased, and the results appear very promising even without the use of language models.
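
The abstract does not say which combination strategies were tested. As an illustration only, the sketch below shows one common option, frame-level concatenation, in which pitch and formant values are appended to each MFCC vector. The function name `combine_features`, the feature dimensions, and the convention that a pitch of 0.0 marks an unvoiced frame (whose formants are then zeroed) are all assumptions made for this example, not details taken from the paper.

```python
from typing import List, Sequence


def combine_features(
    mfcc: Sequence[Sequence[float]],
    pitch: Sequence[float],
    formants: Sequence[Sequence[float]],
) -> List[List[float]]:
    """Append pitch and formants to each MFCC frame (hypothetical helper).

    mfcc:     one MFCC vector per frame
    pitch:    one pitch value per frame; 0.0 means unvoiced (assumed convention)
    formants: one formant vector per frame
    """
    if not (len(mfcc) == len(pitch) == len(formants)):
        raise ValueError("all feature streams must have the same number of frames")

    combined = []
    for mfcc_frame, f0, formant_frame in zip(mfcc, pitch, formants):
        voiced = f0 > 0.0
        # Keep formants only for voiced frames; zero them otherwise (assumption).
        used = list(formant_frame) if voiced else [0.0] * len(formant_frame)
        combined.append(
            [float(x) for x in mfcc_frame] + [float(f0)] + [float(x) for x in used]
        )
    return combined


# Illustrative input: 2 frames, 13 MFCCs, 1 pitch value, 3 formants per frame.
frames = combine_features(
    mfcc=[[1.0] * 13, [2.0] * 13],
    pitch=[120.0, 0.0],
    formants=[[500.0, 1500.0, 2500.0], [480.0, 1400.0, 2400.0]],
)
```

Each resulting frame has 17 values in this example (13 + 1 + 3). Concatenation keeps a single observation stream per frame; other strategies, such as modelling each feature set as a separate stream, would need a different layout.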
