Abstract: To perform a sign properly and accurately in Iranian Sign Language (ISL), the lips must move dynamically in addition to the fingers and hands. The current study aims to develop an Automatic Lip-Reading (ALR) system for a set of Persian words using deep neural networks and to implement it on the Apo social robot. To this end, we propose two ALR systems: one combines a Convolutional Neural Network (CNN) with Long Short-Term Memory (LSTM) units, and the other combines a CNN with a Transformer network in place of the LSTM. To evaluate the proposed networks on Persian words, we also recorded a Persian-language dataset in which 50 individuals repeated each of 25 selected words/phrases four times. On this dataset, the CNN-LSTM network and the Transformer network achieved accuracies of 94.4% and 96.2%, respectively. The second network's results indicate that it is suitable for the ultimate objective of the research, namely implementation on the Apo social robot. In a practical test with five participants, the Transformer network implemented on the robot achieved an accuracy of 80.6%, which is fairly promising for real situations. This study brings us one step closer to our ultimate goal of providing reciprocal human–robot interaction platforms via ISL. We also trained the proposed network architectures to recognize the utterances in the OuluVS2 database (an English database), which allowed us to assess how well such structures work and to make rough comparisons with other studies in the literature. On this database, the CNN-LSTM network and the Transformer network achieved accuracies of 91.39% and 92.22%, respectively. Our networks were not the most accurate on OuluVS2 (the best reported accuracy is around 95%, according to the literature), but they were close to the top ones. Furthermore, our relatively simple networks provided acceptable results compared with some more complex and even pre-trained networks.
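The figures quoted in the abstract can be put side by side with a few lines of arithmetic. The sketch below is only a reader's aid, not part of the authors' pipeline: all variable and function names are hypothetical, the recording count is the nominal one (it assumes every repetition was kept, which the abstract does not state), and the "best OuluVS2" value is the approximate 95% mentioned above.

```python
# Figures quoted in the abstract; all names here are illustrative only.
SPEAKERS = 50
WORDS = 25
REPETITIONS = 4

# Reported accuracies in percent: (CNN-LSTM, CNN-Transformer).
ACCURACY = {
    "Persian dataset": (94.4, 96.2),
    "OuluVS2": (91.39, 92.22),
}
ROBOT_ACCURACY = 80.6   # Transformer network on the robot, five participants
OULUVS2_BEST = 95.0     # "around 95%" per the abstract, so only approximate


def dataset_size(speakers: int, words: int, repetitions: int) -> tuple[int, int]:
    """Return (nominal total recordings, nominal recordings per word).

    Assumes no recording was discarded, which the abstract does not confirm.
    """
    return speakers * words * repetitions, speakers * repetitions


def gain(pair: tuple[float, float]) -> float:
    """Percentage-point gain of the Transformer network over CNN-LSTM."""
    cnn_lstm, transformer = pair
    return round(transformer - cnn_lstm, 2)


total, per_word = dataset_size(SPEAKERS, WORDS, REPETITIONS)
robot_drop = round(ACCURACY["Persian dataset"][1] - ROBOT_ACCURACY, 2)
oulu_gap = round(OULUVS2_BEST - ACCURACY["OuluVS2"][1], 2)

print(f"nominal recordings: {total} total, {per_word} per word")
for name, pair in ACCURACY.items():
    print(f"{name}: Transformer gain over CNN-LSTM = {gain(pair)} points")
print(f"offline-to-robot drop (Transformer) = {robot_drop} points")
print(f"gap to best reported OuluVS2 accuracy = about {oulu_gap} points")
```

Read this way, the Transformer network's advantage over CNN-LSTM is modest on both datasets, while the drop from the offline Persian test to the on-robot test is much larger than either architectural gain.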

Language recognition by convolutional neural networks

Time-Frequency Localization Using Deep Convolutional Maxout Neural Network in Persian Speech Recognition

Speech Recognition using Convolution Deep Neural Networks

CONVOLUTIONAL NEURAL NETWORK FOR ARABIC SPEECH RECOGNITION

Persian Signature Verification using Fully Convolutional Networks

The Recognition Of Persian Phonemes Using PPNet

Automatic Lip Reading of Persian Words by a Robotic System Using Deep Learning Algorithms

Convolutional Neural Network Based Real Time Arabic Speech Recognition to Arabic Braille for Hearing and Visually Impaired

Improving Feature Extraction Using a Hybrid of CNN and LSTM for Entity Identification

Convolutional neural network based language identification system: A spectrogram based approach

Deep neural network architectures for dysarthric speech analysis and recognition

Convolutional Neural Networks for Sentiment Analysis in Persian Social Media

Handwritten Farsi Character Recognition using Artificial Neural Network

Efficient Arabic emotion recognition using deep neural networks

Dialectal Arabic Speech Recognition using CNN-LSTM Based on End-to-End Deep Learning

Spatio-Temporal Facial Expression Recognition Using Convolutional Neural Networks and Conditional Random Fields

An Attention-Based Convolutional Recurrent Neural Networks for Scene Text Recognition

Hypertuned Deep Convolutional Neural Network for Sign Language Recognition

Facial expression recognition using HOG and LBP features with convolutional neural network

Voice Pathology Detection and Classification Using Convolutional Neural Network Model

Classification of Heart Sounds Using Multi-Branch Deep Convolutional Network and LSTM-CNN