Automatic Lip Reading of Persian Words by a Robotic System Using Deep Learning Algorithms
Amir Gholipour, Hoda Mohammadzade, Ali Ghadami, Alireza Taheri
DOI: https://doi.org/10.1007/s40998-024-00756-4
2024-10-02
Iranian Journal of Science and Technology, Transactions of Electrical Engineering
Abstract: Performing a sign properly and accurately in Iranian Sign Language (ISL) requires dynamic lip movement in addition to finger and hand movement. This study aims to develop an Automatic Lip-Reading (ALR) system for a set of Persian words using deep neural networks and to implement it on the Apo social robot. To this end, we propose two ALR systems: one combines a Convolutional Neural Network (CNN) with Long Short-Term Memory (LSTM) units, and the other combines a CNN with a Transformer network in place of the LSTM. To evaluate the proposed networks on Persian words, we also recorded a Persian-language dataset in which 50 individuals repeated each of 25 selected words/phrases four times. On this dataset, the CNN-LSTM network and the Transformer network achieved accuracy rates of 94.4% and 96.2%, respectively. These results indicate that the second network is suitable for the ultimate objective of the research, namely implementation on the Apo social robot. In a practical test with five participants, the Transformer network implemented on the robot achieved an accuracy of 80.6%, which is fairly promising for real situations. This study brings us one step closer to our ultimate goal of providing reciprocal human–robot interaction platforms via ISL. We also trained the proposed architectures to recognize utterances in the OuluVS2 database (an English database), which allowed us to assess how well these structures work and to make rough comparisons with other studies in the literature. On this database, the CNN-LSTM network and the Transformer network achieved accuracy rates of 91.39% and 92.22%, respectively. Our networks were not the most accurate reported for OuluVS2 (the best reported accuracy is around 95%, according to the literature), but they came close to the top results. Furthermore, our relatively simple networks provided acceptable results compared with some more complex and even pre-trained networks.
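As a rough illustration of the dataset figures quoted in the abstract, the sketch below derives the implied sample counts and the chance-level accuracy for a 25-class task. It assumes that every recording was kept and that the classes are perfectly balanced; neither is confirmed by the abstract, and the variable names are illustrative only.

```python
# Figures taken from the abstract.
NUM_SPEAKERS = 50   # individuals recorded
NUM_CLASSES = 25    # selected Persian words/phrases
REPETITIONS = 4     # repetitions of each word per individual

# Assumption: no recording was discarded, so the counts are simple products.
total_samples = NUM_SPEAKERS * NUM_CLASSES * REPETITIONS
samples_per_class = NUM_SPEAKERS * REPETITIONS
samples_per_speaker = NUM_CLASSES * REPETITIONS

# Assumption: balanced classes, so uniform random guessing gives 1 / NUM_CLASSES.
chance_accuracy = 1 / NUM_CLASSES

print(f"total samples:       {total_samples}")
print(f"samples per class:   {samples_per_class}")
print(f"samples per speaker: {samples_per_speaker}")
print(f"chance accuracy:     {chance_accuracy:.1%}")
```

Under these assumptions, the reported 94.4% and 96.2% accuracies compare against a chance level of 4%.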
engineering, electrical & electronic