Comparison of DenseNet201 and ResNet50 for lip reading of decimal Digits

Kwakib Saadun Naif, Kadhim Mahdi Hashim
DOI: https://doi.org/10.32792/jeps.v12i2.198
2023-02-14
Abstract: Lip reading is a technology that supports people. It is a process that interprets the movement of the lips to understand speech visually. Where understanding speech is difficult for some groups of people, especially the hearing impaired or people in noisy environments such as airports or factories, lip reading is an alternative source for understanding what people are saying. In the proposed work, the video is first passed to the Viola-Jones algorithm and sequential frames of the face image are taken, followed by face detection, mouth detection, and ROI cropping. Each mouth frame is then fed into a convolutional neural network (DenseNet201 or ResNet50), where features are extracted and the test frames are classified. The research uses a database of 35 videos of seven people (5 males and 2 females) pronouncing the decimal digits (0, 1, 2, ..., 9). The test results indicate an accuracy of 90% for the DenseNet201 network and 86% for the ResNet50 network.
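The ROI cropping step in the pipeline above can be illustrated with a small sketch. It is not the authors' implementation: the abstract uses a dedicated mouth detection stage, whereas this example assumes a face bounding box is already available (for instance from a Viola-Jones style detector) and approximates the mouth region with a simple geometric heuristic (lower third of the face, centered, half the face width). The function names `mouth_roi_from_face` and `crop`, the proportions, and the toy frame are all hypothetical.

```python
from typing import List, Tuple

# A bounding box as (x, y, width, height), the layout commonly returned
# by Viola-Jones style detectors.
Box = Tuple[int, int, int, int]


def mouth_roi_from_face(face: Box) -> Box:
    """Estimate a mouth box from a face box.

    Assumption (not from the paper): the mouth lies in the lower third
    of the face box and spans the central half of its width.
    """
    x, y, w, h = face
    roi_w = w // 2
    roi_h = h // 3
    roi_x = x + (w - roi_w) // 2   # center horizontally
    roi_y = y + h - roi_h          # align with the bottom of the face box
    return roi_x, roi_y, roi_w, roi_h


def crop(frame: List[List[int]], box: Box) -> List[List[int]]:
    """Crop a grayscale frame (a list of pixel rows) to the given box."""
    x, y, w, h = box
    return [row[x:x + w] for row in frame[y:y + h]]


# Toy 100x100 "frame" whose pixel value encodes its own position.
frame = [[r * 100 + c for c in range(100)] for r in range(100)]
face = (20, 10, 60, 60)            # hypothetical detector output

mouth_box = mouth_roi_from_face(face)
mouth = crop(frame, mouth_box)
print(mouth_box, len(mouth), len(mouth[0]))
```

In a full pipeline, each cropped mouth frame would then be resized to the input size expected by the chosen network before feature extraction and classification.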