A Lipreading Model Based on Fine-Grained Global Synergy of Lip Movement

Baosheng Sun, Dongliang Xie, Dawei Luo, Xiaojie Yin
DOI: https://doi.org/10.1109/ictai56018.2022.00130
2022-01-01
Abstract: Lipreading is a form of speech recognition based on visual information. It is instructive to design a lipreading model according to the laws of lip movement. General-purpose computer vision algorithms do not fully match the characteristics of lipreading, and applying them directly does not necessarily improve lipreading performance. In this paper, by comparing lipreading with other computer vision tasks and analyzing lip muscle motion patterns, we propose that lipreading exhibits fine-grained global synergy. To exploit this property, we propose a tailored model named Fine-Grained Global Synergy Lipreading (FGSLip), which aims to make features synergistic in order to improve lipreading performance. We introduce global features to represent the overall characteristics of the lip, and local features to learn coarse-grained and fine-grained correlations between features. Diffusion and fusion methods are then used to make the local and global features synergistic. On this basis, we construct several different feature extraction structures to demonstrate the fine-grained global synergy of lipreading. To verify the effectiveness of the proposed model, we conduct extensive experiments on the laboratory-recorded dataset ICSLR and the public dataset CMLR; the results show that the proposed method effectively improves lipreading accuracy.