Cantonese sentence dataset for lip‐reading
Yewei Xiao, Xuanming Liu, Lianwei Teng, Aosu Zhu, Picheng Tian, Jian Huang
DOI: https://doi.org/10.1049/ipr2.13123
IF: 2.3
2024-06-20
IET Image Processing
Abstract: Lip‐reading deciphers speech by observing lip movements without relying on audio data. Rapid advances in deep learning have significantly improved lip‐reading for both English and Chinese; however, research on dialects such as Cantonese remains scarce, and most Chinese lip‐reading datasets focus on Mandarin, with only a few addressing Cantonese. To bridge this gap, a sentence‐level Cantonese lip‐reading dataset, Cantonese lip‐reading sentences, is introduced, comprising over 500 unique speakers and more than 30,000 samples. To ensure alignment with real‐world scenarios, no restrictions are imposed on factors such as gender, age, posture, lighting conditions, or speech rate. The pipeline used to collect and construct the dataset is described in detail, and an innovative visual frontend, the 3D‐visual attention net, is introduced. This frontend combines the advantages of convolution and self‐attention mechanisms to extract fine‐grained lip region features. These features are then fed into a conformer backend for temporal sequence modelling, achieving comparable performance on the Chinese Mandarin Lip Reading (CMLR), Lip Reading Sentences 2 (LRS2), Lip Reading Sentences 3 (LRS3), and Cantonese lip‐reading sentences datasets.
Benchmark tests on Cantonese lip‐reading sentences demonstrate the challenges it poses, providing a novel research foundation for dialect lip‐reading and fostering the advancement of Cantonese lip‐reading tasks.
computer science, artificial intelligence; engineering, electrical & electronic; imaging science & photographic technology