Acoustic-Based Lip Reading for Mobile Devices: Dataset, Benchmark and a Self Distillation-Based Approach
Yafeng Yin, Zheng Wang, Kang Xia, Lei Xie, Sanglu Lu
DOI: https://doi.org/10.1109/tmc.2023.3294416
IF: 6.075
2024-01-01
IEEE Transactions on Mobile Computing
Abstract: Speech is a natural way for people to communicate and a convenient modality for human-computer interaction. However, audible speech has several drawbacks: it is affected by surrounding noise, it disturbs quiet environments, and it can leak private information. Silent speech has therefore been proposed, in particular lip reading, which aims to recognize speech content from lip movements. In this paper, we use inaudible acoustic signals emitted by a mobile device to sense and recognize lip movements for lip reading. Given the lack of public datasets for acoustic-based lip reading, we propose and release a large-scale lip-reading dataset, ${\sf LIPCMD}$, with 30000 acoustic-based recordings. To advance further research in lip reading, we provide a benchmark evaluation on ${\sf LIPCMD}$ using both traditional machine learning solutions and recent deep learning approaches. To recognize weak acoustic signals as words, we propose LipReader, a self-distillation-based approach that distills the probability distribution and attention map within the convolutional neural network itself for better classification. Finally, we implement LipReader on a smartphone and evaluate it on the ${\sf LIPCMD}$ dataset as well as in complex scenarios. Experimental results show that LipReader achieves a recognition accuracy of 91.58% for lip reading, outperforming baseline solutions and existing work.
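The abstract does not give the loss used by LipReader. As a rough illustration of what distilling a probability distribution and an attention map inside a single network could look like, the sketch below combines three terms for a shallow classifier head (the "student") supervised by the deepest head of the same network (the "teacher"): a cross-entropy term on the true label, a temperature-softened KL term between the two heads' predictions, and a squared distance between their normalized attention maps. Every function name, the weights `alpha` and `beta`, the temperature, and the attention definition (channel-wise sum of squared activations) are assumptions chosen for illustration and are not taken from the paper.

```python
import math

def softmax(logits, temperature=1.0):
    """Numerically stable softmax with an optional temperature."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions of equal length."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))

def attention_map(feature_maps):
    """Assumed attention definition: sum of squared activations over channels,
    L2-normalized. `feature_maps` is a list of channels, each a flat list of
    spatial activations."""
    size = len(feature_maps[0])
    att = [sum(channel[i] ** 2 for channel in feature_maps) for i in range(size)]
    norm = math.sqrt(sum(a * a for a in att)) or 1.0
    return [a / norm for a in att]

def self_distillation_loss(student_logits, teacher_logits,
                           student_feats, teacher_feats, label,
                           temperature=3.0, alpha=0.5, beta=0.1):
    """Hypothetical combined loss for one shallow head of a CNN.

    Assumes both feature maps share the same spatial size; a real network
    would need to resize one of them first.
    """
    # Supervised term: cross-entropy of the shallow head on the true label.
    ce = -math.log(softmax(student_logits)[label] + 1e-12)
    # Probability distillation: match the deepest head's softened output.
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    kd = kl_divergence(p_teacher, p_student) * temperature ** 2
    # Attention distillation: squared distance between normalized maps.
    a_student = attention_map(student_feats)
    a_teacher = attention_map(teacher_feats)
    at = sum((s - t) ** 2 for s, t in zip(a_student, a_teacher))
    return ce + alpha * kd + beta * at
```

When the shallow head already agrees with the deepest head in both predictions and attention, the two distillation terms vanish and only the cross-entropy term remains. Any disagreement adds a penalty on top of it.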