Importance-Aware Information Bottleneck Learning Paradigm for Lip Reading

Changchong Sheng, Li Liu, Wanxia Deng, Liang Bai, Zhong Liu, Songyang Lao, Gangyao Kuang, Matti Pietikainen
DOI: https://doi.org/10.1109/tmm.2022.3210761
IF: 7.3
2022-01-01
IEEE Transactions on Multimedia
Abstract: Lip reading is the task of decoding text from speakers' mouth movements. Numerous deep learning-based methods have been proposed to address this task. However, existing deep lip reading models generalize poorly because they overfit the training data. To resolve this issue, we present a novel learning paradigm that aims to improve the interpretability and generalization of lip reading models. Specifically, a Variational Temporal Mask (VTM) module is designed to automatically analyze the importance of frame-level features. Furthermore, prediction consistency constraints on global information and locally important temporal features are introduced to strengthen model generalization. We evaluate the learning paradigm with multiple lip reading baseline models on the LRW and LRW-1000 datasets. Experiments show that the proposed framework significantly improves the generalization performance and interpretability of lip reading models.
computer science, information systems, telecommunications, software engineering