A Compact Representation of Visual Speech Data Using Latent Variables.

Ziheng Zhou, Xiaopeng Hong, Guoying Zhao, Matti Pietikainen
DOI: https://doi.org/10.1109/tpami.2013.173
IF: 23.6
2014-01-01
IEEE Transactions on Pattern Analysis and Machine Intelligence
Abstract: The problem of visual speech recognition involves the decoding of the video dynamics of a talking mouth in a high-dimensional visual space. In this paper, we propose a generative latent variable model to provide a compact representation of visual speech data. The model uses latent variables to separately represent the interspeaker variations of visual appearances and those caused by uttering within images, and incorporates the structural information of the visual data through placing priors of the latent variables along a curve embedded within a path graph.