The Use of Dynamic Deformable Templates for Lip Tracking in an Audio-Visual Corpus with Large Variations in Head Pose, Face Illumination and Lip Shapes

Zhiyong Wu, Jiying Wu, Helen M. Meng
DOI: https://doi.org/10.1109/chinsl.2008.ecp.104
2008-01-01
Abstract: This paper describes an approach to lip tracking using dynamic deformable templates. The objective is to track lip parameters in an audio-visual corpus that records a voice talent reading text prompts in a natural and expressive way. The corpus challenges the conventional deformable-template approach to lip tracking, because natural and expressive speech involves relatively large motions of the head and lips. Head motions change both the illumination of the face region and the observed lip shape. In addition, emphatic pronunciations produce large changes in lip shape. In video frames affected by changes in face illumination, the mouth region (i.e., the region of interest, ROI) is harder to locate. Video frames affected by changes in lip shape deviate further from the lip templates, which lowers tracking accuracy. Our proposed method incorporates "dynamicity" into the deformable templates so that they adapt to changes in head pose, face illumination and lip shape. Experiments show that dynamic deformable templates consistently outperform conventional deformable templates in lip tracking.
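The abstract does not specify the template model, so the following is only a rough illustration of the conventional baseline it compares against. This Python sketch assumes a common two-parabola lip template (mouth centre, half-width, upper-lip height, lower-lip depth), scores it by the mean edge strength along its contour, and fits it by greedy coordinate descent. The names LipTemplate, energy, fit and synthetic_edge_map, as well as the parameterisation and search strategy, are hypothetical. This is not the authors' method, and it does not model the "dynamic" adaptation the paper proposes.

```python
import math
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class LipTemplate:
    """Hypothetical two-parabola lip template (image y-axis points down)."""
    xc: float     # mouth centre, x
    yc: float     # mouth centre, y (line joining the two mouth corners)
    w: float      # half-width: distance from centre to each mouth corner
    h_up: float   # height of the upper-lip parabola above the centre line
    h_low: float  # depth of the lower-lip parabola below the centre line

    def contour(self, n=20):
        """Return 2*(n+1) points sampled on the upper and lower parabolas."""
        pts = []
        for i in range(n + 1):
            x = -self.w + 2 * self.w * i / n
            t = 1.0 - (x / self.w) ** 2  # 1 at the centre, 0 at the corners
            pts.append((self.xc + x, self.yc - self.h_up * t))   # upper lip
            pts.append((self.xc + x, self.yc + self.h_low * t))  # lower lip
        return pts


def energy(tpl, edge):
    """Negative mean edge strength along the contour (lower is better)."""
    if tpl.w <= 1.0 or tpl.h_up < 0 or tpl.h_low < 0:
        return math.inf  # reject degenerate templates
    height, width = len(edge), len(edge[0])
    pts = tpl.contour()
    total = 0.0
    for x, y in pts:
        ix, iy = round(x), round(y)  # nearest-pixel sampling
        if 0 <= ix < width and 0 <= iy < height:
            total += edge[iy][ix]
    return -total / len(pts)


PARAMS = ("xc", "yc", "w", "h_up", "h_low")


def fit(init, edge, step=2.0, min_step=0.25, max_iter=1000):
    """Greedy coordinate descent: accept any +/-step move that lowers the energy,
    halve the step when no move helps, stop below min_step."""
    best, best_e = init, energy(init, edge)
    for _ in range(max_iter):
        if step < min_step:
            break
        improved = False
        for name in PARAMS:
            for delta in (step, -step):
                cand = replace(best, **{name: getattr(best, name) + delta})
                e = energy(cand, edge)
                if e < best_e:
                    best, best_e, improved = cand, e, True
                    break
        if not improved:
            step /= 2
    return best, best_e


def synthetic_edge_map(tpl, width, height, sigma=2.0):
    """Toy edge-strength image: a Gaussian ridge along a known lip contour."""
    edge = [[0.0] * width for _ in range(height)]
    for px, py in tpl.contour(n=200):
        for iy in range(math.floor(py) - 4, math.floor(py) + 6):
            for ix in range(math.floor(px) - 4, math.floor(px) + 6):
                if 0 <= ix < width and 0 <= iy < height:
                    d2 = (ix - px) ** 2 + (iy - py) ** 2
                    edge[iy][ix] = max(edge[iy][ix], math.exp(-d2 / (2 * sigma**2)))
    return edge


if __name__ == "__main__":
    truth = LipTemplate(50, 30, 20, 8, 10)
    edge = synthetic_edge_map(truth, width=100, height=60)
    fitted, e = fit(LipTemplate(50, 30, 20, 8, 6), edge)  # lower lip too shallow
    print(fitted, e)
```

A simple tracking loop could pass each frame's fitted template as init for the next frame. The paper's dynamic templates go further by adapting to head pose, illumination and lip shape, which this sketch does not attempt.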